Exploring Basic Search Algorithms Using Python

Introduction

As network engineers, we often deal with large datasets like IP addresses or subnets. Efficiently searching through these datasets is critical for automation tasks, such as finding a specific subnet in a network inventory. In this article, we’ll explore two fundamental search algorithms—linear search and binary search—using Python and subnets from the 10.0.0.0/8 network

Creating the Data

To simulate a network inventory, we’ll create a dataset of /24 subnets from the 10.0.0.0/8 range (e.g., 10.0.0.0, 10.0.1.0, ..., 10.255.255.0). The following Python script generates three files: an ordered list, a reversed list, and a random list of subnets.

"""Generate subnet lists for testing search algorithms"""
import ipaddress
import random

# Generate all /24 subnets from 10.0.0.0/8
subnets = list(ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=24))
networks = [str(n.network_address) for n in subnets]

# Ordered subnets
with open("ordered_ip.txt", "w") as f:
    f.write("\n".join(networks))

# Reversed subnets
with open("reverse_ip.txt", "w") as f:
    f.write("\n".join(reversed(networks)))

# Random subnets
random.shuffle(networks)
with open("random_ip.txt", "w") as f:
    f.write("\n".join(networks))

This script produces:

ordered_ip.txt: Subnets in ascending order (10.0.0.0, 10.0.1.0, ...).
reverse_ip.txt: Subnets in descending order (10.255.255.0, 10.255.254.0, ...).
random_ip.txt: Subnets in random order.

Each file contains 65,536 subnets (256 × 256), perfect for testing search performance.

Linear Search: The Straightforward Approach

Linear search is like checking each device in a rack one by one until you find the one you’re looking for. It examines every element in a list until it finds a match or reaches the end.

How It Works

Imagine searching for 10.0.5.0 in an ordered list:

Subnets: 10.0.0.0  10.0.1.0  10.0.2.0  10.0.3.0  10.0.4.0  10.0.5.0
Indices:    0         1         2         3         4         5

Check 10.0.0.0 → No match.
Check 10.0.1.0 → No match.
Continue until 10.0.5.0 is found (6 steps).

Performance:

Best case: Target is at the start (1 step, Ω(1)).
Worst case: Target is at the end or absent (n steps, O(n), where n is the list size).
For 65,536 subnets, the worst case means checking all 65,536 entries!

import sys
import time
from ipaddress import ip_address

# Read subnets and convert to integers for comparison
with open(sys.argv[1], "r") as f:
    data = [str(ip_address(line.strip())) for line in f]

# Get target subnet from user
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))

start_time = time.time()
for prefix in data:
    if prefix == target:
        print(f"Found {ip_address(target)} with linear search")
        break
print(f"Time: {time.time() - start_time:.4f} seconds")

Binary Search: The Efficient Divide-and-Conquer

Binary search is like troubleshooting a network by dividing the problem space in half repeatedly. It requires a sorted list but is significantly faster for large datasets.

How It Works

For the same list, searching for 10.0.5.0:

Subnets: 10.0.0.0  10.0.1.0  10.0.2.0  10.0.3.0  10.0.4.0  10.0.5.0
Indices:    0         1         2         3         4         5

Find the midpoint: (0 + 5) // 2 = 2 (10.0.2.0).
Compare: 10.0.2.0 < 10.0.5.0, so ignore the left half (indices 0–2).
New list: 10.0.3.0, 10.0.4.0, 10.0.5.0.
New midpoint: (3 + 5) // 2 = 4 (10.0.4.0).
Compare: 10.0.4.0 < 10.0.5.0, ignore index 4.
Check 10.0.5.0 → Found in 3 steps!

Performance:

Best case: Target is at the midpoint (1 step, Ω(1)).
Worst case: O(log n), where n is the list size. For 65,536 subnets, it takes at most ~16 steps (log₂(65,536) ≈ 16).
Binary search is exponentially faster than linear search for large datasets.

import sys
import time
from ipaddress import ip_address

def binary_search(arr, target):
    arr = sorted(arr)  # Ensure the list is sorted
    start, end = 0, len(arr) - 1
    while start <= end:
        mid = (start + end) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            start = mid + 1
        else:
            end = mid - 1
    return -1

# Read subnets and convert to integers
with open(sys.argv[1], "r", encoding="utf-8") as f:
    data = [str(ip_address(line.strip())) for line in f]

# Get target subnet
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))

start_time = time.time()
index = binary_search(data, target)
if index != -1:
    print(f"Found {ip_address(target)} with binary search")
else:
    print(f"Subnet {ip_address(target)} not found")
print(f"Time: {time.time() - start_time:.4f} seconds")

Performance Comparison

To illustrate the difference, let’s test both algorithms on our subnet datasets. Here are results for searching 10.255.255.0 (worst case for linear search in ordered_ip.txt):

Linear Search: ~0.008 seconds (65,536 checks).
Binary Search: ~0.0001 seconds (~16 checks).

For a random subnet like 10.80.40.0 in random_ip.txt:

Linear Search: ~0.004 seconds (average case).
Binary Search: ~0.0001 seconds (sorting overhead included).

The chart below visualizes the number of steps required as the dataset size grows:

Practical Application in Network Automation

Imagine you’re automating a task to find a specific subnet in a routing table or IPAM system. Linear search works for small lists but becomes impractical for thousands of subnets. Binary search, with its O(log n) efficiency, is ideal for large-scale network data, especially if the data is pre-sorted (common in databases or IPAM tools).

Example Use Case: You’re validating whether a subnet exists in a network’s inventory before assigning it to a new VLAN. Binary search ensures quick lookups, minimizing script runtime and improving automation efficiency.

Interactive Challenge

Try this hands-on exercise to see the algorithms in action:

Save the subnet generation script as subnets.py and run it to create the three files.
Combine the linear and binary search scripts into one file (search_subnets.py):

import sys
import time
from ipaddress import ip_address

def linear_search(arr, target):
    for i, prefix in enumerate(arr):
        if prefix == target:
            return i
    return -1

def binary_search(arr, target):
    arr = sorted(arr)
    start, end = 0, len(arr) - 1
    while start <= end:
        mid = (start + end) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            start = mid + 1
        else:
            end = mid - 1
    return -1

# Read subnets
with open(sys.argv[1], "r", encoding="utf-8") as f:
    data = [str(ip_address(line.strip())) for line in f]

# Get target subnet
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))

# Linear search
start_time = time.time()
index = linear_search(data, target)
print(f"Linear Search: {'Found' if index != -1 else 'Not found'} {ip_address(target)}")
print(f"Time: {time.time() - start_time:.4f} seconds")

# Binary search
start_time = time.time()
index = binary_search(data, target)
print(f"Binary Search: {'Found' if index != -1 else 'Not found'} {ip_address(target)}")
print(f"Time: {time.time() - start_time:.4f} seconds")

Run the script with different files and subnets:

python search_subnets.py ordered_ip.txt

Try searching for 10.0.0.0 (best case), 10.255.255.0 (worst case), and a random subnet like 10.80.40.0.

Challenge: Modify the script to count the number of comparisons each algorithm makes and print them. This will help you see the efficiency gap firsthand!