
Exploring Basic Search Algorithms Using Python
5 min read
Introduction
As network engineers, we often deal with large datasets like IP addresses or subnets. Efficiently searching through these datasets is critical for automation tasks, such as finding a specific subnet in a network inventory. In this article, we’ll explore two fundamental search algorithms—linear search and binary search—using Python and subnets from the 10.0.0.0/8 network
Creating the Data
To simulate a network inventory, we’ll create a dataset of /24 subnets from the 10.0.0.0/8 range (e.g., 10.0.0.0, 10.0.1.0, ..., 10.255.255.0). The following Python script generates three files: an ordered list, a reversed list, and a random list of subnets.
"""Generate subnet lists for testing search algorithms"""
import ipaddress
import random
# Generate all /24 subnets from 10.0.0.0/8
subnets = list(ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=24))
networks = [str(n.network_address) for n in subnets]
# Ordered subnets
with open("ordered_ip.txt", "w") as f:
f.write("\n".join(networks))
# Reversed subnets
with open("reverse_ip.txt", "w") as f:
f.write("\n".join(reversed(networks)))
# Random subnets
random.shuffle(networks)
with open("random_ip.txt", "w") as f:
f.write("\n".join(networks))
This script produces:
ordered_ip.txt: Subnets in ascending order (10.0.0.0, 10.0.1.0, ...).
reverse_ip.txt: Subnets in descending order (10.255.255.0, 10.255.254.0, ...).
random_ip.txt: Subnets in random order.
Each file contains 65,536 subnets (256 × 256), perfect for testing search performance.
Linear Search: The Straightforward Approach
Linear search is like checking each device in a rack one by one until you find the one you’re looking for. It examines every element in a list until it finds a match or reaches the end.
How It Works
Imagine searching for 10.0.5.0 in an ordered list:
Subnets: 10.0.0.0 10.0.1.0 10.0.2.0 10.0.3.0 10.0.4.0 10.0.5.0
Indices: 0 1 2 3 4 5
Check 10.0.0.0 → No match.
Check 10.0.1.0 → No match.
Continue until 10.0.5.0 is found (6 steps).
Performance:
Best case: Target is at the start (1 step, Ω(1)).
Worst case: Target is at the end or absent (n steps, O(n), where n is the list size).
For 65,536 subnets, the worst case means checking all 65,536 entries!
import sys
import time
from ipaddress import ip_address
# Read subnets and convert to integers for comparison
with open(sys.argv[1], "r") as f:
data = [str(ip_address(line.strip())) for line in f]
# Get target subnet from user
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))
start_time = time.time()
for prefix in data:
if prefix == target:
print(f"Found {ip_address(target)} with linear search")
break
print(f"Time: {time.time() - start_time:.4f} seconds")
Binary Search: The Efficient Divide-and-Conquer
Binary search is like troubleshooting a network by dividing the problem space in half repeatedly. It requires a sorted list but is significantly faster for large datasets.
How It Works
For the same list, searching for 10.0.5.0:
Subnets: 10.0.0.0 10.0.1.0 10.0.2.0 10.0.3.0 10.0.4.0 10.0.5.0
Indices: 0 1 2 3 4 5
Find the midpoint: (0 + 5) // 2 = 2 (10.0.2.0).
Compare: 10.0.2.0 < 10.0.5.0, so ignore the left half (indices 0–2).
New list: 10.0.3.0, 10.0.4.0, 10.0.5.0.
New midpoint: (3 + 5) // 2 = 4 (10.0.4.0).
Compare: 10.0.4.0 < 10.0.5.0, ignore index 4.
Check 10.0.5.0 → Found in 3 steps!
Performance:
Best case: Target is at the midpoint (1 step, Ω(1)).
Worst case: O(log n), where n is the list size. For 65,536 subnets, it takes at most ~16 steps (log₂(65,536) ≈ 16).
Binary search is exponentially faster than linear search for large datasets.
import sys
import time
from ipaddress import ip_address
def binary_search(arr, target):
arr = sorted(arr) # Ensure the list is sorted
start, end = 0, len(arr) - 1
while start <= end:
mid = (start + end) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
start = mid + 1
else:
end = mid - 1
return -1
# Read subnets and convert to integers
with open(sys.argv[1], "r", encoding="utf-8") as f:
data = [str(ip_address(line.strip())) for line in f]
# Get target subnet
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))
start_time = time.time()
index = binary_search(data, target)
if index != -1:
print(f"Found {ip_address(target)} with binary search")
else:
print(f"Subnet {ip_address(target)} not found")
print(f"Time: {time.time() - start_time:.4f} seconds")
Performance Comparison
To illustrate the difference, let’s test both algorithms on our subnet datasets. Here are results for searching 10.255.255.0 (worst case for linear search in ordered_ip.txt):
Linear Search: ~0.008 seconds (65,536 checks).
Binary Search: ~0.0001 seconds (~16 checks).
For a random subnet like 10.80.40.0 in random_ip.txt:
Linear Search: ~0.004 seconds (average case).
Binary Search: ~0.0001 seconds (sorting overhead included).
The chart below visualizes the number of steps required as the dataset size grows:
Practical Application in Network Automation
Imagine you’re automating a task to find a specific subnet in a routing table or IPAM system. Linear search works for small lists but becomes impractical for thousands of subnets. Binary search, with its O(log n) efficiency, is ideal for large-scale network data, especially if the data is pre-sorted (common in databases or IPAM tools).
Example Use Case: You’re validating whether a subnet exists in a network’s inventory before assigning it to a new VLAN. Binary search ensures quick lookups, minimizing script runtime and improving automation efficiency.
Interactive Challenge
Try this hands-on exercise to see the algorithms in action:
Save the subnet generation script as subnets.py and run it to create the three files.
Combine the linear and binary search scripts into one file (search_subnets.py):
import sys
import time
from ipaddress import ip_address
def linear_search(arr, target):
for i, prefix in enumerate(arr):
if prefix == target:
return i
return -1
def binary_search(arr, target):
arr = sorted(arr)
start, end = 0, len(arr) - 1
while start <= end:
mid = (start + end) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
start = mid + 1
else:
end = mid - 1
return -1
# Read subnets
with open(sys.argv[1], "r", encoding="utf-8") as f:
data = [str(ip_address(line.strip())) for line in f]
# Get target subnet
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))
# Linear search
start_time = time.time()
index = linear_search(data, target)
print(f"Linear Search: {'Found' if index != -1 else 'Not found'} {ip_address(target)}")
print(f"Time: {time.time() - start_time:.4f} seconds")
# Binary search
start_time = time.time()
index = binary_search(data, target)
print(f"Binary Search: {'Found' if index != -1 else 'Not found'} {ip_address(target)}")
print(f"Time: {time.time() - start_time:.4f} seconds")
Run the script with different files and subnets:
python search_subnets.py ordered_ip.txt
Try searching for 10.0.0.0 (best case), 10.255.255.0 (worst case), and a random subnet like 10.80.40.0.
Challenge: Modify the script to count the number of comparisons each algorithm makes and print them. This will help you see the efficiency gap firsthand!