AutoNetOps

Cover Image for Exploring Basic Search Algorithms Using Python

Exploring Basic Search Algorithms Using Python

·

5 min read

Introduction

As network engineers, we often deal with large datasets like IP addresses or subnets. Efficiently searching through these datasets is critical for automation tasks, such as finding a specific subnet in a network inventory. In this article, we’ll explore two fundamental search algorithms—linear search and binary search—using Python and subnets from the 10.0.0.0/8 network

Creating the Data

To simulate a network inventory, we’ll create a dataset of /24 subnets from the 10.0.0.0/8 range (e.g., 10.0.0.0, 10.0.1.0, ..., 10.255.255.0). The following Python script generates three files: an ordered list, a reversed list, and a random list of subnets.

"""Generate subnet lists for testing search algorithms"""
import ipaddress
import random

# Generate all /24 subnets from 10.0.0.0/8
subnets = list(ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=24))
networks = [str(n.network_address) for n in subnets]

# Ordered subnets
with open("ordered_ip.txt", "w") as f:
    f.write("\n".join(networks))

# Reversed subnets
with open("reverse_ip.txt", "w") as f:
    f.write("\n".join(reversed(networks)))

# Random subnets
random.shuffle(networks)
with open("random_ip.txt", "w") as f:
    f.write("\n".join(networks))

This script produces:

  • ordered_ip.txt: Subnets in ascending order (10.0.0.0, 10.0.1.0, ...).

  • reverse_ip.txt: Subnets in descending order (10.255.255.0, 10.255.254.0, ...).

  • random_ip.txt: Subnets in random order.

Each file contains 65,536 subnets (256 × 256), perfect for testing search performance.


Linear Search: The Straightforward Approach

Linear search is like checking each device in a rack one by one until you find the one you’re looking for. It examines every element in a list until it finds a match or reaches the end.

How It Works

Imagine searching for 10.0.5.0 in an ordered list:

Subnets: 10.0.0.0  10.0.1.0  10.0.2.0  10.0.3.0  10.0.4.0  10.0.5.0
Indices:    0         1         2         3         4         5
  • Check 10.0.0.0 → No match.

  • Check 10.0.1.0 → No match.

  • Continue until 10.0.5.0 is found (6 steps).

Performance:

  • Best case: Target is at the start (1 step, Ω(1)).

  • Worst case: Target is at the end or absent (n steps, O(n), where n is the list size).

  • For 65,536 subnets, the worst case means checking all 65,536 entries!

import sys
import time
from ipaddress import ip_address

# Read subnets and convert to integers for comparison
with open(sys.argv[1], "r") as f:
    data = [str(ip_address(line.strip())) for line in f]

# Get target subnet from user
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))

start_time = time.time()
for prefix in data:
    if prefix == target:
        print(f"Found {ip_address(target)} with linear search")
        break
print(f"Time: {time.time() - start_time:.4f} seconds")

Binary Search: The Efficient Divide-and-Conquer

Binary search is like troubleshooting a network by dividing the problem space in half repeatedly. It requires a sorted list but is significantly faster for large datasets.

How It Works

For the same list, searching for 10.0.5.0:

Subnets: 10.0.0.0  10.0.1.0  10.0.2.0  10.0.3.0  10.0.4.0  10.0.5.0
Indices:    0         1         2         3         4         5
  1. Find the midpoint: (0 + 5) // 2 = 2 (10.0.2.0).

  2. Compare: 10.0.2.0 < 10.0.5.0, so ignore the left half (indices 0–2).

  3. New list: 10.0.3.0, 10.0.4.0, 10.0.5.0.

  4. New midpoint: (3 + 5) // 2 = 4 (10.0.4.0).

  5. Compare: 10.0.4.0 < 10.0.5.0, ignore index 4.

  6. Check 10.0.5.0 → Found in 3 steps!

Performance:

  • Best case: Target is at the midpoint (1 step, Ω(1)).

  • Worst case: O(log n), where n is the list size. For 65,536 subnets, it takes at most ~16 steps (log₂(65,536) ≈ 16).

  • Binary search is exponentially faster than linear search for large datasets.

import sys
import time
from ipaddress import ip_address

def binary_search(arr, target):
    arr = sorted(arr)  # Ensure the list is sorted
    start, end = 0, len(arr) - 1
    while start <= end:
        mid = (start + end) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            start = mid + 1
        else:
            end = mid - 1
    return -1

# Read subnets and convert to integers
with open(sys.argv[1], "r", encoding="utf-8") as f:
    data = [str(ip_address(line.strip())) for line in f]

# Get target subnet
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))

start_time = time.time()
index = binary_search(data, target)
if index != -1:
    print(f"Found {ip_address(target)} with binary search")
else:
    print(f"Subnet {ip_address(target)} not found")
print(f"Time: {time.time() - start_time:.4f} seconds")

Performance Comparison

To illustrate the difference, let’s test both algorithms on our subnet datasets. Here are results for searching 10.255.255.0 (worst case for linear search in ordered_ip.txt):

  • Linear Search: ~0.008 seconds (65,536 checks).

  • Binary Search: ~0.0001 seconds (~16 checks).

For a random subnet like 10.80.40.0 in random_ip.txt:

  • Linear Search: ~0.004 seconds (average case).

  • Binary Search: ~0.0001 seconds (sorting overhead included).

The chart below visualizes the number of steps required as the dataset size grows:

Practical Application in Network Automation

Imagine you’re automating a task to find a specific subnet in a routing table or IPAM system. Linear search works for small lists but becomes impractical for thousands of subnets. Binary search, with its O(log n) efficiency, is ideal for large-scale network data, especially if the data is pre-sorted (common in databases or IPAM tools).

Example Use Case: You’re validating whether a subnet exists in a network’s inventory before assigning it to a new VLAN. Binary search ensures quick lookups, minimizing script runtime and improving automation efficiency.

Interactive Challenge

Try this hands-on exercise to see the algorithms in action:

  1. Save the subnet generation script as subnets.py and run it to create the three files.

  2. Combine the linear and binary search scripts into one file (search_subnets.py):

import sys
import time
from ipaddress import ip_address

def linear_search(arr, target):
    for i, prefix in enumerate(arr):
        if prefix == target:
            return i
    return -1

def binary_search(arr, target):
    arr = sorted(arr)
    start, end = 0, len(arr) - 1
    while start <= end:
        mid = (start + end) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            start = mid + 1
        else:
            end = mid - 1
    return -1

# Read subnets
with open(sys.argv[1], "r", encoding="utf-8") as f:
    data = [str(ip_address(line.strip())) for line in f]

# Get target subnet
target = str(ip_address(input("Enter a /24 subnet (e.g., 10.80.40.0): ")))

# Linear search
start_time = time.time()
index = linear_search(data, target)
print(f"Linear Search: {'Found' if index != -1 else 'Not found'} {ip_address(target)}")
print(f"Time: {time.time() - start_time:.4f} seconds")

# Binary search
start_time = time.time()
index = binary_search(data, target)
print(f"Binary Search: {'Found' if index != -1 else 'Not found'} {ip_address(target)}")
print(f"Time: {time.time() - start_time:.4f} seconds")

Run the script with different files and subnets:

python search_subnets.py ordered_ip.txt

Try searching for 10.0.0.0 (best case), 10.255.255.0 (worst case), and a random subnet like 10.80.40.0.

Challenge: Modify the script to count the number of comparisons each algorithm makes and print them. This will help you see the efficiency gap firsthand!

;