Post

Completing the C++ backend for graphs

This week’s release brings a major milestone for PyDataStructs, the integration of a full-fledged C++ backend for a broad set of graph algorithms. With this, users gain access to high-performance implementations of classic algorithms while retaining the same Python-facing API.

1. Full C++ Graph Algorithm Backend

PyDataStructs now supports native C++ implementations of key graph algorithms, including:

a. Kosaraju’s algorithm for strongly connected components (Both Adjacency List and Matrix Versions)

b. Tarjan’s algorithm for strongly connected components (Both Adjacency List and Matrix Versions)

c. Floyd–Warshall algorithm for all-pairs shortest paths (Both Adjacency List and Matrix Versions)

d. Kahn’s algorithm for topological sorting (on DAGs) (Both Adjacency List and Matrix Versions)

e. Edmonds–Karp algorithm and Dinic’s algorithm for maximum flow computations (Adjacency List Version)

Crucially, equivalent unit tests have been added for both adjacency-list and adjacency-matrix graph representations, ensuring correctness across internal graph models.

2. Why This Matters

Performance: Algorithms implemented in C++ immediately benefit from lower-level optimizations, memory control, and elimination of Python overhead making them suitable for large-scale graphs or performance-sensitive workloads.

Unified API: Despite the backend shift, users need not change their code: the public Python API remains the same. This means a smooth upgrade path while gaining speed.

Backend Foundation for Future Enhancements: With the core C++ backend in place, this opens up PydataStructs to a wide variety of futture optimization strategies, including but not limited to llvm and compiler based speedups.

3. What Changed in the Codebase

Pull Request #708 introduces a complete suite of C++ implementations for several classical graph algorithms. Instead of a high-level summary, here is a breakdown of the concrete changes and internal mechanisms added to the backend:

a. Floyd–Warshall for Both Graph Representations

Two full implementations were added:

1
2
3
all_pair_shortest_paths_floyd_warshall_adjacency_list

all_pair_shortest_paths_floyd_warshall_adjacency_matrix

These functions:

  • Parse Python graph objects through PyArg_ParseTupleAndKeywords

  • Construct dist and next dictionaries entirely in C++

  • Populate edge weights by examining:

    1
    2
    3
    
    graph->edges (adjacency list backend)
        
    graph->matrix and graph->edge_weights (adjacency matrix backend)
    
  • Run the Floyd–Warshall triple nested loop with careful handling of:

  • Missing dictionary entries (PyDict_GetItemString)

  • INFINITY for unreachable paths

  • Next-hop pointer reconstruction

  • Output a Python tuple (dist_dict, next_dict)

This implementation mirrors Python semantics exactly while executing the tight inner loops in optimized C++.

b. Kahn’s Topological Sort (List + Matrix)

Two implementations were added:

1
2
3
topological_sort_kahn_adjacency_list

topological_sort_kahn_adjacency_matrix

Key features include:

  • Building in-degree maps directly from the backend node structures

  • Using std::deque to hold nodes with zero in-degree

  • Mutating the adjacency list or adjacency matrix during processing

  • Safely decrementing in-degrees and appending nodes to the result list

  • Detecting cycles and raising a Python ValueError if the graph is not a DAG

  • Both versions preserve semantics while removing C++ graph edges and cleaning up GraphEdge objects using Py_DECREF.

c. Maximum Flow (Edmonds–Karp and Dinic)

The PR adds both classic algorithms with full Python interop:

Edmonds–Karp (Adjacency List)

BFS is implemented in C++ in breadth_first_search_max_flow_adjacency_list

Tracks:

  • parent maps

  • Residual capacities using a Python dictionary (flow_passed)

  • max_flow_edmonds_karp_adjacency_list repeatedly:

  • Calls the BFS routine

  • Augments along the discovered path

  • Updates residual capacities in Python dictionaries

  • Produces the maximum flow as a Python float

Dinic’s Algorithm (Adjacency List)

Uses the same BFS routine in level graph building mode

A dedicated recursive DFS:

  • depth_first_search_max_flow_dinic_adjacency_list

  • Computes blocking flows

  • Pushes residual updates back into Python dicts

  • max_flow_dinic_adjacency_list iterates BFS → DFS phases until exhaustion

This is the first time PyDataStructs has true residual network logic implemented entirely in C++, integrated tightly with Python-accessible state.

d. Extensive Test Additions

The test suite was expanded to verify:

  • Algorithm correctness on both adjacency list and adjacency matrix graphs

  • Equality between Python and C++ outputs

  • Correct handling of edge cases (cycles, unreachable nodes, zero-capacity edges)

  • These tests ensure backend uniformity and prevent future regressions.

5. What’s Next

Now that the C++ backend supports a robust collection of algorithms, the upcoming goals are:

a. Benchmarking: Compare performance against:

  • pure Python implementations

  • earlier C++ extensions

  • NetworkX equivalents

b. Backend diversification:

Building on earlier LLVM work, this backend can now be:

  • JIT-compiled using LLVM

  • Potentially parallelized

  • Extended to GPU offloading in future releases

c. More algorithm coverage: With the infrastructure in place, additions like:

  • push-relabel max flow

  • Johnson’s APSP

  • minimum-cut variants

become much easier.

6. Conclusion

This PR represents one of a very substantial backend upgrade in PyDataStructs. By implementing SCC algorithms, APSP, topological sort, and two major max-flow algorithms entirely in C++, this PR transforms PyDataStructs into a genuinely high-performance graph processing library. Users now benefit from:

  • C++-level speed

  • A fully preserved Python API

  • Identical semantics between adjacency list and matrix backends

  • A solid foundation for further optimizations (LLVM, parallelism, GPU)

This post is licensed under CC BY 4.0 by the author.