Season of Commits Summary

Posted Nov 30, 2025 Updated Nov 30, 2025

By Prerak Singh

5 min read

Over the past twelve weeks, my Season of Commits journey with FOSS United has been dedicated to one mission: transforming PyDataStructs into a high-performance, multi-backend data structures and algorithms library.

What began with a simple C++ adjacency list backend eventually grew into:

A complete C++ graph algorithm engine
LLVM accelerated routines
A modern Meson build system
A unified graph API supporting Python, C++, and LLVM
Benchmarks that outperform NetworkX on real workloads

This post summarizes the work across all weeks and reflects on the architectural evolution of PyDataStructs.

1. Building the Foundation C++ Graph Backends and Benchmarking (Weeks 1 to 2)

The first two weeks focused on building high-speed C++ replacements for Python based graph structures.

Adjacency List and Matrix Graphs in C++

Implemented C++ versions of adjacency list and adjacency matrix
Exposed them to Python using the Python C API
Debugged segmentation faults and reference counting issues
Ensured Python like interfaces with native level speed

Benchmarking Framework

I built a benchmarking suite that measured:

Graph construction time
Memory usage
Performance on random graph models like Erdos Renyi, Barabasi Albert, Watts Strogatz

These early benchmarks highlighted that NetworkX still dominated, motivating deeper optimization.

BFS and Better Node Storage

In Week 2 I implemented BFS in C++ for both graph types and switched node data to std::variant to avoid PyObject overhead.
C++ backends now had consistent API compatibility with Python while delivering better performance.

2. Smarter Edges, MSTs, and Real Graph Performance (Weeks 3 to 4)

Rich Edge Representation

Edge storage was reworked to use std::variant<std::monostate, int, double, std::string> allowing fully native handling without Python overhead.

Prim’s Algorithm in C++

Prim’s Minimum Spanning Tree algorithm was implemented directly in the adjacency list backend.
This version used efficient priority queue based logic and gained performance benefits from the new edge variant system.

Benchmarking Prim vs NetworkX

On graphs up to 10,000 nodes, the C++ Prim’s implementation:

Outperformed NetworkX
Achieved up to 2.3x speedup
Demonstrated strong performance for both sparse and dense configurations

Arbitrary Python Data in Nodes

Nodes were extended to support arbitrary PyObject storage, greatly increasing flexibility for user applications without breaking compatibility.

3. Tackling Performance Bottlenecks Dijkstra and Node ID Redesign (Week 5)

Dijkstra’s Algorithm in C++

I added a C++ implementation of Dijkstra’s shortest path algorithm using the adjacency list backend.

Benchmarking Results

While fast on small graphs, scaling issues appeared for larger graphs. After analysis, we identified the bottleneck:
using std::string as the internal node identifier.

String hashing and comparisons caused significant slowdown, unlike integer based indexing used internally by NetworkX.

This led to the upcoming redesign of the graph representation to support internal integer node IDs while keeping external Python facing APIs unchanged.

4. Enter LLVM Accelerated Algorithms and JIT Compilation (Weeks 6 to 8)

LLVM Backend for Sorting

Bubble Sort was implemented with llvmlite to validate the complete LLVM compilation pipeline.
This included generating LLVM IR, running optimization passes, and producing JIT compiled machine code.

LLVM Optimized Bubble Sort

Using vectorization and O3 optimizations, LLVM delivered:

Hundreds of times faster performance than Python
Major speedups over C++ as well
Increased gains for larger input sizes

LLVM for Graphs

Week 8 introduced LLVM backed adjacency list graphs.
This involved:

Designing LLVM level function signatures
Preserving memory layout compatibility across Python, C++, and LLVM
Prototype support for graph operations via JIT compiled IR

This was a major architectural step towards LLVM optimized graph algorithms.

5. Modernizing the Build System Migration to Meson (Week 9)

To support the growing native codebase and LLVM features, PyDataStructs shifted from setuptools to Meson.

Why Meson

Faster incremental builds
First class support for C and C++
Cleaner integration with LLVM and future GPU backends
More consistent cross platform behavior

The minimum Python version was raised to 3.9 to avoid compatibility issues with collections.abc imports.

Stability Fixes

Week 9 also fixed:

TypeErrors in graph node interop
C++ module inconsistencies discovered during Meson builds
Mismatches in Python to C++ type conversions

This stabilized the foundation for upcoming backend expansions.

6. Expanding the C++ Backend DFS and Bellman Ford (Week 10)

Depth First Search in C++

Implemented a stack based iterative DFS
Added callback support for custom user logic
Ensured compatibility for both adjacency list and matrix graphs
Matched the Python traversal order for full consistency

Bellman Ford with SPFA Optimization

Implemented Bellman Ford with SPFA acceleration, supporting:

Negative weights
Negative cycle detection
Optional target node optimization
Python compatible distance and predecessor outputs

Full test coverage ensured correctness across both Python and C++ backends.

7. Completing the C++ Graph Backend (Weeks 11 and 12)

The final milestone arrived in PR 708 which delivered a complete C++ implementation of the major classical graph algorithms.

Full C++ Algorithm Coverage

Implemented for both adjacency list and adjacency matrix:

Kosaraju’s SCC
Tarjan’s SCC
Floyd Warshall all pairs shortest paths
Kahn’s topological sorting
Edmonds Karp maximum flow
Dinic’s maximum flow

Why This Matters

Massive performance improvements due to native computation
Unified Python API that works across Python, C++, and LLVM
Clean separation between backend implementation and frontend usability
Foundation for LLVM accelerated and parallel algorithms

Codebase Enhancements

Implemented Floyd Warshall variants for list and matrix backends
Fully implemented Kahn’s topological sort
Complete residual network logic for maximum flow
Extensive test suite additions verifying correctness and cross backend parity

This milestone elevates PyDataStructs into a high performance graph analytics library with forward looking support for JIT compilation and potential GPU integration.

Acknowledgment

I would like to thank FOSS United for funding this project and for enabling contributors like me to work on meaningful open source engineering. Their support made this entire journey and all of these improvements possible. A huge thanks also to the founder of the PyDataStructs library and my mentor throughought the project, Mr. Gagandeep Singh. This wouldn’t have been possible without his persistent guidance and ideas which were hugely influential in shaping everything up.

Conclusion

This Season of Commits journey transformed PyDataStructs from a Python only graph data structures library into a multi backend computational engine featuring:

High performance C++ graph backends
LLVM accelerated routines
A modern Meson build system
Comprehensive algorithm coverage
Benchmarks that often outperform NetworkX

The project is now ready for its next phase including JIT compiled graph algorithms, SIMD aware execution paths, GPU acceleration and hybrid Python to C++ to LLVM optimization strategies.

Season of Commits

Week 12

This post is licensed under CC BY 4.0 by the author.