Streaming Graph Partitioning (HeiStream)

Original Repository: https://github.com/KaHIP/HeiStream

Overview

HeiStream is a streaming graph partitioning algorithm designed for graphs that are too large to fit entirely in memory. Nodes and their adjacencies are presented sequentially, and each node is assigned to a block upon arrival (or after a bounded buffer delay).

HeiStream requires only O(n + B) memory, where B is the buffer size, compared to O(n + m) for full in-memory methods. It supports three operational modes:

Mode	Description
Fennel	Direct one-pass assignment (`max_buffer_size` = 0 or 1)
BuffCut	Buffered assignment with local optimization (`max_buffer_size` > 1)
Restreaming	Multiple passes over the stream for improved quality (`num_streams_passes` > 1)

CHSZLabLib provides two interfaces: a batch static method and a stateful streaming class.

`Decomposition.stream_partition()`

Partition a graph using HeiStream when the full graph is available in memory.

Signature

Decomposition.stream_partition(
    g: Graph,
    k: int = 2,
    imbalance: float = 3.0,
    seed: int = 0,
    max_buffer_size: int = 0,
    batch_size: int = 0,
    num_streams_passes: int = 1,
    run_parallel: bool = False,
    suppress_output: bool = True,
) -> StreamPartitionResult

Parameters

Parameter	Type	Default	Description
`g`	`Graph`	required	Input graph (undirected, unweighted). Edge weights are ignored.
`k`	`int`	`2`	Number of partitions (must be >= 2).
`imbalance`	`float`	`3.0`	Allowed imbalance in percent (3.0 = 3%).
`seed`	`int`	`0`	Random seed for reproducibility.
`max_buffer_size`	`int`	`0`	Buffer size for BuffCut mode. 0 or 1 = direct Fennel (no buffer). Larger values enable priority-buffer mode.
`batch_size`	`int`	`0`	MLP batch size for model-based partitioning within the buffer. 0 = HeiStream's default.
`num_streams_passes`	`int`	`1`	Number of streaming passes. More passes improve quality at the cost of runtime.
`run_parallel`	`bool`	`False`	Use the parallel 3-thread pipeline (I/O, PQ, partition).
`suppress_output`	`bool`	`True`	Suppress C++ stdout/stderr output.

Returns

StreamPartitionResult

Field	Type	Description
`assignment`	`np.ndarray` (int32)	Partition ID for each node (0-indexed).

Exceptions

Exception	Condition
`ValueError`	`k < 2` or `imbalance < 0`.

Example

from chszlablib import Graph, Decomposition

g = Graph.from_metis("large_graph.graph")
result = Decomposition.stream_partition(g, k=8, imbalance=3.0, num_streams_passes=2)
print(f"Partition assignment: {result.assignment}")

`HeiStreamPartitioner` (Stateful Streaming Class)

For true streaming scenarios where nodes arrive one at a time (e.g., from a network stream or database cursor).

Constructor

HeiStreamPartitioner(
    k: int = 2,
    imbalance: float = 3.0,
    seed: int = 0,
    max_buffer_size: int = 0,
    batch_size: int = 0,
    num_streams_passes: int = 1,
    run_parallel: bool = False,
    suppress_output: bool = True,
)

Methods

Method	Description
`new_node(node, neighbors)`	Add a node with its neighbor list to the stream. Nodes can be added in any order with non-contiguous IDs.
`partition()`	Run HeiStream on all accumulated nodes and return a `StreamPartitionResult`.
`reset()`	Clear all added nodes while retaining configuration.

Example

from chszlablib import HeiStreamPartitioner

hs = HeiStreamPartitioner(k=4, imbalance=3.0, max_buffer_size=1000)
hs.new_node(0, [1, 2])
hs.new_node(1, [0, 3])
hs.new_node(2, [0])
hs.new_node(3, [1])
result = hs.partition()
print(result.assignment)  # array of partition IDs

Performance Disclaimer

This Python interface wraps the HeiStream C++ library via pybind11. While convenient for prototyping and integration into Python workflows, there is inherent overhead from the Python/C++ boundary and data conversion. For maximum performance on large-scale streaming instances, use the original C++ implementation directly from the HeiStream repository.

References

@article{DBLP:journals/jea/FarajS22,
  author  = {Marcelo Fonseca Faraj and Christian Schulz},
  title   = {Buffered Streaming Graph Partitioning},
  journal = {{ACM} J. Exp. Algorithmics},
  volume  = {27},
  pages   = {1.10:1--1.10:26},
  year    = {2022},
  doi     = {10.1145/3546911}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Streaming Graph Partitioning (HeiStream)

Overview

`Decomposition.stream_partition()`

Signature

Parameters

Returns

Exceptions

Example

`HeiStreamPartitioner` (Stateful Streaming Class)

Constructor

Methods

Example

Performance Disclaimer

References

FilesExpand file tree

streaming-partitioning.md

Latest commit

History

streaming-partitioning.md

File metadata and controls

Streaming Graph Partitioning (HeiStream)

Overview

Decomposition.stream_partition()

Signature

Parameters

Returns

Exceptions

Example

HeiStreamPartitioner (Stateful Streaming Class)

Constructor

Methods

Example

Performance Disclaimer

References

`Decomposition.stream_partition()`

`HeiStreamPartitioner` (Stateful Streaming Class)