DAGBench

A curated task graph benchmark for scheduling research — 82 DAGs across 22 application domains with provenance.

DAGBench provides 82 diverse, well-characterized task graphs (DAGs) from 22 application domains, all in a uniform SAGA-compatible format with full provenance tracking. Every workflow has verified provenance: it is sourced from code repositories, algorithmically generated, extracted from published paper figures, or clearly labeled as AI-generated with attribution. Load any workflow in one line and schedule it with any of SAGA's 17 algorithms.

Browse the catalog: Download index.html and open it in your browser for an interactive overview of all 82 workflows, domain distribution, provenance policy, and full bibliography.

Installation

pip install dagbench

Note: DAGBench depends on SAGA (anrg-saga), which has a non-commercial license. DAGBench itself is Apache-2.0.

Quick Start

import dagbench

# List all available workflows
for wf in dagbench.list_workflows():
    print(f"{wf.id}: {wf.graph_stats.num_tasks} tasks, depth={wf.graph_stats.depth}")

# Load a workflow as a SAGA ProblemInstance
instance = dagbench.load_workflow("classic.gauss_elim_5")

# Schedule it with HEFT
from saga.schedulers.heft import HeftScheduler
schedule = HeftScheduler().schedule(instance.network, instance.task_graph)
print(f"Makespan: {schedule.makespan:.2f}")

# Search workflows by domain
iot_workflows = dagbench.search(domain="iot")

# Get detailed metadata
meta = dagbench.load_metadata("edge.autonomous_driving")
print(f"Source: {meta.provenance.source}")
print(f"Method: {meta.provenance.extraction_method}")

CLI

# List all workflows
dagbench list

# Filter by domain
dagbench list --domain iot

# Show detailed info about a workflow
dagbench info classic.cholesky_4

# Validate all workflows
dagbench validate

# Aggregate statistics
dagbench stats

# Generate HTML documentation
dagbench docs                    # all workflows
dagbench docs classic.fft_8     # single workflow

Provenance Policy

Every workflow in DAGBench has verified provenance. We use four extraction methods:

| Method | Count | Description |
|---|---|---|
| manual-figure | 13 | Faithfully extracted from published paper figures or open-source simulator code with per-task costs from the source |
| programmatic | 23 | Genuinely sourced from code repositories (RIoTBench from SAGA) or algorithmically generated (classic benchmarks) |
| generated | 17 | Algorithmically generated synthetic patterns (fork-join, stencil, random layered, etc.) |
| ai-generated | 29 | AI-generated task graphs inspired by domain literature. Each cites its inspiration source with notes explaining the relationship |

The 13 manual-figure workflows are faithfully extracted from specific figures in published papers or open-source simulator code, with task costs derived from the source's own measurements or proportional estimates. Each workflow's metadata.yaml documents exactly which figure/code was extracted and what assumptions were made.

Workflows labeled ai-generated are NOT extracted from paper figures. They are synthetic structures designed to represent typical patterns in their respective domains. Where papers are cited, the notes field explains exactly what the paper contains and how the workflow relates to it. No DOI has been fabricated — all cited DOIs have been verified.

Each workflow also includes an HTML documentation page (docs.html) with a Mermaid.js DAG visualization, per-task descriptions, and full provenance details.
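The four method labels in the table above lend themselves to a quick consistency check. The following is a minimal sketch, not part of the dagbench API: the `tally_methods` helper and its plain-string input are assumptions made for illustration, and the counts are copied from the table.

```python
from collections import Counter

# The four extraction_method labels documented in the provenance table.
ALLOWED_METHODS = {"manual-figure", "programmatic", "generated", "ai-generated"}


def tally_methods(methods):
    """Count workflows per extraction method, rejecting unknown labels.

    `methods` is any iterable of extraction_method strings (hypothetical input).
    """
    counts = Counter()
    for method in methods:
        if method not in ALLOWED_METHODS:
            raise ValueError(f"unknown extraction_method: {method!r}")
        counts[method] += 1
    return counts


# Counts copied from the provenance table above.
table_counts = {"manual-figure": 13, "programmatic": 23, "generated": 17, "ai-generated": 29}
labels = [method for method, n in table_counts.items() for _ in range(n)]
counts = tally_methods(labels)
print(sum(counts.values()))  # 82
```

In practice the labels would presumably come from `dagbench.load_metadata(wf.id).provenance.extraction_method` for each `wf` in `dagbench.list_workflows()`, as in the Quick Start.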

Relationship to SAGA

Ten of the 82 workflows are direct snapshots from SAGA's built-in generators, exported verbatim via scripts/import_from_saga.py:

| Workflow ID | SAGA Source |
|---|---|
| iot.riotbench_etl | saga.schedulers.data.riotbench.get_etl_task_graphs() |
| iot.riotbench_stats | saga.schedulers.data.riotbench.get_stats_task_graphs() |
| iot.riotbench_train | saga.schedulers.data.riotbench.get_train_task_graphs() |
| iot.riotbench_predict | saga.schedulers.data.riotbench.get_predict_task_graphs() |
| synthetic.diamond | saga.utils.random_graphs.get_diamond_dag() |
| synthetic.chain_4 | saga.utils.random_graphs.get_chain_dag(4) |
| synthetic.chain_8 | saga.utils.random_graphs.get_chain_dag(8) |
| synthetic.fork | saga.utils.random_graphs.get_fork_dag() |
| synthetic.branching_3x2 | saga.utils.random_graphs.get_branching_dag(3, 2) |
| synthetic.branching_4x3 | saga.utils.random_graphs.get_branching_dag(4, 3) |

These workflows carry extraction_method: programmatic in their metadata. If you are benchmarking against SAGA's own schedulers, be aware that these graphs already exist inside SAGA itself.
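If you want to keep these snapshots out of an evaluation set, one option is to filter by ID. This is a minimal sketch: the ID set is copied from the table above, and `exclude_saga_snapshots` is a hypothetical helper, not a dagbench function.

```python
# Workflow IDs listed in the table above as direct SAGA snapshots.
SAGA_SNAPSHOT_IDS = frozenset({
    "iot.riotbench_etl", "iot.riotbench_stats", "iot.riotbench_train",
    "iot.riotbench_predict", "synthetic.diamond", "synthetic.chain_4",
    "synthetic.chain_8", "synthetic.fork", "synthetic.branching_3x2",
    "synthetic.branching_4x3",
})


def exclude_saga_snapshots(workflow_ids):
    """Return the IDs that are not verbatim SAGA exports, preserving order."""
    return [wf_id for wf_id in workflow_ids if wf_id not in SAGA_SNAPSHOT_IDS]


sample = ["iot.riotbench_etl", "classic.fft_8", "synthetic.diamond", "edge.smart_home"]
print(exclude_saga_snapshots(sample))  # ['classic.fft_8', 'edge.smart_home']
```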

The four scientific.*_like workflows (e.g. scientific.montage_like, scientific.epigenomics_like) share family names with WfCommons/Pegasus workflows but are independently AI-generated, at a different scale and with a different structure; they are NOT imported from WfCommons or SAGA. See docs/sources.md for full details.

Workflow Catalog

By Domain (82 workflows)

| Domain | Count | Examples |
|---|---|---|
| Synthetic Patterns | 23 | Diamond, chain, fork-join, branching, stencil, reduction tree, wide parallel, random layered (small to XXL), one-task, chain-2 |
| Edge Computing | 10 | SplitStream pipeline, face analysis, Loki traffic, M-TEC (LightGBM, video analytics, matrix ops), ML surveillance |
| Classic Benchmarks | 13 | Gaussian elimination, FFT butterfly, LU decomposition, Cholesky, MapReduce |
| IoT / Sensor Networks | 5 | RIoTBench (ETL, stats, train, predict), predictive maintenance |
| Scientific Workflows | 4 | Montage, Epigenomics, Seismology, BLAST |
| Fog Computing | 3 | Federated fog, healthcare, smart building |
| MEC (SLEIPNIR) | 5 | Navigator, antivirus, face recognizer, Facebook, chess (from the open-source SLEIPNIR simulator) |
| Agriculture IoT | 5 | FlockFocus feeder pipeline, precision irrigation, livestock monitoring, greenhouse, crop disease |
| Industrial IoT | 4 | Robotic assembly, CNC monitoring, digital twin, energy monitoring |
| Smart City | 4 | Traffic management, waste, air quality, water distribution |
| V2X / Vehicular | 2 | Cooperative perception, intersection management |
| ML Pipelines | 1 | Federated learning |
| UAV / Drone | 1 | Search & rescue |
| MEC Offloading | 1 | Video analytics |
| NFV | 1 | 5G UE attach |

By Size

| Range | Count | Examples |
|---|---|---|
| 1-10 tasks | 24 | One-task (1), chain-2 (2), Loki traffic (3), diamond (4), chain (4-8), SLEIPNIR antivirus (5), SplitStream (6), SLEIPNIR navigator (9), random-small-narrow (10) |
| 11-15 tasks | 25 | CNC monitoring (11), crop disease (11), greenhouse control (12), random-small-wide (12) |
| 16-20 tasks | 11 | Robotic assembly (16), BLAST-like (19), search rescue (20), SLEIPNIR chess (20) |
| 21-50 tasks | 14 | random-medium (25-33), MapReduce-16m (27), Cholesky-5 (35), branching-4x3 (41) |
| 51-100 tasks | 5 | Gauss-10 (55), Cholesky-6 (56), random-large-dense (57), FFT-16 (64), random-large-balanced (87) |
| 101-200 tasks | 2 | FFT-32 (144), random-xlarge (157) |
| 1000+ tasks | 1 | Random-XXL (1118) |
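The ranges above can be expressed as a small lookup. This is an illustrative sketch only: `size_bucket` is a hypothetical helper, and it returns `None` for task counts the table does not cover (201 to 999).

```python
# Size ranges from the table above as (low, high, label); high=None means open-ended.
SIZE_BUCKETS = [
    (1, 10, "1-10 tasks"),
    (11, 15, "11-15 tasks"),
    (16, 20, "16-20 tasks"),
    (21, 50, "21-50 tasks"),
    (51, 100, "51-100 tasks"),
    (101, 200, "101-200 tasks"),
    (1000, None, "1000+ tasks"),
]


def size_bucket(num_tasks):
    """Return the table's range label for a task count, or None if no range covers it."""
    for low, high, label in SIZE_BUCKETS:
        if num_tasks >= low and (high is None or num_tasks <= high):
            return label
    return None  # e.g. 201-999 tasks: the table defines no range here


print(size_bucket(4))     # 1-10 tasks
print(size_bucket(144))   # 101-200 tasks
print(size_bucket(1118))  # 1000+ tasks
print(size_bucket(500))   # None
```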

Complete Workflow List

| ID | Tasks | Edges | Depth | Width | Name |
|---|---|---|---|---|---|
| agri.crop_disease | 11 | 12 | 7 | 3 | Crop Disease Detection |
| agri.flockfocus_feeder | 5 | 4 | 4 | 2 | FlockFocus Feeder Monitoring Pipeline |
| agri.greenhouse_control | 12 | 14 | 5 | 5 | Greenhouse Climate Control |
| agri.livestock_monitoring | 14 | 13 | 3 | 10 | Livestock Monitoring |
| agri.precision_irrigation | 14 | 13 | 5 | 9 | Precision Irrigation |
| classic.cholesky_4 | 20 | 26 | 10 | 6 | Cholesky Factorization (4x4 tiles) |
| classic.cholesky_5 | 35 | 50 | 13 | 10 | Cholesky Factorization (5x5 tiles) |
| classic.cholesky_6 | 56 | 85 | 16 | 15 | Cholesky Factorization (6x6 tiles) |
| classic.fft_8 | 28 | 32 | 5 | 8 | FFT Butterfly (8-point) |
| classic.fft_16 | 64 | 80 | 6 | 16 | FFT Butterfly (16-point) |
| classic.fft_32 | 144 | 192 | 7 | 32 | FFT Butterfly (32-point) |
| classic.gauss_elim_5 | 15 | 30 | 9 | 4 | Gaussian Elimination (n=5) |
| classic.gauss_elim_7 | 28 | 63 | 13 | 6 | Gaussian Elimination (n=7) |
| classic.gauss_elim_10 | 55 | 135 | 19 | 9 | Gaussian Elimination (n=10) |
| classic.lu_decomp_4 | 30 | 49 | 10 | 9 | LU Decomposition (4x4 tiles) |
| classic.mapreduce_4m_2r | 9 | 12 | 5 | 4 | MapReduce (4 mappers, 2 reducers) |
| classic.mapreduce_8m_4r | 15 | 24 | 5 | 8 | MapReduce (8 mappers, 4 reducers) |
| classic.mapreduce_16m_8r | 27 | 48 | 5 | 16 | MapReduce (16 mappers, 8 reducers) |
| edge.autonomous_driving | 11 | 10 | 7 | 3 | Autonomous Driving Pipeline |
| edge.face_analysis_pipeline | 6 | 7 | 4 | 3 | Pipeline-Parallel Face Analysis System |
| edge.loki_traffic_pipeline | 3 | 2 | 2 | 2 | Loki Traffic Analysis Pipeline |
| edge.ml_surveillance_pipeline | 7 | 6 | 3 | 3 | Multi-Model ML Surveillance Pipeline |
| edge.mtec_lightgbm | 6 | 8 | 4 | 3 | M-TEC LightGBM Training Pipeline |
| edge.mtec_matrix_ops | 6 | 7 | 4 | 2 | M-TEC Matrix Operations Pipeline |
| edge.mtec_video_analytics | 7 | 8 | 5 | 2 | M-TEC Video Analytics Pipeline |
| edge.smart_home | 12 | 13 | 5 | 4 | Smart Home IoT Pipeline |
| edge.splitstream_pipeline | 6 | 6 | 4 | 2 | SplitStream Video Query Pipeline |
| edge.video_transcoding | 14 | 16 | 5 | 4 | Video Transcoding Pipeline |
| fog.federated_fog | 23 | 26 | 7 | 5 | Federated Fog Learning |
| fog.healthcare_fog | 17 | 16 | 5 | 9 | Fog Healthcare Monitoring |
| fog.smart_building | 21 | 22 | 5 | 12 | Fog Smart Building Management |
| iiot.cnc_monitoring | 11 | 12 | 5 | 4 | IIoT CNC Machine Monitoring |
| iiot.digital_twin | 13 | 14 | 7 | 5 | IIoT Digital Twin Synchronization |
| iiot.energy_monitoring | 13 | 14 | 4 | 8 | IIoT Factory Energy Monitoring |
| iiot.robotic_assembly | 16 | 17 | 10 | 3 | IIoT Robotic Assembly Coordination |
| iot.predictive_maintenance | 13 | 15 | 6 | 5 | Industrial Predictive Maintenance |
| iot.riotbench_etl | 11 | 11 | 10 | 2 | RIoTBench ETL Pipeline |
| iot.riotbench_predict | 11 | 14 | 7 | 3 | RIoTBench ML Prediction Pipeline |
| iot.riotbench_stats | 9 | 10 | 7 | 3 | RIoTBench Statistical Analytics |
| iot.riotbench_train | 10 | 11 | 8 | 2 | RIoTBench ML Training Pipeline |
| mec.sleipnir_antivirus | 5 | 5 | 4 | 2 | SLEIPNIR Antivirus App |
| mec.sleipnir_chess | 20 | 19 | 20 | 1 | SLEIPNIR Chess App (5 Moves) |
| mec.sleipnir_facebook | 7 | 7 | 7 | 1 | SLEIPNIR Facebook App |
| mec.sleipnir_facerecognizer | 5 | 5 | 5 | 1 | SLEIPNIR Face Recognizer App |
| mec.sleipnir_navigator | 9 | 13 | 7 | 2 | SLEIPNIR Navigator App |
| mec.video_analytics | 13 | 14 | 8 | 3 | MEC Video Analytics Pipeline |
| ml.federated_learning | 19 | 22 | 7 | 5 | Federated Learning Round (5 clients) |
| nfv.ue_attach | 12 | 15 | 8 | 4 | 5G UE Attach Procedure |
| scientific.blast_like | 19 | 22 | 7 | 5 | BLAST-like Sequence Search Workflow |
| scientific.epigenomics_like | 19 | 21 | 7 | 4 | Epigenomics-like Genome Workflow |
| scientific.montage_like | 19 | 29 | 7 | 6 | Montage-like Astronomy Workflow |
| scientific.seismology_like | 24 | 31 | 6 | 9 | Seismology-like Hazard Workflow |
| smart_city.air_quality | 15 | 15 | 5 | 5 | Air Quality Monitoring |
| smart_city.traffic_mgmt | 17 | 16 | 6 | 6 | Smart City Traffic Management |
| smart_city.waste_mgmt | 14 | 13 | 7 | 8 | IoT Waste Management |
| smart_city.water_distribution | 16 | 15 | 4 | 6 | Water Distribution Network |
| synthetic.branching_3x2 | 8 | 10 | 4 | 4 | Branching DAG (3 levels, factor 2) |
| synthetic.branching_4x3 | 41 | 66 | 5 | 27 | Branching DAG (4 levels, factor 3) |
| synthetic.chain_2 | 2 | 1 | 2 | 1 | Chain DAG (2 nodes) |
| synthetic.chain_4 | 4 | 3 | 4 | 1 | Chain DAG (4 nodes) |
| synthetic.chain_8 | 8 | 7 | 8 | 1 | Chain DAG (8 nodes) |
| synthetic.diamond | 4 | 4 | 3 | 2 | Diamond DAG |
| synthetic.fork | 6 | 6 | 4 | 2 | Fork-Join DAG |
| synthetic.multi_fork_join | 14 | 19 | 8 | 4 | Multi-Level Fork-Join DAG |
| synthetic.one_task | 1 | 0 | 1 | 1 | Single Task DAG |
| synthetic.pipeline_stages | 10 | 17 | 6 | 3 | Multi-Stage Pipeline DAG |
| synthetic.random_large_balanced | 87 | 546 | 12 | 12 | Random Large Balanced DAG |
| synthetic.random_large_dense | 57 | 514 | 10 | 9 | Random Large Dense DAG |
| synthetic.random_medium_balanced | 30 | 153 | 8 | 6 | Random Medium Balanced DAG |
| synthetic.random_medium_comm | 32 | 165 | 8 | 6 | Random Medium Comm-Heavy DAG |
| synthetic.random_medium_compute | 25 | 118 | 8 | 6 | Random Medium Compute-Heavy DAG |
| synthetic.random_medium_deep | 33 | 147 | 14 | 3 | Random Medium Deep DAG |
| synthetic.random_small_narrow | 10 | 21 | 6 | 3 | Random Small Narrow DAG |
| synthetic.random_small_wide | 12 | 39 | 4 | 6 | Random Small Wide DAG |
| synthetic.random_xlarge | 157 | 1070 | 17 | 14 | Random Extra-Large DAG |
| synthetic.random_xxlarge | 1118 | 8450 | 22 | 70 | Random XXL DAG |
| synthetic.reduction_tree | 15 | 14 | 4 | 8 | Binary Reduction Tree (8 leaves) |
| synthetic.stencil_3x4 | 12 | 17 | 6 | 3 | Stencil Computation (3x4 grid) |
| synthetic.wide_parallel_20 | 22 | 40 | 3 | 20 | Wide Parallel DAG (20 workers) |
| uav.search_rescue | 20 | 21 | 10 | 3 | UAV Search and Rescue Coordination |
| v2x.cooperative_perception | 15 | 14 | 8 | 6 | V2X Cooperative Perception |
| v2x.intersection_mgmt | 15 | 18 | 8 | 4 | V2X Intersection Management |

Data Format

Each workflow consists of:

  • graph.json: SAGA ProblemInstance (task graph + network topology)
  • metadata.yaml: Full provenance, license, statistics, domain tags
  • docs.html: Self-contained HTML documentation with DAG visualization

# graph.json structure
{
  "name": "iot.riotbench_etl",
  "task_graph": {
    "tasks": [{"name": "Source", "cost": 32.5}, ...],
    "dependencies": [{"source": "Source", "target": "Parse", "size": 1024.0}, ...]
  },
  "network": {
    "nodes": [{"name": "Edge0", "speed": 1.0}, ...],
    "edges": [{"source": "Edge0", "target": "Fog0", "speed": 7500.0}, ...]
  }
}
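To show how the Tasks, Edges, Depth, and Width columns of the catalog relate to this structure, here is a minimal sketch that derives them from a graph.json-style dict using only the standard library. It assumes that depth is the number of tasks on the longest path and that width is the largest number of tasks sharing a level; dagbench's own definitions may differ, and `graph_stats` is a hypothetical helper. The diamond below is hand-written for illustration, not the packaged file.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def graph_stats(instance):
    """Compute simple structural statistics for a graph.json-style dict."""
    tg = instance["task_graph"]
    names = [t["name"] for t in tg["tasks"]]
    preds = {name: set() for name in names}
    for dep in tg["dependencies"]:
        preds[dep["target"]].add(dep["source"])

    # Level of a task = number of tasks on the longest path ending at it.
    level = {}
    for name in TopologicalSorter(preds).static_order():
        level[name] = 1 + max((level[p] for p in preds[name]), default=0)

    per_level = defaultdict(int)
    for lvl in level.values():
        per_level[lvl] += 1

    return {
        "num_tasks": len(names),
        "num_edges": len(tg["dependencies"]),
        "depth": max(level.values(), default=0),
        "width": max(per_level.values(), default=0),
    }


# Hand-written diamond: A fans out to B and C, which join at D.
diamond = {
    "name": "example.diamond",
    "task_graph": {
        "tasks": [{"name": n, "cost": 1.0} for n in "ABCD"],
        "dependencies": [
            {"source": "A", "target": "B", "size": 1.0},
            {"source": "A", "target": "C", "size": 1.0},
            {"source": "B", "target": "D", "size": 1.0},
            {"source": "C", "target": "D", "size": 1.0},
        ],
    },
}
print(graph_stats(diamond))
# {'num_tasks': 4, 'num_edges': 4, 'depth': 3, 'width': 2}
```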

Generators

Generate DAGs programmatically:

from dagbench.generators import (
    gaussian_elimination_dag, fft_dag, cholesky_dag,
    lu_decomposition_dag, mapreduce_dag, random_layered_dag,
)

# Classic benchmarks
tg = gaussian_elimination_dag(n=10)
tg = fft_dag(num_points=64)
tg = cholesky_dag(n=8)

# Random layered DAG (configurable structure + costs)
tg = random_layered_dag(depth=6, width_min=3, width_max=8, ccr=0.5, seed=42)

Backward compatibility: the old import path, from dagbench.converters import gaussian_elimination_dag, still works.
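To illustrate what the random_layered_dag parameters plausibly control, here is a standalone sketch of a layered generator. It is not dagbench's implementation: `sketch_random_layered_dag`, the cost ranges, the task naming, and the choice to define CCR as mean edge size divided by mean task cost are all assumptions. The output uses the task_graph shape shown under Data Format.

```python
import random


def sketch_random_layered_dag(depth, width_min, width_max, ccr, seed=None):
    """Build a layered DAG as a graph.json-style task_graph dict (illustrative only)."""
    rng = random.Random(seed)
    layers, tasks = [], []
    for d in range(depth):
        layer = [f"L{d}_T{i}" for i in range(rng.randint(width_min, width_max))]
        layers.append(layer)
        tasks.extend({"name": n, "cost": rng.uniform(1.0, 10.0)} for n in layer)

    # Give each task at least one predecessor in the previous layer. Edges only
    # point from layer d-1 to layer d, so the graph is acyclic by construction.
    deps = []
    for prev, cur in zip(layers, layers[1:]):
        for target in cur:
            k = rng.randint(1, len(prev))
            for source in rng.sample(prev, k):
                deps.append({"source": source, "target": target,
                             "size": rng.uniform(1.0, 10.0)})

    # Rescale edge sizes so that mean(size) / mean(cost) equals the requested CCR.
    if deps:
        mean_cost = sum(t["cost"] for t in tasks) / len(tasks)
        mean_size = sum(e["size"] for e in deps) / len(deps)
        scale = ccr * mean_cost / mean_size
        for e in deps:
            e["size"] *= scale
    return {"tasks": tasks, "dependencies": deps}


tg = sketch_random_layered_dag(depth=6, width_min=3, width_max=8, ccr=0.5, seed=42)
print(len(tg["tasks"]), len(tg["dependencies"]))
```

Passing the same seed reproduces the same graph, which is the property that makes seeded generators useful for benchmarks.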

Converters

Import DAGs from external formats:

from dagbench.converters import parse_stg, parse_dot, parse_edge_list_csv, DAGBuilder

# STG format (Standard Task Graph Set)
tg = parse_stg(stg_text)

# DOT format (Graphviz)
tg = parse_dot(dot_text)

# CSV edge list
tg = parse_edge_list_csv(csv_text)

# Fluent builder for paper figures
tg = (DAGBuilder("my_dag")
    .task("A", 10.0).task("B", 20.0).task("C", 15.0)
    .edge("A", "B", 5.0).edge("A", "C", 3.0)
    .build())
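The chained calls above follow the fluent-builder pattern, where each method returns the builder itself. The sketch below shows how such a builder could work; `MiniDAGBuilder` is a hypothetical stand-in, not dagbench's DAGBuilder, and its validation rules and dict output are assumptions (the real `build()` presumably returns a SAGA task graph).

```python
from graphlib import CycleError, TopologicalSorter


class MiniDAGBuilder:
    """Hypothetical stand-in showing the fluent pattern; not dagbench's DAGBuilder."""

    def __init__(self, name):
        self.name = name
        self._tasks = {}   # task name -> cost
        self._edges = []   # (source, target, size)

    def task(self, name, cost):
        if name in self._tasks:
            raise ValueError(f"duplicate task: {name}")
        self._tasks[name] = float(cost)
        return self  # returning self is what enables chaining

    def edge(self, source, target, size):
        for endpoint in (source, target):
            if endpoint not in self._tasks:
                raise ValueError(f"unknown task: {endpoint}")
        self._edges.append((source, target, float(size)))
        return self

    def build(self):
        # Reject cycles before emitting the graph.
        preds = {name: set() for name in self._tasks}
        for source, target, _ in self._edges:
            preds[target].add(source)
        try:
            tuple(TopologicalSorter(preds).static_order())
        except CycleError as exc:
            raise ValueError("graph has a cycle") from exc
        return {
            "name": self.name,
            "task_graph": {
                "tasks": [{"name": n, "cost": c} for n, c in self._tasks.items()],
                "dependencies": [
                    {"source": s, "target": t, "size": z} for s, t, z in self._edges
                ],
            },
        }


dag = (MiniDAGBuilder("my_dag")
       .task("A", 10.0).task("B", 20.0).task("C", 15.0)
       .edge("A", "B", 5.0).edge("A", "C", 3.0)
       .build())
print(len(dag["task_graph"]["tasks"]), len(dag["task_graph"]["dependencies"]))  # 3 2
```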

Network Presets

from dagbench.networks import fog_network, homogeneous_network, star_network, mec_network, manet_network

# 3-tier IoT fog topology
net = fog_network(num_edge=10, num_fog=3, num_cloud=2)

# Homogeneous fully-connected
net = homogeneous_network(num_nodes=4, speed=1.0, bandwidth=100.0)

# Star topology
net = star_network(num_edge=6, hub_speed=10.0)

# MEC: UE devices + MEC servers + cloud
net = mec_network(num_ue=4, num_mec=2)

# MANET/tactical mesh: command node + mobile peers
net = manet_network(num_nodes=6)
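For intuition about what a preset produces, here is a sketch of a homogeneous fully connected network in the "network" shape shown under Data Format. It is not dagbench's homogeneous_network: `sketch_homogeneous_network`, the node names, and the choice of one undirected link per node pair with no self-loops are assumptions.

```python
from itertools import combinations


def sketch_homogeneous_network(num_nodes, speed=1.0, bandwidth=100.0):
    """Fully connected network in the graph.json "network" shape (illustrative only)."""
    nodes = [{"name": f"N{i}", "speed": speed} for i in range(num_nodes)]
    # One undirected link per unordered node pair: n * (n - 1) / 2 links in total.
    edges = [
        {"source": a["name"], "target": b["name"], "speed": bandwidth}
        for a, b in combinations(nodes, 2)
    ]
    return {"nodes": nodes, "edges": edges}


net = sketch_homogeneous_network(num_nodes=4, speed=1.0, bandwidth=100.0)
print(len(net["nodes"]), len(net["edges"]))  # 4 6
```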

Contributing

See docs/contributing.md for how to add new workflows.

License

  • DAGBench package + workflow data: Apache-2.0
  • SAGA dependency: SAGA uses a custom non-commercial academic license that prohibits commercial use and requires citing ANRG/USC in publications. DAGBench imports but does not redistribute SAGA. See SAGA's LICENSE for details.
  • Per-workflow source licenses tracked in each metadata.yaml

Citation

If you use DAGBench in your research, please cite:

@software{dagbench2026,
  title={DAGBench: A Task Graph Benchmark Repository for Scheduling Research},
  year={2026},
  url={https://github.com/ANRGUSC/dagbench}
}

See also the SAGA/PISA paper that motivated this work:

@article{coleman2024pisa,
  title={PISA: An Adversarial Approach To Comparing Task Graph Scheduling Algorithms},
  author={Coleman, Jared and Krishnamachari, Bhaskar},
  journal={arXiv preprint arXiv:2403.07120},
  year={2024}
}
