Parallel programming has enabled major advances in scientific computing and high‑performance systems. However, writing efficient programs that utilize parallel and heterogeneous hardware remains challenging. Developers often need to manage low‑level concerns such as load balancing, synchronization, scheduling, and performance tuning, which significantly reduces productivity and portability.
Taskflow addresses this challenge by providing a modern C++ task‑parallel programming system that enables developers to express parallelism declaratively while achieving high performance. Taskflow has a rapidly growing user base (over 1.5M downloads) and is increasingly adopted in research and industry.
To further improve Taskflow’s usability and impact, we propose a GSoC project to expand Taskflow with a comprehensive library of task‑parallel algorithm primitives that align with the C++ standard and modern parallel programming practice.
The goal of this project is to design and implement a rich set of task‑parallel algorithm primitives on top of Taskflow, striking a balance among programming productivity, performance, and portability. These primitives will provide users with drop‑in, high‑level building blocks for parallel programming while retaining Taskflow’s flexible execution model.
Parallel algorithm primitives, such as reduce, scan, transform, sort, and partition, form the foundation of many parallel applications. When provided as reusable, well‑optimized building blocks, they allow programmers to parallelize fundamental computations without rewriting complex scheduling logic, managing synchronization, or hand‑tuning performance.
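To make concrete what such a primitive hides from the user, here is a minimal sketch of a chunked parallel reduce written directly against `std::thread`. It is not Taskflow code and not the proposed design: the name `chunked_reduce`, its signature, and the one-chunk-per-thread strategy are assumptions made purely for illustration, and the operator is assumed to be associative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not a Taskflow or standard-library API).
// Reduces [first, last) with `op`, giving each worker thread one contiguous
// chunk. Assumes `op` is associative and `T` is default-constructible.
template <typename It, typename T, typename BinaryOp>
T chunked_reduce(It first, It last, T init, BinaryOp op, std::size_t num_workers) {
  assert(num_workers > 0);
  const std::size_t n = static_cast<std::size_t>(std::distance(first, last));
  if (n == 0) return init;

  // Never use more workers than elements, so every chunk is non-empty.
  const std::size_t workers = std::min(num_workers, n);
  std::vector<T> partial(workers);
  std::vector<std::thread> threads;

  for (std::size_t w = 0; w < workers; ++w) {
    threads.emplace_back([&, w] {
      // Chunk w covers indices [w*n/workers, (w+1)*n/workers).
      It b = std::next(first, static_cast<std::ptrdiff_t>(w * n / workers));
      It e = std::next(first, static_cast<std::ptrdiff_t>((w + 1) * n / workers));
      // Seed with the chunk's first element so no identity value is needed.
      partial[w] = std::accumulate(std::next(b), e, T(*b), op);
    });
  }
  for (auto& t : threads) t.join();  // the only synchronization point

  // Combine the per-chunk results sequentially; `init` is applied exactly once.
  T result = init;
  for (const T& p : partial) result = op(result, p);
  return result;
}
```

Even this small version has to deal with chunk boundaries, thread lifetime, and result combination, and it still uses static partitioning with no load balancing. A library primitive takes over those concerns and leaves the caller with a single call.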
Although C++17 introduced parallel algorithms through the execution policies in the <execution> header (extended in C++20), practical implementations remain limited in flexibility and extensibility. Taskflow currently provides only a small subset of parallel algorithms, which does not yet meet the needs of many users building real‑world parallel applications.
This project aims to close this gap by building a full‑featured, Taskflow‑native parallel algorithm library that:
- Covers a broad subset of C++17 standard parallel algorithms
- Integrates naturally with Taskflow’s task graph model
- Achieves performance competitive with established libraries such as Intel TBB
- Serves as a research and teaching platform for modern task‑parallel programming
The project will proceed in several stages:
- Algorithm Coverage and Design: Identify and prioritize a core set of C++17 parallel algorithms (e.g., for_each, transform, reduce, scan, sort, merge, partition). Design Taskflow‑based implementations that expose clean, STL‑like interfaces while leveraging Taskflow’s executor and task graph abstractions.
- Implementation on Taskflow: Implement these algorithms using Taskflow constructs such as dynamic task graphs, pipelines, and work‑stealing scheduling. Emphasis will be placed on:
  - Scalability across CPU cores
  - Low scheduling and synchronization overhead
  - Composability with existing Taskflow workflows
- Performance Evaluation and Benchmarking: Develop a benchmarking suite to evaluate performance across multiple workloads and hardware platforms. Compare results against Intel TBB and standard library implementations in terms of throughput, scalability, and ease of use.
- Toolchain and Ecosystem Integration: Explore integration of the Taskflow parallel algorithm backend with mainstream toolchains (LLVM/GCC) as an alternative execution policy implementation, or as a standalone third‑party backend for standard‑style parallel algorithms.
- Documentation and Examples: Provide high‑quality documentation, tutorials, and example applications demonstrating how these primitives can be composed in real Taskflow workflows.
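As an illustration of the trade-offs the design and implementation stages must weigh, the sketch below shows one common way to parallelize an inclusive scan: a two-pass blocked scheme with a single short sequential step between the passes. It uses plain `std::thread` rather than Taskflow, and the function name `blocked_inclusive_scan`, its in-place `std::vector` interface, and the static block partitioning are all assumptions for illustration, not the interface or strategy this project commits to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not Taskflow's API): in-place inclusive prefix sum
// over `data` using a two-pass blocked scheme with up to `num_workers` threads.
template <typename T>
void blocked_inclusive_scan(std::vector<T>& data, std::size_t num_workers) {
  assert(num_workers > 0);
  const std::size_t n = data.size();
  if (n == 0) return;

  const std::size_t workers = std::min(num_workers, n);
  // Block w covers indices [block_begin(w), block_begin(w + 1)).
  auto block_begin = [&](std::size_t w) { return w * n / workers; };

  // Pass 1: every worker scans its own block independently.
  std::vector<std::thread> threads;
  for (std::size_t w = 0; w < workers; ++w) {
    threads.emplace_back([&, w] {
      for (std::size_t i = block_begin(w) + 1; i < block_begin(w + 1); ++i)
        data[i] += data[i - 1];
    });
  }
  for (auto& t : threads) t.join();
  threads.clear();

  // Sequential step: offsets[w] is the sum of all elements before block w.
  // After pass 1, the last element of each block holds that block's total.
  std::vector<T> offsets(workers, T{});
  for (std::size_t w = 1; w < workers; ++w)
    offsets[w] = offsets[w - 1] + data[block_begin(w) - 1];

  // Pass 2: every block except the first adds its offset to each element.
  for (std::size_t w = 1; w < workers; ++w) {
    threads.emplace_back([&, w] {
      for (std::size_t i = block_begin(w); i < block_begin(w + 1); ++i)
        data[i] += offsets[w];
    });
  }
  for (auto& t : threads) t.join();
}
```

For example, scanning `{1, 2, 3, 4, 5, 6, 7}` with three workers yields `{1, 3, 6, 10, 15, 21, 28}`. The scheme touches each element twice and synchronizes twice, which is the kind of overhead a task-graph-based implementation would aim to reduce and the benchmarking stage would measure.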
By the end of the program, the participant is expected to deliver:
- A production‑quality implementation of a core set of parallel algorithm primitives in Taskflow
- Comprehensive benchmarks comparing Taskflow algorithms with Intel TBB and standard implementations
- Integration into the main Taskflow codebase with tests and CI support
- User documentation and tutorial examples
We will encourage the participant to disseminate the results through:
- A technical report or arXiv preprint
- Submissions to relevant parallel computing or systems conferences/workshops
- Presentations at major C++ venues (e.g., CppCon, CppNow)
Participants should have solid C++17/20 programming experience. Basic knowledge of parallel programming is preferred.
The Taskflow Team will mentor this project throughout the Google Summer of Code program. Participants should expect weekly project meetings to sync on progress.
We expect this to be a large project (350 hours) because it spans multiple activities, such as implementing algorithms, benchmarking the solutions, and deploying them to real use cases.
We rate this project a medium level of difficulty. It focuses primarily on using Taskflow to implement parallel algorithms and applications, rather than on developing Taskflow's core functionality. Participants will gain practical, hands-on experience with parallel programming and come to understand the pros and cons of mainstream parallel programming tools.
Feel free to reach out to tsung-wei.huang at wisc.edu with any questions.