Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++.
Taskflow is faster, more expressive, and easier for drop-in integration than many existing task programming frameworks at handling complex parallel workloads.
Taskflow lets you quickly implement task decomposition strategies that incorporate both regular and irregular compute patterns, together with an efficient work-stealing scheduler to optimize your multithreaded performance.
Taskflow supports conditional tasking for you to make rapid control-flow decisions across dependent tasks to implement cycles and conditions that are otherwise difficult to express with existing tools.
Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.
Taskflow supports heterogeneous tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing.
Taskflow provides visualization and tooling needed for profiling Taskflow programs.
We are committed to supporting trustworthy developments for both academic and industrial research projects in parallel computing. Check out Who is Using Taskflow and what our users say:
See a quick presentation and visit the documentation to learn more about Taskflow. For technical details, refer to our IEEE TPDS paper.
Start Your First Taskflow Program
The following program (simple.cpp) creates a taskflow of four tasks A, B, C, and D, where A runs before B and C, and D runs after B and C. When A finishes, B and C can run in parallel. Try it live on Compiler Explorer (godbolt)!
#include <taskflow/taskflow.hpp>  // Taskflow is header-only
#include <iostream>               // std::cout

int main(){

  tf::Executor executor;
  tf::Taskflow taskflow;

  auto [A, B, C, D] = taskflow.emplace(  // create four tasks
    [] () { std::cout << "TaskA\n"; },
    [] () { std::cout << "TaskB\n"; },
    [] () { std::cout << "TaskC\n"; },
    [] () { std::cout << "TaskD\n"; }
  );

  A.precede(B, C);  // A runs before B and C
  D.succeed(B, C);  // D runs after B and C

  executor.run(taskflow).wait();

  return 0;
}
Taskflow is header-only, so there is nothing to install. To compile the program, clone the Taskflow project and tell the compiler to include the headers.
~$ git clone https://github.com/taskflow/taskflow.git  # clone it only once
~$ cd taskflow
~$ g++ -std=c++20 examples/simple.cpp -I. -O2 -pthread -o simple
~$ ./simple
TaskA
TaskC
TaskB
TaskD

Visualize Your First Taskflow Program
Taskflow comes with a built-in profiler, TFProf, for you to profile and visualize taskflow programs in an easy-to-use web-based interface.
# run the program with the environment variable TF_ENABLE_PROFILER enabled
~$ TF_ENABLE_PROFILER=simple.json ./simple
~$ cat simple.json
[
{"executor":"0","data":[{"worker":0,"level":0,"data":[{"span":[172,186],"name":"0_0","type":"static"},{"span":[187,189],"name":"0_1","type":"static"}]},{"worker":2,"level":0,"data":[{"span":[93,164],"name":"2_0","type":"static"},{"span":[170,179],"name":"2_1","type":"static"}]}]}
]
# paste the profiling json data to https://taskflow.github.io/tfprof/
In addition to the execution diagram, you can dump the graph in DOT format and visualize it using a number of free GraphViz tools.
// dump the taskflow graph to a DOT format through std::cout
taskflow.dump(std::cout);
Express Task Graph Parallelism
Taskflow empowers users with both static and dynamic task graph constructions to express end-to-end parallelism in a task graph that embeds in-graph control flow.
Taskflow supports dynamic tasking for you to create a subflow graph from the execution of a task to perform dynamic parallelism. The following program spawns a task dependency graph parented at task B.
tf::Task A = taskflow.emplace([](){}).name("A");
tf::Task C = taskflow.emplace([](){}).name("C");
tf::Task D = taskflow.emplace([](){}).name("D");

tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  tf::Task B3 = subflow.emplace([](){}).name("B3");
  B3.succeed(B1, B2);  // B3 runs after B1 and B2
}).name("B");

A.precede(B, C);  // A runs before B and C
D.succeed(B, C);  // D runs after B and C

Integrate Control Flow to a Task Graph
Taskflow supports conditional tasking for you to make rapid control-flow decisions across dependent tasks to implement cycles and conditions in an end-to-end task graph.
tf::Task init = taskflow.emplace([](){}).name("init");
tf::Task stop = taskflow.emplace([](){}).name("stop");

// creates a condition task that returns a random binary value (0 or 1)
tf::Task cond = taskflow.emplace(
  [](){ return std::rand() % 2; }
).name("cond");

init.precede(cond);

// creates a feedback loop {0: cond, 1: stop}
cond.precede(cond, stop);
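A condition task returns an integer that indexes its successors in the order they were passed to precede(): in the snippet above, 0 selects cond itself (another trip around the loop) and 1 selects stop. The sketch below is a hypothetical, sequential model of that rule in plain C++, not Taskflow's scheduler. The names Node, run_graph, and simulate_feedback_loop are illustrative only, and a scripted list of return values stands in for std::rand() so the result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of a task graph node: a regular node runs its work and
// continues to its successor; a condition node returns an index that picks
// exactly one successor, in the order the successors were registered.
struct Node {
  std::string name;
  std::function<int()> work;             // return value only matters for condition nodes
  bool is_condition = false;
  std::vector<std::size_t> successors;   // indices into the graph vector
};

// Walks the graph sequentially from `start` and returns the visit order.
// Simplification: regular nodes in this model have at most one successor.
std::vector<std::string> run_graph(std::vector<Node>& graph, std::size_t start) {
  std::vector<std::string> trace;
  std::size_t current = start;
  while (true) {
    Node& node = graph[current];
    trace.push_back(node.name);
    int result = node.work();
    if (node.successors.empty()) break;  // sink node: the walk is done
    if (node.is_condition) {
      current = node.successors.at(static_cast<std::size_t>(result));
    } else {
      current = node.successors.front();
    }
  }
  return trace;
}

// Models init -> cond and cond -> {0: cond, 1: stop}, using scripted
// return values instead of std::rand() % 2.
std::vector<std::string> simulate_feedback_loop(std::vector<int> scripted) {
  std::size_t step = 0;
  std::vector<Node> graph(3);
  graph[0] = {"init", [] { return 0; }, false, {1}};
  graph[1] = {"cond", [&] { return scripted.at(step++); }, true, {1, 2}};
  graph[2] = {"stop", [] { return 0; }, false, {}};
  return run_graph(graph, 0);
}
```

For example, the scripted returns {0, 0, 1} visit init, cond, cond, cond, stop: each 0 loops back once and the first 1 exits to stop.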
Taskflow supports GPU tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing using Nvidia CUDA Graph.
__global__ void saxpy(size_t N, float alpha, float* dx, float* dy) {
  size_t i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < N) {
    dy[i] = alpha*dx[i] + dy[i];
  }
}

// create a CUDA Graph task
tf::Task cudaflow = taskflow.emplace([&]() {
  tf::cudaGraph cg;
  tf::cudaTask h2d_x = cg.copy(dx, hx.data(), N);
  tf::cudaTask h2d_y = cg.copy(dy, hy.data(), N);
  tf::cudaTask d2h_x = cg.copy(hx.data(), dx, N);
  tf::cudaTask d2h_y = cg.copy(hy.data(), dy, N);
  // the task is named `kernel` so that `saxpy` below still refers to the
  // __global__ function rather than to the variable being declared
  tf::cudaTask kernel = cg.kernel((N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy);
  kernel.succeed(h2d_x, h2d_y)
        .precede(d2h_x, d2h_y);

  // instantiate an executable CUDA graph and run it through a stream
  tf::cudaGraphExec exec(cg);
  tf::cudaStream stream;
  stream.run(exec).synchronize();
}).name("CUDA Graph Task");
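To check the values copied back into hy, a plain host-side version of the same SAXPY update (y[i] = alpha * x[i] + y[i]) can be computed on the CPU and compared. This is a sketch under assumptions: the helpers saxpy_reference and nearly_equal are hypothetical, and the host buffers are assumed to be std::vector<float> (the snippet above only shows hx.data() and hy.data()).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side reference for the kernel above: y[i] = alpha * x[i] + y[i].
void saxpy_reference(float alpha, const std::vector<float>& x, std::vector<float>& y) {
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    y[i] = alpha * x[i] + y[i];
  }
}

// Returns true if every element of `actual` is within `tol` of `expected`.
bool nearly_equal(const std::vector<float>& actual,
                  const std::vector<float>& expected,
                  float tol = 1e-5f) {
  if (actual.size() != expected.size()) return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (std::fabs(actual[i] - expected[i]) > tol) return false;
  }
  return true;
}
```

A typical use is to keep copies of the original host inputs, run saxpy_reference(2.0f, ...) on them, and compare the result with hy after stream.run(exec).synchronize() returns. A tolerance is used because GPU and CPU floating-point results are not guaranteed to be bit-identical.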
Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.
tf::Taskflow f1, f2;

// create taskflow f1 of two tasks
tf::Task f1A = f1.emplace([]() { std::cout << "Task f1A\n"; })
                 .name("f1A");
tf::Task f1B = f1.emplace([]() { std::cout << "Task f1B\n"; })
                 .name("f1B");

// create taskflow f2 with one module task composed of f1
tf::Task f2A = f2.emplace([]() { std::cout << "Task f2A\n"; })
                 .name("f2A");
tf::Task f2B = f2.emplace([]() { std::cout << "Task f2B\n"; })
                 .name("f2B");
tf::Task f2C = f2.emplace([]() { std::cout << "Task f2C\n"; })
                 .name("f2C");

tf::Task f1_module_task = f2.composed_of(f1)
                            .name("module");

f1_module_task.succeed(f2A, f2B)
              .precede(f2C);

Launch Asynchronous Tasks
Taskflow supports asynchronous tasking. You can launch tasks asynchronously to dynamically explore task graph parallelism.
tf::Executor executor;

// create asynchronous tasks directly from an executor
std::future<int> future = executor.async([](){
  std::cout << "async task returns 1\n";
  return 1;
});
executor.silent_async([](){ std::cout << "async task does not return\n"; });

// create asynchronous tasks with dynamic dependencies
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask D = executor.silent_dependent_async([](){ printf("D\n"); }, B, C);

executor.wait_for_all();
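The dependent-async pattern above (A before B and C, D after both) can be mimicked with the standard library alone, which may help clarify the ordering that the dependency arguments express. This is only an analogy, not how Taskflow works: std::async with std::launch::async runs each call as if on a new thread, whereas Taskflow schedules async tasks on its executor's worker pool. The Trace class and run_diamond function are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Records task names in the order they run, guarded by a mutex.
class Trace {
 public:
  void record(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    order_.push_back(name);
  }
  std::vector<std::string> order() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return order_;
  }
 private:
  mutable std::mutex mutex_;
  std::vector<std::string> order_;
};

// A -> {B, C} -> D, with std::shared_future objects acting as dependency edges:
// each task waits on the futures of the tasks it depends on before running.
std::vector<std::string> run_diamond() {
  Trace trace;
  std::shared_future<void> a =
      std::async(std::launch::async, [&] { trace.record("A"); }).share();
  std::shared_future<void> b =
      std::async(std::launch::async, [&, a] { a.wait(); trace.record("B"); }).share();
  std::shared_future<void> c =
      std::async(std::launch::async, [&, a] { a.wait(); trace.record("C"); }).share();
  std::future<void> d = std::async(std::launch::async, [&, b, c] {
    b.wait();
    c.wait();
    trace.record("D");
  });
  d.wait();  // D finishing implies A, B, and C have finished too
  return trace.order();
}
```

As with the Taskflow version, A always runs first and D always runs last, while B and C may appear in either order.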
The executor provides several thread-safe methods to run a taskflow. You can run a taskflow once, multiple times, or until a stopping criterion is met. These methods are non-blocking and return a tf::Future<void> that lets you query the execution status.
// runs the taskflow once
tf::Future<void> run_once = executor.run(taskflow);

// wait on this run to finish
run_once.get();

// run the taskflow four times
executor.run_n(taskflow, 4);

// runs the taskflow repeatedly until the predicate returns true (counter reaches zero);
// the lambda must be mutable because it modifies its captured counter
executor.run_until(taskflow, [counter=5]() mutable { return --counter == 0; });

// block the calling thread until all submitted taskflows complete
executor.wait_for_all();

Leverage Standard Parallel Algorithms
Taskflow defines algorithms for you to quickly express common parallel patterns using standard C++ syntax, such as parallel iterations, parallel reductions, and parallel sort.
tf::Task task1 = taskflow.for_each(  // set each element to 100 in parallel
  first, last, [] (auto& i) { i = 100; }
);
tf::Task task2 = taskflow.reduce(    // reduce a range of items in parallel
  first, last, init, [] (auto a, auto b) { return a + b; }
);                                   // init is taken by reference and holds the result after the task runs
tf::Task task3 = taskflow.sort(      // sort a range of items in parallel
  first, last, [] (auto a, auto b) { return a < b; }
);
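When testing code that uses these algorithms, it can help to compute the same results sequentially with the standard library and compare. The sketch below assumes the range is a std::vector<int> (the snippet above leaves first, last, and init unspecified), and the helpers assign_all, reduce_into, and sort_ascending are hypothetical names. reduce_into follows the same fold that taskflow.reduce describes, init = bop(init, *itr) for each element.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sequential equivalent of taskflow.for_each(first, last, [](auto& i){ i = 100; }).
void assign_all(std::vector<int>& v) {
  std::for_each(v.begin(), v.end(), [](int& i) { i = 100; });
}

// Sequential equivalent of taskflow.reduce with the same binary operator:
// folds every element into init and returns the final value.
int reduce_into(const std::vector<int>& v, int init) {
  return std::accumulate(v.begin(), v.end(), init,
                         [](int a, int b) { return a + b; });
}

// Sequential equivalent of taskflow.sort with the same comparator.
void sort_ascending(std::vector<int>& v) {
  std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
}
```

A parallel reduction generally combines elements in an unspecified grouping and order, so the operator should be associative and commutative for the sequential and parallel results to agree; integer addition qualifies, whereas floating-point addition can differ in the last bits.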
Additionally, Taskflow provides composable graph building blocks for you to efficiently implement common parallel algorithms, such as parallel pipelines.
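// create a pipeline to propagate five tokens through three serial stages
constexpr size_t num_parallel_lines = 4;     // maximum number of tokens in flight (example value)
std::array<int, num_parallel_lines> buffer;  // one data slot per parallel line

tf::Pipeline pl(num_parallel_lines,
  tf::Pipe{tf::PipeType::SERIAL, [&buffer](tf::Pipeflow& pf) {
    if(pf.token() == 5) {
      pf.stop();                                         // stop after tokens 0 to 4
    }
    else {
      buffer[pf.line()] = static_cast<int>(pf.token());  // stage 1 stores the token
    }
  }},
  tf::Pipe{tf::PipeType::SERIAL, [&buffer](tf::Pipeflow& pf) {
    printf("stage 2: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }},
  tf::Pipe{tf::PipeType::SERIAL, [&buffer](tf::Pipeflow& pf) {
    printf("stage 3: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }}
);

taskflow.composed_of(pl);
executor.run(taskflow).wait();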
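The per-line buffer works because each parallel line carries at most one token at a time, so buffer[pf.line()] acts as a private slot for whichever token currently occupies that line. The sequential sketch below models that data flow. It assumes tokens are assigned to lines round-robin (token % num_lines), and simulate_pipeline is a hypothetical helper, not part of Taskflow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sequential model of a three-stage pipeline with `num_lines` parallel lines.
// Assumption: token t runs on line t % num_lines, and stage 1 stops after
// `num_tokens` tokens (like calling pf.stop() when pf.token() == 5).
std::vector<std::string> simulate_pipeline(std::size_t num_lines, std::size_t num_tokens) {
  std::vector<std::string> log;
  if (num_lines == 0) return log;        // no lines, nothing can flow
  std::vector<int> buffer(num_lines);    // one slot per line, like buffer[pf.line()]
  for (std::size_t token = 0; token < num_tokens; ++token) {
    std::size_t line = token % num_lines;
    buffer[line] = static_cast<int>(token);  // stage 1: store the token in its line's slot
    // stages 2 and 3: read the slot that stage 1 filled for this token
    log.push_back("stage 2: buffer[" + std::to_string(line) + "] = " + std::to_string(buffer[line]));
    log.push_back("stage 3: buffer[" + std::to_string(line) + "] = " + std::to_string(buffer[line]));
  }
  return log;
}
```

In the real pipeline, stages of different tokens overlap in time, so the printed lines interleave; what the model preserves is that each token passes through stage 1, 2, and 3 in order, and that with four lines the fifth token (token 4) reuses line 0.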
To use Taskflow, you only need a compiler that supports C++17:
Taskflow works on Linux, Windows, and macOS.
Although Taskflow primarily targets C++17, you can enable C++20 compilation through -std=c++20 (or /std:c++20 for MSVC) to achieve better performance from new C++20 features.
Visit our project website and documentation to learn more about Taskflow. To get involved:
We are committed to supporting trustworthy developments for both academic and industrial research projects in parallel and heterogeneous computing. If you are using Taskflow, please cite the following paper we published in IEEE TPDS in 2021:
More importantly, we appreciate all Taskflow contributors and the following organizations for sponsoring the Taskflow project!
Taskflow is licensed under the MIT License. You are completely free to redistribute your work derived from Taskflow.