This guide walks through the core features of dispenso with working examples. Each section includes a complete, compilable example that you can build and run.
See the README for installation instructions. Dispenso requires C++14 and CMake 3.12+.
To build the examples:

```shell
mkdir build && cd build
cmake .. -DDISPENSO_BUILD_EXAMPLES=ON
make
```

At the heart of dispenso is the ThreadPool. A thread pool manages a set of
worker threads that execute tasks. You can use the global thread pool or create
your own:
```cpp
#include <dispenso/thread_pool.h>

// Use the global thread pool (recommended for most cases)
dispenso::ThreadPool& pool = dispenso::globalThreadPool();

// Or create a custom pool with a specific number of threads
dispenso::ThreadPool myPool(4); // 4 worker threads
```

Note: `globalThreadPool()` defaults to `std::thread::hardware_concurrency() - 1` worker threads, since the calling thread typically participates in computation. Use `dispenso::resizeGlobalThreadPool(n)` to change it.
A TaskSet groups related tasks and provides a way to wait for their completion:
```cpp
#include <dispenso/task_set.h>

dispenso::TaskSet taskSet(dispenso::globalThreadPool());
taskSet.schedule([]() { /* task 1 */ });
taskSet.schedule([]() { /* task 2 */ });
taskSet.wait(); // Block until all tasks complete
```

The simplest way to parallelize work is with parallel_for. It distributes loop iterations across available threads.
Simple per-element parallel loop:
```cpp
#include <dispenso/parallel_for.h>

// Process each element independently in parallel
dispenso::parallel_for(0, kArraySize, [&](size_t i) { output[i] = std::sqrt(input[i]); });
```

Reduction with per-thread state:
```cpp
std::vector<double> partialSums;
dispenso::parallel_for(
    partialSums,
    []() { return 0.0; }, // State initializer
    size_t{0},
    kArraySize,
    [&](double& localSum, size_t start, size_t end) {
      for (size_t i = start; i < end; ++i) {
        localSum += input[i];
      }
    });

// Combine partial sums
double totalSum = 0.0;
for (double partial : partialSums) {
  totalSum += partial;
}
```

See full example.
Key points:
- Use the simple form for independent per-element work
- Use chunked ranges when you want to control work distribution
- Per-thread state enables efficient reductions
- Options let you control parallelism and chunking strategy
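To see what the per-thread-state machinery buys you, here is a hand-rolled sketch of the same reduction using plain `std::thread`. The chunking is deliberately naive; dispenso additionally load-balances chunks and reuses pool threads:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Naive hand-rolled version of the reduction above: each thread sums its
// own chunk into a private slot, then the main thread combines the slots.
double parallelSum(const std::vector<double>& input, unsigned numThreads) {
  std::vector<double> partialSums(numThreads, 0.0);
  std::vector<std::thread> threads;
  size_t chunk = (input.size() + numThreads - 1) / numThreads;
  for (unsigned t = 0; t < numThreads; ++t) {
    threads.emplace_back([&, t]() {
      size_t start = t * chunk;
      size_t end = std::min(input.size(), start + chunk);
      for (size_t i = start; i < end; ++i) {
        partialSums[t] += input[i]; // private slot: no synchronization needed
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
  double total = 0.0;
  for (double p : partialSums) {
    total += p;
  }
  return total;
}
```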
When you have a container rather than an index range, use for_each:
Parallel for_each on a vector:
```cpp
#include <dispenso/for_each.h>

std::vector<double> values = {1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0};

// Apply square root to each element in parallel
dispenso::for_each(values.begin(), values.end(), [](double& val) { val = std::sqrt(val); });
```

for_each_n with explicit count:
```cpp
std::vector<int> partial = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};

// Only process first 5 elements
dispenso::for_each_n(partial.begin(), 5, [](int& n) { n += 100; });
```

See full example.
Key points:
- Works with any iterator type (including non-random-access iterators)
- `for_each_n` takes an explicit count
- Pass a `TaskSet` for external synchronization control
For more complex task patterns, use TaskSet and ConcurrentTaskSet directly:
Basic TaskSet:
```cpp
#include <dispenso/task_set.h>

dispenso::TaskSet taskSet(dispenso::globalThreadPool());
std::atomic<int> counter(0);
for (int i = 0; i < 10; ++i) {
  taskSet.schedule([&counter, i]() { counter.fetch_add(i, std::memory_order_relaxed); });
}
taskSet.wait();
```

ConcurrentTaskSet with nested scheduling:
```cpp
dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
std::atomic<int> total(0);
for (int i = 0; i < 5; ++i) {
  taskSet.schedule([&taskSet, &total, i]() {
    // Each task schedules two sub-tasks
    for (int j = 0; j < 2; ++j) {
      taskSet.schedule(
          [&total, i, j]() { total.fetch_add(i * 10 + j, std::memory_order_relaxed); });
    }
  });
}
taskSet.wait();
```

See full example.
Key points:
- `TaskSet` is for single-threaded scheduling
- `ConcurrentTaskSet` allows scheduling from multiple threads
- Both support cancellation for cooperative early termination
- The destructor waits for all tasks to complete
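The "destructor waits" guarantee is the same idea as joining threads at scope exit. A minimal standalone illustration (plain `std::thread`, not dispenso code; dispenso runs tasks on pool threads rather than one thread per task):

```cpp
#include <atomic>
#include <thread>
#include <utility>
#include <vector>

// A tiny scope guard that joins all threads on destruction, mirroring the
// way a TaskSet's destructor waits for outstanding tasks.
struct JoiningGroup {
  std::vector<std::thread> threads;

  template <class F>
  void schedule(F&& f) {
    threads.emplace_back(std::forward<F>(f));
  }

  ~JoiningGroup() {
    for (auto& t : threads) {
      if (t.joinable()) t.join();
    }
  }
};

int runDemo() {
  std::atomic<int> counter(0);
  {
    JoiningGroup group;
    for (int i = 0; i < 10; ++i) {
      group.schedule([&counter, i]() { counter.fetch_add(i); });
    }
  } // destructor joins: all work is complete past this point
  return counter.load();
}
```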
When you need return values from async operations, use Future:
Basic async and get:
```cpp
#include <dispenso/future.h>

dispenso::Future<int> future = dispenso::async([]() {
  int result = 0;
  for (int i = 1; i <= 100; ++i) {
    result += i;
  }
  return result;
});
int result = future.get(); // blocks until ready
```

Chaining with then():
```cpp
dispenso::Future<double> chainedFuture =
    dispenso::async([]() { return 16.0; })
        .then([](dispenso::Future<double>&& prev) { return std::sqrt(prev.get()); })
        .then([](dispenso::Future<double>&& prev) { return prev.get() * 2.0; });
```

when_all for multiple futures:
```cpp
dispenso::Future<int> f1 = dispenso::async([]() { return 10; });
dispenso::Future<int> f2 = dispenso::async([]() { return 20; });
dispenso::Future<int> f3 = dispenso::async([]() { return 30; });

auto allFutures = dispenso::when_all(std::move(f1), std::move(f2), std::move(f3));
auto tuple = allFutures.get();
int sum = std::get<0>(tuple).get() + std::get<1>(tuple).get() + std::get<2>(tuple).get();
```

See full example.
Key points:
- `async()` launches work and returns a `Future`
- `then()` chains dependent computations
- `when_all()` waits for multiple futures
- `make_ready_future()` creates an already-completed future
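If you are used to `std::future`, chaining is roughly what you get by wrapping the previous future in a new async call. A plain-std sketch of the 16 → sqrt → ×2 chain above (dispenso's `then()` avoids dedicating a blocked thread to each stage):

```cpp
#include <cmath>
#include <future>

// Manual chaining with std::async: each stage consumes the previous future.
double chainedResult() {
  std::future<double> f = std::async(std::launch::async, []() { return 16.0; });
  std::future<double> chained = std::async(
      std::launch::async,
      [f = std::move(f)]() mutable { return std::sqrt(f.get()) * 2.0; });
  return chained.get();
}
```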
For complex dependency patterns, build a task graph:
Diamond dependency pattern:
```cpp
#include <dispenso/graph.h>
#include <dispenso/graph_executor.h>

//    A
//   / \
//  B   C
//   \ /
//    D
dispenso::Graph graph;
dispenso::Node& A = graph.addNode([&]() { r[0] = 1.0f; });
dispenso::Node& B = graph.addNode([&]() { r[1] = r[0] * 2.0f; });
dispenso::Node& C = graph.addNode([&]() { r[2] = r[0] + 5.0f; });
dispenso::Node& D = graph.addNode([&]() { r[3] = r[1] + r[2]; });
B.dependsOn(A);
C.dependsOn(A);
D.dependsOn(B, C);

setAllNodesIncomplete(graph);
dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
dispenso::ConcurrentTaskSetExecutor executor;
executor(taskSet, graph);
```

See full example.
Key points:
- Use `dependsOn()` to specify prerequisites
- Multiple executors available: single-thread, parallel_for, ConcurrentTaskSet
- Graphs can be re-executed after calling `setAllNodesIncomplete()`
- Subgraphs help organize large graphs
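Executing the diamond by hand in one topological order makes the data flow concrete (`r` is modeled here as a plain float array, as the example assumes; any order that respects the edges yields the same result):

```cpp
#include <array>

// The diamond above, run sequentially in topological order A, B, C, D.
std::array<float, 4> runDiamond() {
  std::array<float, 4> r{};
  r[0] = 1.0f;        // A
  r[1] = r[0] * 2.0f; // B (needs A)
  r[2] = r[0] + 5.0f; // C (needs A)
  r[3] = r[1] + r[2]; // D (needs B and C)
  return r;
}
```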
For streaming data through stages, use pipelines:
3-stage pipeline (generator -> transform -> sink):
```cpp
#include <dispenso/pipeline.h>

std::vector<int> results;
int counter = 0;
dispenso::pipeline(
    // Stage 1: Generator - produces values
    [&counter]() -> dispenso::OpResult<int> {
      if (counter >= 10) {
        return {}; // Empty result signals end of input
      }
      return counter++;
    },
    // Stage 2: Transform - squares the value
    [](int value) { return value * value; },
    // Stage 3: Sink - collects results
    [&results](int value) { results.push_back(value); });
```

See full example.
Key points:
- Generator stage produces values (returns `OpResult<T>` or `std::optional<T>`)
- Transform stages process values (can filter by returning empty result)
- Sink stage consumes final values
- Use `stage()` with a limit for parallel stages
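Stripped of concurrency, the three stages above compose into a simple sequential loop; dispenso overlaps the stages across threads while preserving this data flow:

```cpp
#include <vector>

// Sequential equivalent of the 3-stage pipeline: generate 0..9, square, collect.
std::vector<int> runPipeline() {
  std::vector<int> results;
  int counter = 0;
  for (;;) {
    if (counter >= 10) break;  // generator returns empty: end of input
    int value = counter++;     // stage 1: generator
    value = value * value;     // stage 2: transform
    results.push_back(value);  // stage 3: sink
  }
  return results;
}
```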
A vector that supports concurrent push_back and growth:
Concurrent push_back from multiple threads:
```cpp
#include <dispenso/concurrent_vector.h>
#include <dispenso/parallel_for.h>

dispenso::ConcurrentVector<int> vec;
dispenso::parallel_for(0, 1000, [&vec](size_t i) { vec.push_back(static_cast<int>(i)); });
```

Iterator stability during concurrent modification:
```cpp
dispenso::ConcurrentVector<int> vec;
vec.push_back(1);
vec.push_back(2);
vec.push_back(3);

auto it = vec.begin();
int& firstElement = *it;

// Push more elements concurrently
dispenso::parallel_for(0, 100, [&vec](size_t i) { vec.push_back(static_cast<int>(i + 100)); });

// Original iterator and reference are still valid
assert(*it == 1);
assert(firstElement == 1);
```

See full example.
Key points:
- Iterators and references remain stable during growth
- Use `grow_by()` for efficient batch insertion
- Reserve capacity upfront when size is known
- Not all operations are concurrent-safe (see docs)
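For intuition, the reference-stability guarantee resembles `std::deque`'s chunked storage: `push_back` never invalidates references to existing elements (unlike ConcurrentVector, though, deque's iterators are invalidated by growth, and deque is not thread-safe). A single-threaded sketch:

```cpp
#include <deque>

// std::deque stores elements in chunks, so push_back leaves references to
// existing elements valid -- the same property ConcurrentVector builds on.
bool referencesSurviveGrowth() {
  std::deque<int> d;
  d.push_back(1);
  int& first = d.front();
  for (int i = 0; i < 1000; ++i) {
    d.push_back(i + 100); // never relocates existing chunks
  }
  return first == 1;
}
```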
A one-shot barrier for thread synchronization:
count_down + wait pattern:
```cpp
#include <dispenso/latch.h>

constexpr int kNumWorkers = 3;
dispenso::Latch workComplete(kNumWorkers);
std::vector<int> results(kNumWorkers, 0);

dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
for (int i = 0; i < kNumWorkers; ++i) {
  taskSet.schedule([&workComplete, &results, i]() {
    results[static_cast<size_t>(i)] = (i + 1) * 10;
    workComplete.count_down(); // Signal work is done (non-blocking)
  });
}
workComplete.wait(); // Main thread waits for all workers
```

See full example.
Key points:
- `arrive_and_wait()` decrements and blocks
- `count_down()` decrements without blocking
- `wait()` blocks without decrementing
- Cannot be reset (one-shot)
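Since `std::latch` only arrived in C++20, here is a C++14 sketch of the same semantics built from a mutex and condition variable. This is illustrative only, not dispenso's implementation:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Minimal one-shot latch: count_down() decrements, wait() blocks until zero.
class SimpleLatch {
 public:
  explicit SimpleLatch(std::ptrdiff_t count) : count_(count) {}

  void count_down() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (--count_ == 0) cv_.notify_all();
  }

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this]() { return count_ == 0; });
  }

  void arrive_and_wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (--count_ == 0) {
      cv_.notify_all();
    } else {
      cv_.wait(lock, [this]() { return count_ == 0; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::ptrdiff_t count_;
};
```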
Manage expensive-to-create resources with ResourcePool:
Basic buffer pool with RAII:
```cpp
#include <dispenso/resource_pool.h>

// Create a pool of 4 buffers
dispenso::ResourcePool<Buffer> bufferPool(4, []() { return Buffer(); });

dispenso::parallel_for(0, 100, [&bufferPool](size_t i) {
  // Acquire a resource from the pool (blocks if none available)
  auto resource = bufferPool.acquire();

  // Use the resource
  resource.get().process(static_cast<int>(i));

  // Resource automatically returned to pool when 'resource' goes out of scope
});
```

See full example.
Key points:
- Resources automatically return to pool when RAII wrapper destructs
- `acquire()` blocks if no resources available
- Good for database connections, buffers, etc.
- Can be used to limit concurrency
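The mechanics can be sketched with a mutex, a condition variable, and an RAII handle. This is a simplified illustration of the pattern, not dispenso's implementation:

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Simplified resource pool: acquire() blocks until a resource is free and
// returns an RAII handle that gives the resource back on destruction.
template <class T>
class SimplePool {
 public:
  class Handle {
   public:
    Handle(SimplePool& pool, T resource)
        : pool_(pool), resource_(std::move(resource)) {}
    Handle(Handle&& other)
        : pool_(other.pool_), resource_(std::move(other.resource_)), active_(other.active_) {
      other.active_ = false; // moved-from handle must not release
    }
    Handle(const Handle&) = delete;
    ~Handle() {
      if (active_) pool_.release(std::move(resource_));
    }
    T& get() { return resource_; }

   private:
    SimplePool& pool_;
    T resource_;
    bool active_ = true;
  };

  SimplePool(size_t count, std::function<T()> factory) {
    for (size_t i = 0; i < count; ++i) {
      free_.push_back(factory());
    }
  }

  Handle acquire() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this]() { return !free_.empty(); });
    T resource = std::move(free_.back());
    free_.pop_back();
    return Handle(*this, std::move(resource));
  }

 private:
  void release(T resource) {
    std::lock_guard<std::mutex> lock(mutex_);
    free_.push_back(std::move(resource));
    cv_.notify_one();
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::vector<T> free_;
};
```

Because the count of resources bounds how many handles can be live at once, the same pattern doubles as a concurrency limiter.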
- Browse the API Reference for complete documentation
- Check out the tests for more usage examples
- See the benchmarks for performance testing patterns