Skip to content

Latest commit

 

History

History
459 lines (320 loc) · 11.6 KB

File metadata and controls

459 lines (320 loc) · 11.6 KB

Getting Started {#getting_started}

This guide walks through the core features of dispenso with working examples. Each section includes a complete, compilable example that you can build and run.

Installation

See the README for installation instructions. Dispenso requires C++14 and CMake 3.12+.

To build the examples:

mkdir build && cd build
cmake .. -DDISPENSO_BUILD_EXAMPLES=ON
make

Basic Concepts

Thread Pools

At the heart of dispenso is the ThreadPool. A thread pool manages a set of worker threads that execute tasks. You can use the global thread pool or create your own:

#include <dispenso/thread_pool.h>

// Use the global thread pool (recommended for most cases)
dispenso::ThreadPool& pool = dispenso::globalThreadPool();

// Or create a custom pool with a specific number of threads
dispenso::ThreadPool myPool(4);  // 4 worker threads

Note: globalThreadPool() defaults to std::thread::hardware_concurrency() - 1 worker threads, since the calling thread typically participates in computation. Use dispenso::resizeGlobalThreadPool(n) to change it.

Task Sets

A TaskSet groups related tasks and provides a way to wait for their completion:

#include <dispenso/task_set.h>

dispenso::TaskSet taskSet(dispenso::globalThreadPool());

taskSet.schedule([]() { /* task 1 */ });
taskSet.schedule([]() { /* task 2 */ });

taskSet.wait();  // Block until all tasks complete

Your First Parallel Loop

The simplest way to parallelize work is with parallel_for. It distributes loop iterations across available threads.

Simple per-element parallel loop:

#include <dispenso/parallel_for.h>

// Process each element independently in parallel
dispenso::parallel_for(0, kArraySize, [&](size_t i) { output[i] = std::sqrt(input[i]); });

Reduction with per-thread state:

std::vector<double> partialSums;
dispenso::parallel_for(
    partialSums,
    []() { return 0.0; }, // State initializer
    size_t{0},
    kArraySize,
    [&](double& localSum, size_t start, size_t end) {
      for (size_t i = start; i < end; ++i) {
        localSum += input[i];
      }
    });

// Combine partial sums
double totalSum = 0.0;
for (double partial : partialSums) {
  totalSum += partial;
}

See full example.

Key points:

  • Use the simple form for independent per-element work
  • Use chunked ranges when you want to control work distribution
  • Per-thread state enables efficient reductions
  • Options let you control parallelism and chunking strategy

Parallel Iteration with for_each

When you have a container rather than an index range, use for_each:

Parallel for_each on a vector:

#include <dispenso/for_each.h>

std::vector<double> values = {1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0};

// Apply square root to each element in parallel
dispenso::for_each(values.begin(), values.end(), [](double& val) { val = std::sqrt(val); });

for_each_n with explicit count:

std::vector<int> partial = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};

// Only process first 5 elements
dispenso::for_each_n(partial.begin(), 5, [](int& n) { n += 100; });

See full example.

Key points:

  • Works with any iterator type (including non-random-access iterators)
  • for_each_n takes an explicit count
  • Pass a TaskSet for external synchronization control

Working with Tasks

For more complex task patterns, use TaskSet and ConcurrentTaskSet directly:

Basic TaskSet:

#include <dispenso/task_set.h>

dispenso::TaskSet taskSet(dispenso::globalThreadPool());
std::atomic<int> counter(0);

for (int i = 0; i < 10; ++i) {
  taskSet.schedule([&counter, i]() { counter.fetch_add(i, std::memory_order_relaxed); });
}

taskSet.wait();

ConcurrentTaskSet with nested scheduling:

dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
std::atomic<int> total(0);

for (int i = 0; i < 5; ++i) {
  taskSet.schedule([&taskSet, &total, i]() {
    // Each task schedules two sub-tasks
    for (int j = 0; j < 2; ++j) {
      taskSet.schedule(
          [&total, i, j]() { total.fetch_add(i * 10 + j, std::memory_order_relaxed); });
    }
  });
}

taskSet.wait();

See full example.

Key points:

  • TaskSet is for single-threaded scheduling
  • ConcurrentTaskSet allows scheduling from multiple threads
  • Both support cancellation for cooperative early termination
  • The destructor waits for all tasks to complete

Futures for Async Results

When you need return values from async operations, use Future:

Basic async and get:

#include <dispenso/future.h>

dispenso::Future<int> future = dispenso::async([]() {
  int result = 0;
  for (int i = 1; i <= 100; ++i) {
    result += i;
  }
  return result;
});

int result = future.get();  // blocks until ready

Chaining with then():

dispenso::Future<double> chainedFuture = dispenso::async([]() {
                                           return 16.0;
                                         })
                                             .then([](dispenso::Future<double>&& prev) {
                                               return std::sqrt(prev.get());
                                             })
                                             .then([](dispenso::Future<double>&& prev) {
                                               return prev.get() * 2.0;
                                             });

when_all for multiple futures:

dispenso::Future<int> f1 = dispenso::async([]() { return 10; });
dispenso::Future<int> f2 = dispenso::async([]() { return 20; });
dispenso::Future<int> f3 = dispenso::async([]() { return 30; });

auto allFutures = dispenso::when_all(std::move(f1), std::move(f2), std::move(f3));
auto tuple = allFutures.get();
int sum = std::get<0>(tuple).get() + std::get<1>(tuple).get() + std::get<2>(tuple).get();

See full example.

Key points:

  • async() launches work and returns a Future
  • then() chains dependent computations
  • when_all() waits for multiple futures
  • make_ready_future() creates an already-completed future

Task Graphs

For complex dependency patterns, build a task graph:

Diamond dependency pattern:

#include <dispenso/graph.h>
#include <dispenso/graph_executor.h>

//         A
//        / \
//       B   C
//        \ /
//         D
dispenso::Graph graph;

dispenso::Node& A = graph.addNode([&]() { r[0] = 1.0f; });
dispenso::Node& B = graph.addNode([&]() { r[1] = r[0] * 2.0f; });
dispenso::Node& C = graph.addNode([&]() { r[2] = r[0] + 5.0f; });
dispenso::Node& D = graph.addNode([&]() { r[3] = r[1] + r[2]; });

B.dependsOn(A);
C.dependsOn(A);
D.dependsOn(B, C);

setAllNodesIncomplete(graph);
dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());
dispenso::ConcurrentTaskSetExecutor executor;
executor(taskSet, graph);

See full example.

Key points:

  • Use dependsOn() to specify prerequisites
  • Multiple executors available: single-thread, parallel_for, ConcurrentTaskSet
  • Graphs can be re-executed after calling setAllNodesIncomplete()
  • Subgraphs help organize large graphs

Pipelines

For streaming data through stages, use pipelines:

3-stage pipeline (generator -> transform -> sink):

#include <dispenso/pipeline.h>

std::vector<int> results;
int counter = 0;

dispenso::pipeline(
    // Stage 1: Generator - produces values
    [&counter]() -> dispenso::OpResult<int> {
      if (counter >= 10) {
        return {}; // Empty result signals end of input
      }
      return counter++;
    },
    // Stage 2: Transform - squares the value
    [](int value) { return value * value; },
    // Stage 3: Sink - collects results
    [&results](int value) { results.push_back(value); });

See full example.

Key points:

  • Generator stage produces values (returns OpResult<T> or std::optional<T>)
  • Transform stages process values (can filter by returning empty result)
  • Sink stage consumes final values
  • Use stage() with a limit for parallel stages

Thread-Safe Containers

ConcurrentVector

A vector that supports concurrent push_back and growth:

Concurrent push_back from multiple threads:

#include <dispenso/concurrent_vector.h>
#include <dispenso/parallel_for.h>

dispenso::ConcurrentVector<int> vec;

dispenso::parallel_for(0, 1000, [&vec](size_t i) { vec.push_back(static_cast<int>(i)); });

Iterator stability during concurrent modification:

dispenso::ConcurrentVector<int> vec;
vec.push_back(1);
vec.push_back(2);
vec.push_back(3);

auto it = vec.begin();
int& firstElement = *it;

// Push more elements concurrently
dispenso::parallel_for(0, 100, [&vec](size_t i) { vec.push_back(static_cast<int>(i + 100)); });

// Original iterator and reference are still valid
assert(*it == 1);
assert(firstElement == 1);

See full example.

Key points:

  • Iterators and references remain stable during growth
  • Use grow_by() for efficient batch insertion
  • Reserve capacity upfront when size is known
  • Not all operations are concurrent-safe (see docs)

Synchronization Primitives

Latch

A one-shot barrier for thread synchronization:

count_down + wait pattern:

#include <dispenso/latch.h>

constexpr int kNumWorkers = 3;
dispenso::Latch workComplete(kNumWorkers);
std::vector<int> results(kNumWorkers, 0);

dispenso::ConcurrentTaskSet taskSet(dispenso::globalThreadPool());

for (int i = 0; i < kNumWorkers; ++i) {
  taskSet.schedule([&workComplete, &results, i]() {
    results[static_cast<size_t>(i)] = (i + 1) * 10;
    workComplete.count_down();  // Signal work is done (non-blocking)
  });
}

workComplete.wait();  // Main thread waits for all workers

See full example.

Key points:

  • arrive_and_wait() decrements and blocks
  • count_down() decrements without blocking
  • wait() blocks without decrementing
  • Cannot be reset (one-shot)

Resource Pooling

Manage expensive-to-create resources with ResourcePool:

Basic buffer pool with RAII:

#include <dispenso/resource_pool.h>

// Create a pool of 4 buffers
dispenso::ResourcePool<Buffer> bufferPool(4, []() { return Buffer(); });

dispenso::parallel_for(0, 100, [&bufferPool](size_t i) {
  // Acquire a resource from the pool (blocks if none available)
  auto resource = bufferPool.acquire();

  // Use the resource
  resource.get().process(static_cast<int>(i));

  // Resource automatically returned to pool when 'resource' goes out of scope
});

See full example.

Key points:

  • Resources automatically return to pool when RAII wrapper destructs
  • acquire() blocks if no resources available
  • Good for database connections, buffers, etc.
  • Can be used to limit concurrency

Next Steps

  • Browse the API Reference for complete documentation
  • Check out the tests for more usage examples
  • See the benchmarks for performance testing patterns