Skip to content

Primer v0.2.0

Latest

Choose a tag to compare

@whisprer whisprer released this 14 Feb 14:59
· 32 commits to main since this release

v0.2.0 — Primer & Full Documentation

The bit-packed prime sieve goes cache-aware. This release adds a segmented sieve variant that processes the range in L1-sized 32 KB windows, complete benchmark harness, and a full documentation suite covering everything from the original C++ origins to production deployment patterns.

Highlights

Primer (seg.rs) — L1-cache-friendly processing in 32 KB segments. Single reused buffer, never leaves cache. 1.4× faster than the flat sieve at n=100M, with 187× less sieve working memory than the primal crate. Zero dependencies, single file, compiles with rustc alone.

rustc -C opt-level=3 -C target-cpu=native seg.rs && ./seg

Benchmark harness (prime_bench.rs) — Head-to-head comparison against primes v0.3 and primal v0.3 across six orders of magnitude (10K–100M). 25 iterations per measurement, cross-verified prime counts, full statistical output (min/median/mean/max/σ).

Bug fixes in the original flat sieve — Two incorrect test assertions that the sieve itself was too correct to trigger:

  • primes[10_000] is 104,743, not 104,729 (off-by-one in zero-indexed lookup)
  • 499,999 is composite, not prime. Largest prime ≤ 500,000 is 499,979

Performance (Median, 25 Iterations)

n wofl flat wofl segmented primal Seg vs flat
500K 434 µs 443 µs 170 µs ~1×
10M 10.3 ms 10.3 ms 3.2 ms ~1× (crossover)
50M 66.6 ms 52.5 ms 30.7 ms 1.27×
100M 189.7 ms 137.0 ms 61.7 ms 1.38×

Building

# Primer (standalone — no Cargo needed)
rustc -C opt-level=3 -C target-cpu=native seg.rs -o seg
./seg

Tests

rustc --test seg.rs -o test_seg && ./test_seg

Benchmark harness (requires Cargo)

cargo new all_bench && cd prime_bench

Copy all_bench.rs → src/main.rs, Cargo.toml in place

cargo run --release

What's Next

  • Wheel-30 factorisation (~1.8× speedup, would close the primal gap to ~15%)
  • Parallel segments via rayon (~N× on N cores, segments are independent)
  • Compile-time generation for fixed small prime tables on embedded targets

Full changelog: CHANGELOG.md

# v0.2.0 — Primer & Full Documentation

The bit-packed prime sieve goes cache-aware. This release adds a segmented sieve variant that processes the range in L1-sized 32 KB windows, complete benchmark harness, and a full documentation suite covering everything from the original C++ origins to production deployment patterns.

Highlights

Primer (seg.rs) — L1-cache-friendly processing in 32 KB segments. Single reused buffer, never leaves cache. 1.4× faster than the flat sieve at n=100M, with 187× less sieve working memory than the primal crate. Zero dependencies, single file, compiles with rustc alone.

rustc -C opt-level=3 -C target-cpu=native seg.rs && ./seg

Benchmark harness (prime_bench.rs) — Head-to-head comparison against primes v0.3 and primal v0.3 across six orders of magnitude (10K–100M). 25 iterations per measurement, cross-verified prime counts, full statistical output (min/median/mean/max/σ).

Bug fixes in the original flat sieve — Two incorrect test assertions that the sieve itself was too correct to trigger:

  • primes[10_000] is 104,743, not 104,729 (off-by-one in zero-indexed lookup)
  • 499,999 is composite, not prime. Largest prime ≤ 500,000 is 499,979

Performance (Median, 25 Iterations)

n wofl flat wofl segmented primal Seg vs flat
500K 434 µs 443 µs 170 µs ~1×
10M 10.3 ms 10.3 ms 3.2 ms ~1× (crossover)
50M 66.6 ms 52.5 ms 30.7 ms 1.27×
100M 189.7 ms 137.0 ms 61.7 ms 1.38×

Memory (n = 50M)

wofl segmented primal Ratio
Sieve working memory 32 KB 6.0 MB 187× smaller
Result vector 24.7 MB 32.0 MB 1.3× smaller
Total 24.8 MB 38.0 MB 1.5× smaller

Flat Sieve Improvements (v0.1.0 → v0.2.0)

  • Pre-allocated result vector via prime-counting upper bound (zero reallocs)
  • Overflow-safe integer isqrt() using checked_mul (correct for all u64 including u64::MAX)
  • Hoisted inner-loop step computation
  • Early termination in collection phase

Documentation

This release includes the full doc suite:

Document Contents
PERFORMANCE_ANALYSIS.md Benchmarks, scaling behaviour, memory deep dive, target use cases
SIDE_BY_SIDE.md Line-by-line C++ → Rust translation guide
BORROW_CHECKER_FIX.md Why the iterator approach was rejected and how explicit loops fix it
writeup.md Full development narrative — from C++ origins through primer
CHANGELOG.md Version history
CONTRIBUTING.md Build/test workflow, performance contribution standards
SECURITY.md Scope, known limitations, reporting

Files

File Description
primer.rs Primer — standalone, zero deps, rustc compilable
og.rs Flat sieve — optimised from v0.1.0
all_bench.rs Benchmark harness (requires Cargo + primes/primal crates)
Cargo.toml Manifest for the benchmark harness
primes.cpp The original 5-line C++ that started it all

Building

# primer (standalone — no Cargo needed)
rustc -C opt-level=3 -C target-cpu=native seg.rs -o seg
./seg

# Tests
rustc --test seg.rs -o test_seg && ./test_seg

# Benchmark harness (requires Cargo)
cargo new og_bench && cd prime_bench
# Copy og_bench.rs → src/main.rs, Cargo.toml in place
cargo run --release

What's Next

  • Wheel-30 factorisation (~1.8× speedup, would close the primal gap to ~15%)
  • Parallel segments via rayon (~N× on N cores, segments are independent)
  • Compile-time generation for fixed small prime tables on embedded targets

Full changelog: [CHANGELOG.md](https://claude.ai/chat/CHANGELOG.md)