v0.2.0 — Primer & Full Documentation
The bit-packed prime sieve goes cache-aware. This release adds a segmented sieve variant that processes the range in L1-sized 32 KB windows, a complete benchmark harness, and a full documentation suite covering everything from the original C++ origins to production deployment patterns.
Highlights
Primer (seg.rs) — L1-cache-friendly processing in 32 KB segments. Single reused buffer, never leaves cache. 1.4× faster than the flat sieve at n=100M, with 187× less sieve working memory than the primal crate. Zero dependencies, single file, compiles with rustc alone.
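The windowing idea can be sketched as follows — a minimal byte-per-flag version for clarity (the actual seg.rs bit-packs the buffer); `count_primes` and `SEG_SIZE` are illustrative names, not the file's API:

```rust
// Sketch of a segmented Sieve of Eratosthenes: one reused L1-sized buffer.
const SEG_SIZE: usize = 32 * 1024; // one window; one flag byte per number here

fn count_primes(n: usize) -> usize {
    if n < 2 {
        return 0;
    }
    // 1. Flat-sieve the base primes up to sqrt(n).
    let limit = (n as f64).sqrt() as usize + 1;
    let mut is_comp = vec![false; limit + 1];
    let mut base = Vec::new();
    for p in 2..=limit {
        if !is_comp[p] {
            base.push(p);
            let mut m = p * p;
            while m <= limit {
                is_comp[m] = true;
                m += p;
            }
        }
    }
    // 2. Walk [2, n] in fixed-size windows, reusing one buffer.
    let mut count = 0;
    let mut seg = vec![false; SEG_SIZE];
    let mut low = 2;
    while low <= n {
        let high = (low + SEG_SIZE - 1).min(n);
        for f in seg.iter_mut() {
            *f = false;
        }
        for &p in &base {
            // First multiple of p inside [low, high], never below p*p.
            let mut m = ((low + p - 1) / p).max(p) * p;
            while m <= high {
                seg[m - low] = true;
                m += p;
            }
        }
        for i in 0..=(high - low) {
            if !seg[i] {
                count += 1;
            }
        }
        low = high + 1;
    }
    count
}

fn main() {
    println!("{}", count_primes(100)); // → 25
}
```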
rustc -C opt-level=3 -C target-cpu=native seg.rs && ./seg

Benchmark harness (all_bench.rs) — Head-to-head comparison against the primes v0.3 and primal v0.3 crates at six input sizes spanning 10K–100M. 25 iterations per measurement, cross-verified prime counts, full statistical output (min/median/mean/max/σ).
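The summary statistics the harness reports can be computed along these lines (a hedged sketch; `stats` is an illustrative helper, not the harness's actual function):

```rust
// Illustrative helper: summary statistics over one benchmark's samples.
// Returns (min, median, mean, max, σ); σ is the population standard deviation.
fn stats(mut samples: Vec<f64>) -> (f64, f64, f64, f64, f64) {
    assert!(!samples.is_empty());
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len() as f64;
    let (min, max) = (samples[0], *samples.last().unwrap());
    let median = samples[samples.len() / 2]; // odd counts (e.g. 25 iterations) hit the true middle
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (min, median, mean, max, var.sqrt())
}

fn main() {
    let (min, med, mean, max, sd) = stats(vec![3.0, 1.0, 2.0]);
    println!("min={min} med={med} mean={mean} max={max} σ={sd:.3}");
}
```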
Bug fixes in the original flat sieve — Two incorrect test assertions that the sieve itself was too correct to trigger:
- `primes[10_000]` is 104,743, not 104,729 (off-by-one in the zero-indexed lookup)
- 499,999 is composite, not prime; the largest prime ≤ 500,000 is 499,979
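The composite claim is easy to spot-check with trial division (a throwaway sketch; `smallest_factor` is not part of the crate):

```rust
// Trial division: returns the smallest prime factor of n (n itself if prime).
fn smallest_factor(n: u64) -> u64 {
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return d;
        }
        d += 1;
    }
    n
}

fn main() {
    println!("{}", smallest_factor(499_999)); // → 31, so 499,999 = 31 × 16,129 = 31 × 127²
    println!("{}", smallest_factor(499_979)); // → 499,979 (prime)
}
```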
Performance (Median, 25 Iterations)
| n | wofl flat | wofl segmented | primal | Seg vs flat |
|---|---|---|---|---|
| 500K | 434 µs | 443 µs | 170 µs | ~1× |
| 10M | 10.3 ms | 10.3 ms | 3.2 ms | ~1× (crossover) |
| 50M | 66.6 ms | 52.5 ms | 30.7 ms | 1.27× |
| 100M | 189.7 ms | 137.0 ms | 61.7 ms | 1.38× |
Memory (n = 50M)
| | wofl segmented | primal | Ratio |
|---|---|---|---|
| Sieve working memory | 32 KB | 6.0 MB | 187× smaller |
| Result vector | 24.7 MB | 32.0 MB | 1.3× smaller |
| Total | 24.8 MB | 38.0 MB | 1.5× smaller |
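The result-vector figure is dominated by π(n) stored `u64` values; the flat sieve's pre-allocation (see the improvements below) uses a prime-counting upper bound, which can be sketched with a Dusart-style estimate (illustrative; `prime_count_upper_bound` and its small-n floor are not the crate's exact code):

```rust
// Dusart-style upper bound: π(n) ≤ n/ln n · (1 + 1.2762/ln n) for n > 1,
// used to size the result vector once, with no reallocations.
fn prime_count_upper_bound(n: u64) -> usize {
    if n < 17 {
        return 7; // π(16) = 6; small-n floor keeps the estimate safe
    }
    let x = n as f64;
    let ln = x.ln();
    (x / ln * (1.0 + 1.2762 / ln)).ceil() as usize
}

fn main() {
    // Bound for n = 50M, times 8 bytes per u64, approximates the preallocated size.
    let bound = prime_count_upper_bound(50_000_000);
    println!("{} primes bound → {:.1} MB", bound, (bound * 8) as f64 / 1e6);
}
```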
Flat Sieve Improvements (v0.1.0 → v0.2.0)
- Pre-allocated result vector via prime-counting upper bound (zero reallocs)
- Overflow-safe integer `isqrt()` using `checked_mul` (correct for all `u64`, including `u64::MAX`)
- Hoisted inner-loop step computation
- Early termination in collection phase
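The overflow-safe idea can be sketched as a binary search in which `checked_mul` treats overflow as "mid is too large" (a sketch in the spirit of the change, not the file's exact code):

```rust
// Overflow-safe integer square root: largest r with r*r <= n.
fn isqrt(n: u64) -> u64 {
    // Invariant: lo*lo <= n; hi is too large (hi*hi > n or overflows u64).
    // (2^32)^2 overflows u64, so hi starts one past the largest possible root.
    let (mut lo, mut hi) = (0u64, u32::MAX as u64 + 1);
    while lo + 1 < hi {
        let mid = lo + (hi - lo) / 2;
        match mid.checked_mul(mid) {
            Some(sq) if sq <= n => lo = mid, // mid*mid fits and is small enough
            _ => hi = mid,                   // overflow counts as "too big"
        }
    }
    lo
}

fn main() {
    println!("{}", isqrt(u64::MAX)); // → 4294967295 (= 2^32 − 1)
}
```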
Documentation
This release includes the full doc suite:
| Document | Contents |
|---|---|
| PERFORMANCE_ANALYSIS.md | Benchmarks, scaling behaviour, memory deep dive, target use cases |
| SIDE_BY_SIDE.md | Line-by-line C++ → Rust translation guide |
| BORROW_CHECKER_FIX.md | Why the iterator approach was rejected and how explicit loops fix it |
| writeup.md | Full development narrative — from C++ origins through primer |
| CHANGELOG.md | Version history |
| CONTRIBUTING.md | Build/test workflow, performance contribution standards |
| SECURITY.md | Scope, known limitations, reporting |
Files
| File | Description |
|---|---|
| seg.rs | Primer — standalone, zero deps, rustc compilable |
| og.rs | Flat sieve — optimised from v0.1.0 |
| all_bench.rs | Benchmark harness (requires Cargo + primes/primal crates) |
| Cargo.toml | Manifest for the benchmark harness |
| primes.cpp | The original 5-line C++ that started it all |
Building
# primer (standalone — no Cargo needed)
rustc -C opt-level=3 -C target-cpu=native seg.rs -o seg
./seg
# Tests
rustc --test seg.rs -o test_seg && ./test_seg
# Benchmark harness (requires Cargo)
cargo new prime_bench && cd prime_bench
# Copy all_bench.rs → src/main.rs, Cargo.toml in place
cargo run --release
What's Next
- Wheel-30 factorisation (~1.8× speedup, would close the primal gap to ~15%)
- Parallel segments via rayon (~N× on N cores, segments are independent)
- Compile-time generation for fixed small prime tables on embedded targets
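On the wheel-30 item: only 8 of every 30 residues are coprime to 30, versus the 15 odd residues the current bit-pack tracks, which is where a figure near 15/8 ≈ 1.8× comes from. A tiny sketch of the residue classes (illustrative only; not part of the codebase):

```rust
// The 8 residues mod 30 that can contain primes greater than 5 (plus 1 itself).
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

fn main() {
    let wheel: Vec<u64> = (1..30).filter(|&r| gcd(r, 30) == 1).collect();
    println!("{:?}", wheel); // → [1, 7, 11, 13, 17, 19, 23, 29]
}
```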
Full changelog: CHANGELOG.md