Run full Solana blockchain analytics on your laptop — no validator node, no third-party APIs, no cloud costs. SSP streams 100 GB+ snapshots directly from Solana mainnet RPC nodes, parses them in real time at ~850 MB/s, and writes the output to Parquet + DuckDB for instant SQL queries.
SSP discovers Solana RPC nodes, picks the fastest snapshot source, and streams the snapshot directly into the parsing pipeline — no 100 GB+ local copy needed.
```
RPC Discovery → HTTP Stream → zstd decompress → custom tar → AppendVec parse → Token decode → Parquet → DuckDB
```
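The streaming works because the parser consumes any `impl Read`, so a local file and an HTTP body go through the same path. A minimal sketch of that idea (`parse_stream` is illustrative, not ssp-core's actual API):

```rust
use std::io::Read;

// Illustrative only: drive a zstd stream to completion from any reader.
fn parse_stream(reader: impl Read) -> std::io::Result<u64> {
    // zstd::Decoder turns the compressed stream into plain tar bytes.
    let mut decoder = zstd::Decoder::new(reader)?;
    let mut buf = vec![0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        let n = decoder.read(&mut buf)?;
        if n == 0 {
            break;
        }
        // A real parser would hand &buf[..n] to the tar/AppendVec layer here.
        total += n as u64;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // Local file; a reqwest::blocking::Response also implements Read,
    // so the same function accepts an HTTP snapshot stream unchanged.
    let file = std::fs::File::open("snapshot.tar.zst")?;
    println!("decompressed {} bytes", parse_stream(file)?);
    Ok(())
}
```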
```sh
ssp --discover                                  # stream full snapshot from fastest RPC node
ssp --discover --incremental                    # stream incremental snapshot (~1 GB)
ssp --download-full                             # download full snapshot to disk (no parsing)
ssp --download-incremental --output ~/snapshots # download incremental snapshot to a specific dir
ssp --path snapshot.tar.zst                     # parse a local file
ssp --path snapshot.tar.zst --owner <base58> --pubkey <base58>   # parse with filters
```

During processing, a live status line updates in the terminal showing a progress bar, throughput, rows parsed, elapsed time/ETA, and pipeline health stats (parser-blocked / writer-starved counts).
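Once a run finishes, the Parquet output can be queried straight from DuckDB. A minimal sketch using the duckdb crate; the file glob and column names are assumptions, so match them to what ssp actually wrote for your run:

```rust
use duckdb::Connection;

fn main() -> duckdb::Result<()> {
    let conn = Connection::open_in_memory()?;

    // Glob and schema are assumptions; point them at your actual output files.
    let mut stmt = conn.prepare(
        "SELECT owner, COUNT(*) AS accounts \
         FROM read_parquet('accounts_*.parquet') \
         GROUP BY owner ORDER BY accounts DESC LIMIT 10",
    )?;

    let rows = stmt.query_map([], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, i64>(1)?))
    })?;

    for row in rows {
        let (owner, count) = row?;
        println!("{owner}\t{count}");
    }
    Ok(())
}
```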
| Flag | Description |
|---|---|
| `--path <file>` | Parse a local `.tar.zst` snapshot |
| `--discover` | Find the fastest RPC node and stream the snapshot |
| `--incremental` | Use an incremental snapshot instead of a full one |
| `--download-full` | Download a full snapshot to disk without parsing |
| `--download-incremental` | Download an incremental snapshot to disk without parsing |
| `--output <dir>` | Output directory for downloads (default: `.`) |
| `--owner <base58>` | Filter by account owner |
| `--pubkey <base58>` | Filter by account pubkey |
| `--hash <base58>` | Filter by account hash |
| `--include-dead` | Include dead accounts (lamports == 0) |
| `--include-spam` | Decode all mints/token accounts (bypass the Jupiter verified filter) |
| `--bench` | Run pipeline benchmarks (requires `--path`) |
Cargo workspace with two crates:
- `ssp-core` — library: parsing, filtering, decoding, record batches
- `ssp-cli` — binary (`ssp`): CLI, pipeline orchestration, RPC discovery, DuckDB, live progress
3-stage multithreaded pipeline connected via bounded crossbeam channels:
```
Decompressor thread      4 Parser threads         Writers
─────────────────        ────────────────         ──────────────
zstd → custom tar    →   bytemuck overlay     →   2 account writers
raw buffers              + filter                 2 decoded writers
                         + token decode           (parquet files)
                                                        │
                                                    DuckDB SQL
```
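A minimal sketch of this shape, assuming bounded crossbeam channels between stages and a return channel that recycles buffers (record types, channel sizes, and stubbed stages here are stand-ins, not SSP's actual ones):

```rust
use crossbeam_channel::bounded;
use std::thread;

fn main() {
    // decompressor -> parsers: filled buffers; parsers -> decompressor: recycled ones
    let (chunk_tx, chunk_rx) = bounded::<Vec<u8>>(8);
    let (pool_tx, pool_rx) = bounded::<Vec<u8>>(8);
    // parsers -> writer; a String stands in for a real record batch
    let (row_tx, row_rx) = bounded::<String>(1024);

    // Stage 1: decompressor (stubbed with synthetic chunks).
    let decompressor = thread::spawn(move || {
        for i in 0..32u8 {
            // Reuse a pooled buffer when one is available, else allocate.
            let mut buf = pool_rx.try_recv().unwrap_or_default();
            buf.clear();
            buf.resize(4096, i);
            chunk_tx.send(buf).unwrap(); // blocks when parsers fall behind (backpressure)
        }
        // chunk_tx drops here, closing the channel so parsers exit.
    });

    // Stage 2: four parser threads sharing one receiver.
    let parsers: Vec<_> = (0..4)
        .map(|_| {
            let rx = chunk_rx.clone();
            let pool = pool_tx.clone();
            let out = row_tx.clone();
            thread::spawn(move || {
                for buf in rx {
                    out.send(format!("parsed {} bytes", buf.len())).unwrap();
                    let _ = pool.try_send(buf); // return the buffer; drop it if the pool is full
                }
            })
        })
        .collect();
    drop(row_tx); // writer exits once every parser clone is gone

    // Stage 3: writer (stubbed: counts rows instead of writing Parquet).
    let writer = thread::spawn(move || row_rx.iter().count());

    decompressor.join().unwrap();
    for p in parsers {
        p.join().unwrap();
    }
    println!("rows written: {}", writer.join().unwrap());
}
```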
```
crates/
├── ssp-core/src/
│   ├── lib.rs           # Public API
│   ├── parser.rs        # Custom tar parser, AppendVec parsing, stream_raw()
│   ├── filters.rs       # Account filters (owner/pubkey/hash, dead filtering)
│   ├── pubkey.rs        # Pubkey type (32 bytes, bytemuck Pod, base58)
│   ├── record_batch.rs  # Arrow schema, RecordBatch construction
│   └── decoders/
│       ├── mod.rs       # Decoder trait, COptionPubkey
│       ├── known_mints.rs  # Jupiter verified token list (embedded)
│       └── token_program/
│           ├── mod.rs   # Mint/TokenAccount structs, COptionU64
│           ├── mint.rs  # MintDecoder (82-byte accounts)
│           └── token_account.rs  # TokenAccountDecoder (165-byte accounts)
└── ssp-cli/src/
    ├── main.rs      # CLI args, entry point, live stats printer
    ├── pipeline.rs  # Pipeline orchestration, threading, PipelineStats
    ├── db.rs        # DuckDB views, query execution
    ├── rpc.rs       # RPC node discovery, probing, speed testing (async)
    └── bench.rs     # Pipeline stage benchmarks
```
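`pubkey.rs` above describes `Pubkey` as a 32-byte bytemuck `Pod` rendered as base58. A minimal sketch of what such a type can look like (layout and trait set assumed; SSP's actual code may differ):

```rust
use bytemuck::{Pod, Zeroable};

// 32 raw bytes, safe to cast directly out of an AppendVec byte slice;
// base58 is only produced when the key is displayed.
#[repr(transparent)]
#[derive(Clone, Copy, Pod, Zeroable)]
pub struct Pubkey(pub [u8; 32]);

impl std::fmt::Display for Pubkey {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&bs58::encode(self.0).into_string())
    }
}

fn main() {
    // Zero-copy view: reinterpret 32 bytes of a buffer as a Pubkey.
    let raw = [7u8; 32];
    let pk: &Pubkey = bytemuck::from_bytes(&raw);
    println!("{pk}");
}
```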
- Custom tar parser — replaced the `tar` crate for performance and buffer control (~830 MB/s peak)
- bytemuck for zero-copy binary parsing, like Zig's packed struct overlay (see the sketch after this list)
- Buffer pooling — recycling `Vec<u8>` buffers between the decompressor and parsers
- crossbeam-channel bounded channels for backpressure
- Decoder trait — pluggable token decoding (Mint, TokenAccount), writing to separate parquet files
- Async (`tokio`) only for RPC discovery (probing 300+ nodes concurrently); everything else uses threads
- Parser accepts `impl Read` — the same code handles both local files and HTTP streams
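The 82-byte Mint layout handled by `mint.rs` shows how the overlay trick works: cast the raw account bytes to a padding-free `#[repr(C)]` struct and read integers out with `from_le_bytes`. A sketch under those assumptions (field names are illustrative, not SSP's actual structs):

```rust
use bytemuck::{Pod, Zeroable};

// SPL Token Mint accounts are exactly 82 bytes. Every field is a byte
// array or u8, so the struct has alignment 1 and no padding — safe for Pod.
#[repr(C)]
#[derive(Clone, Copy, Pod, Zeroable)]
struct RawMint {
    mint_authority_tag: [u8; 4],  // COption<Pubkey> discriminant
    mint_authority: [u8; 32],
    supply: [u8; 8],              // u64, little-endian
    decimals: u8,
    is_initialized: u8,
    freeze_authority_tag: [u8; 4],
    freeze_authority: [u8; 32],
}

fn decode_mint(data: &[u8]) -> Option<(u64, u8)> {
    if data.len() != std::mem::size_of::<RawMint>() {
        return None; // not a mint account
    }
    // Zero-copy: reinterpret the account bytes in place.
    let m: &RawMint = bytemuck::from_bytes(data);
    Some((u64::from_le_bytes(m.supply), m.decimals))
}

fn main() {
    let data = [0u8; 82];
    assert_eq!(decode_mint(&data), Some((0, 0)));
}
```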
109 GB full snapshot (snapshot-389758228.tar.zst), 1.05B accounts, M1-series Mac:
| Configuration | Throughput |
|---|---|
| Single-threaded (parse + write) | ~123 MB/s |
| Multi-threaded pipeline + filtered | ~840 MB/s sustained, ~870 MB/s peak |
The spam filter uses Jupiter's verified token list (4550 mints, embedded at compile time). Only verified mints and their token accounts are decoded; use `--include-spam` to bypass.
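One way to embed such a list at compile time, in the spirit of `known_mints.rs` (the file name and one-mint-per-line format below are assumptions, not SSP's actual code):

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

// Baked into the binary at compile time; no runtime file or network access.
static VERIFIED_MINTS_RAW: &str = include_str!("verified_mints.txt"); // one base58 mint per line

fn verified_mints() -> &'static HashSet<&'static str> {
    static SET: OnceLock<HashSet<&'static str>> = OnceLock::new();
    SET.get_or_init(|| VERIFIED_MINTS_RAW.lines().collect())
}

fn is_verified(mint_base58: &str) -> bool {
    verified_mints().contains(mint_base58)
}
```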
- Streaming parser (zstd → custom tar → AppendVec)
- Multi-threaded pipeline (decompressor + 4 parsers + 2+2 writers)
- Buffer pooling
- Token Program decoding (Mint + TokenAccount)
- DuckDB integration
- CLI filters (owner, pubkey, hash, dead accounts)
- RPC node discovery + network streaming
- Spam filter (Jupiter verified token list, `--include-spam` to bypass)
- Architecture refactor: extract core as a reusable library crate
- CLI with live progress bar and pipeline health stats
- Download-only mode (`--download-full` / `--download-incremental`)
- UX improvements
- Configuration file
- DuckDB SQL query interface
- More decoders (System, Stake, Vote, Token-2022)
- Custom decoders
- Upload a custom ABI
- Parse via a dynamic ABI
- Parallel downloads
- Incremental snapshot merging
- Resume on network failure
In slow development :)
Rust 2024, bytemuck, zstd, crossbeam, arrow/parquet, DuckDB, reqwest, tokio, clap