Fast SRA downloader and FASTQ converter, written in pure Rust.
- Fast -- 5-13x faster than `fasterq-dump` on typical SRA files
- One command -- download, convert to FASTQ, and compress
- Batch input -- accessions, BioProjects (PRJNA), studies (SRP), or a file via `--accession-list`
- gzip or zstd output -- parallel compression, or plain FASTQ
- FASTA output -- `--fasta` drops quality scores
- SRA and SRA-lite -- full or simplified quality scores
- Split modes -- split-3, split-files, split-spot, interleaved
- Resumable downloads -- picks up where it left off
- Stdout streaming -- `-Z` pipes FASTQ straight into downstream tools
- Integrity checks -- MD5 verification on download and decode
- Platform support -- Illumina, BGISEQ/DNBSEQ, Element, Ultima, PacBio, Nanopore (legacy 454 and Ion Torrent are not supported)
- Single static binary -- no Python, no C dependencies
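
As a rough illustration of the stdout streaming item above, the sketch below defines a small downstream consumer that counts FASTQ records on stdin. The helper name `count_reads` is hypothetical, and the commented sracha invocation assumes that `-Z` is accepted by the `fastq` subcommand; the CLI reference is the authority on where the flag applies.

```bash
# count_reads: read FASTQ on stdin and print the number of records.
# Assumes the standard four-line record layout (no wrapped sequence lines).
count_reads() {
  awk 'END { print int(NR / 4) }'
}

# Hypothetical usage, assuming `sracha fastq` accepts -Z:
#   sracha fastq -Z SRR28588231.sra | count_reads

# Self-contained check with two synthetic records:
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | count_reads   # prints 2
```

Because the consumer only reads stdin, the same function works on any tool that writes uncompressed FASTQ to stdout.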
# Download, convert, and compress
sracha get SRR28588231
# Download all runs from a BioProject
sracha get PRJNA675068
# Batch download from an accession list
sracha get --accession-list SRR_Acc_List.txt
# Just download
sracha fetch SRR28588231
# Convert a local .sra file
sracha fastq SRR28588231.sra
# Show accession info
sracha info SRR28588231
# Validate a downloaded file
sracha validate SRR28588231.sra

Uncompressed output, measured with hyperfine.
| File | Size | sracha | fasterq-dump | fastq-dump | Speedup vs fasterq-dump |
|---|---|---|---|---|---|
| SRR28588231 | 23 MiB | 0.14 s | 1.83 s | 1.87 s | 13.3x |
| SRR2584863 | 288 MiB | 1.13 s | 5.37 s | 11.41 s | 4.8x |
| ERR1018173 | 1.94 GiB | 6.76 s | 32.25 s | -- | 4.8x |
Compression adds modest overhead: on SRR28588231, gzip output takes about
190 ms versus about 131 ms uncompressed (1.44x). sracha produces gzipped
FASTQ by default with parallel block compression, so the integrated pipeline
(sracha get) is often faster end-to-end than fasterq-dump followed by a
separate gzip step.
Download + decode, 5 runs each from a fresh temp dir.
| Accession | Size | sracha get | prefetch + fasterq-dump | prefetch + fastq-dump | Speedup vs prefetch + fasterq-dump |
|---|---|---|---|---|---|
| SRR28588231 | 23 MiB | 1.44 s | 4.17 s | 4.27 s | 2.90x |
| SRR2584863 | 288 MiB | 8.08 s | 12.55 s | 18.47 s | 1.55x |
sracha get beats prefetch + fasterq-dump end-to-end even with the
network in the loop, because the parallel chunked downloader overlaps
with decode and the decode itself is roughly 5-13x faster on these files.
See validation/bench-results/ for raw hyperfine output.
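
The benefit of overlapping download with decode can be seen in a toy shell pipeline. This is only a conceptual sketch, not sracha's implementation: `download_chunks` and `decode` are hypothetical stand-ins that sleep to simulate work, and the fractional `sleep` assumes a GNU or BSD userland.

```bash
# Producer: pretends to download three chunks, 0.2 s each.
download_chunks() {
  for i in 1 2 3; do
    sleep 0.2
    echo "chunk $i"
  done
}

# Consumer: pretends to decode each chunk as it arrives, 0.2 s each.
decode() {
  while read -r line; do
    sleep 0.2
    echo "decoded $line"
  done
}

# The pipe runs both stages concurrently, so decoding chunk 1 overlaps
# with downloading chunk 2, and so on.
download_chunks | decode
```

Run back to back, the two stages would need roughly 1.2 s of simulated work; piped together they finish in roughly 0.8 s, because only the first download and the last decode are not overlapped.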
Full hyperfine output
SRR28588231 (23 MiB, 66K spots, Illumina paired)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| sracha | 137.5 ± 5.9 | 127.5 | 148.8 | 1.00 |
| fasterq-dump | 1832.8 ± 23.9 | 1799.1 | 1857.7 | 13.33 ± 0.60 |
| fastq-dump | 1871.7 ± 30.2 | 1840.8 | 1910.6 | 13.62 ± 0.62 |
SRR2584863 (288 MiB, Illumina paired)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| sracha | 1.126 ± 0.091 | 1.059 | 1.230 | 1.00 |
| fasterq-dump | 5.368 ± 0.024 | 5.347 | 5.394 | 4.77 ± 0.39 |
| fastq-dump | 11.410 ± 0.025 | 11.392 | 11.438 | 10.13 ± 0.82 |
ERR1018173 (1.94 GiB, 15.6M spots, Illumina paired, single run)
| Command | Time [s] |
|---|---|
| sracha | 6.76 |
| fasterq-dump | 32.25 |
sracha gzip overhead (SRR28588231)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| sracha (no compression) | 131.3 ± 4.1 | 123.6 | 138.2 | 1.00 |
| sracha (gzip) | 189.7 ± 2.8 | 184.7 | 194.2 | 1.44 ± 0.05 |
End-to-end: SRR28588231 (23 MiB) — accession → FASTQ (5 runs)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| sracha get | 1.437 ± 0.067 | 1.383 | 1.550 | 1.00 |
| prefetch + fasterq-dump | 4.169 ± 0.034 | 4.125 | 4.209 | 2.90 ± 0.14 |
| prefetch + fastq-dump | 4.270 ± 0.099 | 4.199 | 4.422 | 2.97 ± 0.15 |
End-to-end: SRR2584863 (288 MiB) — accession → FASTQ (5 runs)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| sracha get | 8.078 ± 0.154 | 7.906 | 8.277 | 1.00 |
| prefetch + fasterq-dump | 12.547 ± 0.381 | 12.027 | 12.849 | 1.55 ± 0.06 |
| prefetch + fastq-dump | 18.466 ± 0.156 | 18.369 | 18.741 | 2.29 ± 0.05 |
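
The Relative column in the hyperfine tables above is the ratio of each command's mean to the fastest command's mean. The uncertainties shown are consistent with standard error propagation for a quotient of independent measurements, which the sketch below reproduces with awk. The function name `relative` is hypothetical, and the formula is an assumption checked against the published numbers rather than taken from the benchmark script.

```bash
# relative MEAN_REF SD_REF MEAN_CMD SD_CMD
# Prints "ratio ± uncertainty" for MEAN_CMD / MEAN_REF, where
# uncertainty = ratio * sqrt((SD_REF/MEAN_REF)^2 + (SD_CMD/MEAN_CMD)^2).
relative() {
  awk -v a="$1" -v sa="$2" -v b="$3" -v sb="$4" 'BEGIN {
    r = b / a
    e = r * sqrt((sa / a) * (sa / a) + (sb / b) * (sb / b))
    printf "%.2f ± %.2f\n", r, e
  }'
}

# fasterq-dump vs sracha on SRR28588231 (means and deviations in ms):
relative 137.5 5.9 1832.8 23.9   # prints 13.33 ± 0.60
```

The same call with the SRR2584863 values (`relative 1.126 0.091 5.368 0.024`) gives 4.77 ± 0.39, matching that table's row for fasterq-dump.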
Benchmarks run with sracha v0.3.0, sra-tools v3.4.1, on Linux
(16 CPUs). Install the reference toolkit with pixi run install-sratools
and reproduce with validation/benchmark.sh.
Install via Bioconda:
pixi add --channel bioconda sracha

Or download pre-built binaries from the releases page, or install from source:

cargo install --git https://github.com/rnabioco/sracha-rs sracha

Full CLI reference and usage guide: https://rnabioco.github.io/sracha-rs/
sracha builds on the Sequence Read Archive, maintained by the National Center for Biotechnology Information at the National Library of Medicine. The SRA and its toolchain are public-domain software developed by U.S. government employees — our tax dollars at work. Special thanks to Kenneth Durbrow (@durbrow) and the SRA Toolkit team for building and maintaining the infrastructure that makes projects like this possible.
This project wouldn't exist without NCBI's open infrastructure: the VDB/KAR format, the SDL locate API, EUtils, and public S3 hosting of sequencing data. sracha aims to make it easier for the community to build on that foundation.
MIT
