🌶️ sracha 🌶️

Fast SRA downloader and FASTQ converter, written in pure Rust.

Features

Fast -- 5-13x faster than fasterq-dump at local decode on typical SRA files
One command -- download, convert to FASTQ, and compress
Batch input -- accessions, BioProjects (PRJNA), studies (SRP), or a file via --accession-list
gzip or zstd output -- parallel compression, or plain FASTQ
FASTA output -- --fasta drops quality scores
SRA and SRA-lite -- full or simplified quality scores
Split modes -- split-3, split-files, split-spot, interleaved
Resumable downloads -- picks up where it left off
Stdout streaming -- -Z pipes FASTQ straight into downstream tools
Integrity checks -- MD5 verification on download and decode
Platform support -- Illumina, BGISEQ/DNBSEQ, Element, Ultima, PacBio, Nanopore (legacy 454 and Ion Torrent are not supported)
Single static binary -- no Python, no C dependencies

Quick start

# Download, convert, and compress
sracha get SRR28588231

# Download all runs from a BioProject
sracha get PRJNA675068

# Batch download from an accession list
sracha get --accession-list SRR_Acc_List.txt

# Just download
sracha fetch SRR28588231

# Convert a local .sra file
sracha fastq SRR28588231.sra

# Show accession info
sracha info SRR28588231

# Validate a downloaded file
sracha validate SRR28588231.sra
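
The batch form takes a plain text file of accessions. A minimal sketch, assuming one accession per line (the layout of the SRR_Acc_List.txt file that the SRA Run Selector exports); whether sracha accepts other layouts is not stated here. The guard around the call is only there so the snippet is harmless on a machine without sracha installed:

```shell
# Write a two-line accession list (both accessions appear in the benchmarks below).
printf '%s\n' SRR28588231 SRR2584863 > SRR_Acc_List.txt

# Run the batch download only when sracha is on PATH; otherwise just show the command.
if command -v sracha >/dev/null 2>&1; then
  sracha get --accession-list SRR_Acc_List.txt
else
  echo "would run: sracha get --accession-list SRR_Acc_List.txt"
fi
```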

Benchmarks

Local decode (SRA file on disk → FASTQ)

Uncompressed output, measured with hyperfine. ERR1018173 was timed in a single run without fastq-dump, so that cell is shown as --.

File	Size	sracha	fasterq-dump	fastq-dump	Speedup vs fasterq-dump
SRR28588231	23 MiB	0.14 s	1.83 s	1.87 s	13.3x
SRR2584863	288 MiB	1.13 s	5.37 s	11.41 s	4.8x
ERR1018173	1.94 GiB	6.76 s	32.25 s	--	4.8x
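
The speedup column is the fasterq-dump mean divided by the sracha mean. A minimal sketch of that arithmetic, assuming the unrounded means from the full hyperfine output further down (the rounded times in the first row would give 13.1x instead of 13.3x). `speedup` is a hypothetical helper for illustration, not part of sracha:

```shell
# $1 = baseline time, $2 = sracha time (same unit); prints the ratio to one decimal.
speedup() {
  awk -v base="$1" -v ours="$2" 'BEGIN { printf "%.1fx\n", base / ours }'
}

speedup 1.8328 0.1375   # SRR28588231 -> 13.3x
speedup 5.368 1.126     # SRR2584863  -> 4.8x
speedup 32.25 6.76      # ERR1018173  -> 4.8x
```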

Compression adds little overhead: on SRR28588231, gzip output takes about 190 ms versus 131 ms uncompressed (1.44x). sracha produces gzipped FASTQ by default with parallel block compression, so the integrated pipeline (sracha get) is often faster end-to-end than fasterq-dump followed by a separate gzip step.

End-to-end (accession → FASTQ, including download)

Download + decode, 5 runs each from a fresh temp dir.

Accession	Size	`sracha get`	`prefetch + fasterq-dump`	`prefetch + fastq-dump`	Speedup vs `prefetch + fasterq-dump`
SRR28588231	23 MiB	1.44 s	4.17 s	4.27 s	2.90x
SRR2584863	288 MiB	8.08 s	12.55 s	18.47 s	1.55x

sracha get is 1.5-2.9x faster than prefetch + fasterq-dump end-to-end even with the network in the loop, because the parallel chunked downloader overlaps with decode and the decode itself is roughly 5-13x faster. See validation/bench-results/ for raw hyperfine output.
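
The benefit of overlapping download with decode can be illustrated with an ordinary shell pipe. This is a toy sketch only, not how sracha is implemented: `download` and `decode` are hypothetical stand-ins, and the sleeps are made-up costs. The point is that the second stage starts working on the first chunk while later chunks are still arriving, rather than waiting for the whole file as a prefetch-then-convert workflow does:

```shell
# Toy stand-in for a chunked download: emits one chunk at a time.
download() {
  for chunk in 1 2 3; do
    sleep 0.1                 # pretend network latency per chunk
    echo "chunk-$chunk"
  done
}

# Toy stand-in for decoding: handles each chunk as soon as it arrives.
decode() {
  while read -r chunk; do
    sleep 0.1                 # pretend decode cost per chunk
    echo "decoded $chunk"
  done
}

# The pipe runs both stages concurrently, so decoding chunk 1 overlaps
# with downloading chunk 2 instead of waiting for the full download.
download | decode
```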

Full hyperfine output

SRR28588231 (23 MiB, 66K spots, Illumina paired)

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`sracha`	137.5 ± 5.9	127.5	148.8	1.00
`fasterq-dump`	1832.8 ± 23.9	1799.1	1857.7	13.33 ± 0.60
`fastq-dump`	1871.7 ± 30.2	1840.8	1910.6	13.62 ± 0.62

SRR2584863 (288 MiB, Illumina paired)

Command	Mean [s]	Min [s]	Max [s]	Relative
`sracha`	1.126 ± 0.091	1.059	1.230	1.00
`fasterq-dump`	5.368 ± 0.024	5.347	5.394	4.77 ± 0.39
`fastq-dump`	11.410 ± 0.025	11.392	11.438	10.13 ± 0.82

ERR1018173 (1.94 GiB, 15.6M spots, Illumina paired, single run)

Command	Time [s]
`sracha`	6.76
`fasterq-dump`	32.25

sracha gzip overhead (SRR28588231)

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`sracha (no compression)`	131.3 ± 4.1	123.6	138.2	1.00
`sracha (gzip)`	189.7 ± 2.8	184.7	194.2	1.44 ± 0.05
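
To put the gzip cost in absolute terms, the two means above can be compared directly. A small sketch of the arithmetic; `gzip_overhead` is a hypothetical helper name, and the result applies only to this one 23 MiB file:

```shell
# $1 = uncompressed mean (ms), $2 = gzip mean (ms).
gzip_overhead() {
  awk -v plain="$1" -v gz="$2" 'BEGIN {
    printf "extra: %.1f ms, relative: %.2fx\n", gz - plain, gz / plain
  }'
}

gzip_overhead 131.3 189.7   # prints "extra: 58.4 ms, relative: 1.44x"
```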

End-to-end: SRR28588231 (23 MiB) — accession → FASTQ (5 runs)

Command	Mean [s]	Min [s]	Max [s]	Relative
`sracha get`	1.437 ± 0.067	1.383	1.550	1.00
`prefetch + fasterq-dump`	4.169 ± 0.034	4.125	4.209	2.90 ± 0.14
`prefetch + fastq-dump`	4.270 ± 0.099	4.199	4.422	2.97 ± 0.15

End-to-end: SRR2584863 (288 MiB) — accession → FASTQ (5 runs)

Command	Mean [s]	Min [s]	Max [s]	Relative
`sracha get`	8.078 ± 0.154	7.906	8.277	1.00
`prefetch + fasterq-dump`	12.547 ± 0.381	12.027	12.849	1.55 ± 0.06
`prefetch + fastq-dump`	18.466 ± 0.156	18.369	18.741	2.29 ± 0.05

Benchmarks run with sracha v0.3.0, sra-tools v3.4.1, on Linux (16 CPUs). Install the reference toolkit with pixi run install-sratools and reproduce with validation/benchmark.sh.

Installation

Install via Bioconda:

pixi add --channel bioconda sracha

Alternatively, download pre-built binaries from the releases page or install from source:

cargo install --git https://github.com/rnabioco/sracha-rs sracha

Documentation

Full CLI reference and usage guide: https://rnabioco.github.io/sracha-rs/

Acknowledgments

sracha builds on the Sequence Read Archive, maintained by the National Center for Biotechnology Information at the National Library of Medicine. The SRA and its toolchain are public-domain software developed by U.S. government employees — our tax dollars at work. Special thanks to Kenneth Durbrow (@durbrow) and the SRA Toolkit team for building and maintaining the infrastructure that makes projects like this possible.

This project wouldn't exist without NCBI's open infrastructure: the VDB/KAR format, the SDL locate API, EUtils, and public S3 hosting of sequencing data. sracha aims to make it easier for the community to build on that foundation.

License

MIT
