This document describes the main command-line tools provided by Mussel, with examples.
Mussel provides a set of CLI tools for tiling whole-slide images, working with tiled slides, and generating feature embeddings with pathology foundation models.
- `tessellate`: tiling and foreground detection of whole-slide images
- `tessellate_extract_features`: combined tiling + feature extraction pipeline; supports batch processing from a directory
- `extract_features`: extract features from whole-slide images (WSI) using a foundation model
- `create_class_embeddings`: generate tissue-type embeddings for classifying tiles
- `annotate`: annotate tiles with tissue types
- `cache_tiles`: save tile information in an efficient form for training
- `export_tiles`: export tiles as individual .png files using an HDF5 tile-coordinate manifest
- `filter_features`: filter features using a classifier model
- `merge_annotation_features`: merge tile features with annotations from a BMP file
- `linear_probe_benchmark`: benchmark a linear probe classifier on features extracted from a slide
- `save_model`: download and save a foundation model locally
- `convert`: convert whole-slide images to pyramidal TIFF format (single file or batch)
Each of these commands is configurable with a number of parameters. You can get a quick
list of the parameters and their default values for any tool by running `<command> --help`.
The example commands below use the test data provided in the tests/testdata folder.
Tessellate tiles a whole-slide image. The tile coordinates and other metadata necessary for downstream steps are written to an HDF5 (.h5) file.
Mussel reads tiles from the slide at the resolution specified by seg_config.mpp
(default 0.5 µm/px, roughly 20×). The slide's native MPP is determined automatically
from the file metadata; see MPP fallback chain below.
Example command (see defaults with `tessellate --help`):
tessellate \
slide_path=tests/testdata/948176.svs \
output_h5_path=948176_coord.h5 \
seg_config.segment_threshold=0 \
num_workers=1

Mussel uses tiffslide (backed by tifffile) to read whole-slide images.
| Format | Extension | Vendor | Tiffslide support |
|---|---|---|---|
| Aperio SVS | .svs | Leica/Aperio | ✅ Full |
| Leica SCN | .scn | Leica | ✅ Full |
| Generic / OME TIFF | .tif, .tiff | Various | ✅ Full |
| Hamamatsu NDPI | .ndpi | Hamamatsu | Partial (see limitations below) |
| Ventana BIF | .bif | Ventana/Roche | Partial (see limitations below) |
| MIRAX | .mrxs | 3DHistech | Partial (see limitations below) |
| Hamamatsu VMS/VMU | .vms, .vmu | Hamamatsu | Partial (see limitations below) |
| PerkinElmer QPTIFF | .qptiff | PerkinElmer | Partial (see limitations below) |
| Zeiss CZI | .czi | Zeiss | Partial (see limitations below) |
Format limitations:
- NDPI / BIF: tiffslide's vendor parsers are incomplete; MPP is derived from the `tiff.XResolution`/`tiff.ResolutionUnit` tags (works for most files). Use `seg_config.slide_mpp_override` if the MPP is incorrect.
- MRXS: multi-file format; the `.mrxs` file and its sidecar directory (same name, no extension) must be in the same location. Moving the `.mrxs` file alone will fail.
- QPTIFF: multiplex/multi-channel files are tiled using the first channel only.
- CZI: multi-series files (multiple acquisitions) use series 0 only.
- VMS / VMU: uncommon on modern scanners; validate before production use.
Mussel determines the slide's native microns-per-pixel (MPP) using the following fallback chain. The first value found is used:

1. `seg_config.slide_mpp_override`: explicit CLI override; bypasses all metadata reading.
2. `tiffslide.mpp-x`: standard property populated by tiffslide for all supported formats.
3. `aperio.MPP` / `openslide.mpp-x`: legacy vendor properties.
4. `tiff.XResolution` + `tiff.ResolutionUnit`: raw TIFF resolution tags converted to µm/px (INCH, CENTIMETER, MILLIMETER, and MICROMETER units are supported); tiffslide exposes these for partially-supported formats (NDPI, BIF, MRXS, QPTIFF, CZI) even when it cannot normalize them to `tiffslide.mpp-x`.
5. Magnification estimate: derived from objective-power metadata as `10.0 / magnification`.
6. Default 0.5 µm/px: used as a last resort, with a warning logged.
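The TIFF-tag step of this chain is plain unit arithmetic. Here is a minimal sketch (the function name and unit table are illustrative, not Mussel's actual code):

```python
# Illustrative sketch of the tiff.XResolution/tiff.ResolutionUnit -> um/px
# conversion described above; not Mussel's actual implementation.
UM_PER_UNIT = {
    "INCH": 25_400.0,
    "CENTIMETER": 10_000.0,
    "MILLIMETER": 1_000.0,
    "MICROMETER": 1.0,
}

def mpp_from_tiff_tags(x_resolution: float, resolution_unit: str) -> float:
    """XResolution counts pixels per unit, so um/px = um-per-unit / XResolution."""
    return UM_PER_UNIT[resolution_unit] / x_resolution
```

For example, a slide stored with CENTIMETER units and XResolution = 20000 px/cm resolves to 10000 / 20000 = 0.5 µm/px.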
If the slide has missing or corrupt MPP metadata, use the override:
tessellate slide_path=slide.svs seg_config.slide_mpp_override=0.5 ...
tessellate_extract_features slide_path=slide.svs seg_config.slide_mpp_override=0.25 ...
export_tiles slide_path=slide.svs slide_mpp_override=0.5 ...

| Parameter | Default | Description |
|---|---|---|
| `seg_config.mpp` | 0.5 | Target resolution for tile extraction (µm/px). |
| `seg_config.patch_size` | 256 | Tile size in pixels at the target MPP. |
| `seg_config.overlap` | 0 | Patch overlap in absolute pixels. Sets step_size = patch_size - overlap. |
| `seg_config.min_tissue_proportion` | 0.0 | Discard patches whose tissue fraction is below this value (0.0–1.0). |
| `seg_config.remove_artifacts` | false | Enable artifact removal (requires an artifact_remover_fn hook). |
| `seg_config.remove_penmarks` | false | Enable pen-mark removal (requires an artifact_remover_fn hook). |
| `seg_config.seg_model` | "classic" | Segmentation backend: "classic" (HSV + fixed threshold), "otsu" (HSV + Otsu automatic threshold), or "neural" (deep learning; see below). Note: the old seg_config.use_otsu=true flag is deprecated; use seg_model=otsu instead. |
| `seg_config.slide_mpp_override` | null | Override the slide's native MPP; useful when metadata is missing or wrong. |
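The step arithmetic from the overlap parameter can be sketched as follows (illustrative only; the real tessellate additionally applies foreground masking and min_tissue_proportion filtering, which this sketch omits):

```python
def tile_origins(width: int, height: int, patch_size: int = 256, overlap: int = 0):
    """Top-left coordinates of every full tile on a regular grid,
    stepping by step_size = patch_size - overlap."""
    step = patch_size - overlap
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, step)
        for x in range(0, width - patch_size + 1, step)
    ]
```

With overlap=128 and the default patch_size=256, the step drops to 128 px, so a 512 × 512 px region yields 9 tiles instead of 4.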
Example with 50% overlap and tissue filtering:
tessellate \
slide_path=tests/testdata/948176.svs \
output_h5_path=948176_coord.h5 \
seg_config.overlap=128 \
seg_config.min_tissue_proportion=0.5

By default Mussel uses a classic HSV thresholding pipeline (seg_model="classic").
Setting seg_model="neural" switches to a deep-learning segmenter that is more
robust on challenging slides (stain variation, artefacts, pale tissue).
The neural segmenter uses a DeepLabV3-ResNet50 model (2-class: tissue vs background) trained on histopathology slides as part of the HEST project at the Mahmood Lab, Harvard Medical School. The pre-trained checkpoint is hosted on HuggingFace at MahmoodLab/hest-tissue-seg and is downloaded automatically on first use (no account or token required).
Reference: Chan et al., "A Pathology Foundation Model for Cancer Diagnosis and Prognosis Prediction", Nature 2025. [paper] [GitHub] [HuggingFace]
The neural segmenter operates at 1 µm/px resolution (≈10×); images are auto-resampled before inference and the mask is rescaled back to the slide's native resolution. A CUDA GPU is recommended for practical performance but CPU inference is supported.
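The resampling arithmetic amounts to a single scale factor (hypothetical helper, not Mussel's API):

```python
def seg_resample_dims(width: int, height: int, native_mpp: float,
                      seg_mpp: float = 1.0) -> tuple:
    """Image size at the segmenter's 1 um/px working resolution
    (hypothetical helper); the predicted mask is later rescaled
    back to native resolution by the inverse factor."""
    scale = native_mpp / seg_mpp
    return round(width * scale), round(height * scale)
```

For a 40000 × 30000 px slide at 0.25 µm/px (~40×), the segmenter sees a 10000 × 7500 px image.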
No extra packages are required — neural segmentation works with any torch-gpu or
torch-cpu install:
uv sync --extra torch-gpu  # or torch-cpu

To use it:
tessellate \
slide_path=tests/testdata/948176.svs \
output_h5_path=948176_coord.h5 \
seg_config.seg_model=neural
tessellate_extract_features \
slide_path=tests/testdata/948176.svs \
output_h5_path=948176_feat.h5 \
output_pt_path=948176_embed.pt \
model_type=UNI2 \
seg_config.seg_model=neural

Use a pathology foundation model to calculate feature embeddings for a slide tiled with
the tessellate command described above. This generates both an HDF5 (.h5) file and
a PyTorch (.pt) file containing the embeddings for each tile.
The following models are currently supported.
Slide encoders (require patch-level features as input):
| Model | model_type | Patch encoder required | Access |
|---|---|---|---|
| Prov-GigaPath | GIGAPATH_SLIDE | GIGAPATH | 🔒 gated |
| TITAN | TITAN_SLIDE | CONCH1_5 | 🔒 gated |
| PRISM | PRISM_SLIDE | VIRCHOW | 🔒 gated |
| FEATHER | FEATHER_SLIDE | CONCH1_5 | 🔒 gated |
| MADELEINE | MADELEINE_SLIDE | CONCH1_5 | 🔒 gated |
| CHIEF | CHIEF_SLIDE | CTRANSPATH | local ckpt |
OpenCLIP is used by default, with QuiltNet-B-16-PMB as the default model. Use the model_type parameter to specify a different model.
To use H-Optimus-0, for example:
extract_features \
slide_path=tests/testdata/948176.svs \
patch_h5_path=tests/testdata/948176.patch.h5 \
model_type=OPTIMUS \
output_h5_path=948176_feat.h5 \
output_pt_path=948176_embed.pt

Most models download automatically from HuggingFace. 🔒 Gated models require you to visit the model page, sign the access agreement, and set your HuggingFace token:
export HF_TOKEN=hf_...

Gated models (visit the link in the table above to request access):
- Mahmood Lab (MahmoodLab): UNI, UNI2, CONCH_V1, CONCH1_5, TITAN_SLIDE, FEATHER_SLIDE, MADELEINE_SLIDE
- Paige AI (paige-ai): VIRCHOW, VIRCHOW2, PRISM_SLIDE
- Bioptimus (bioptimus): OPTIMUS, H_OPTIMUS_1, H0_MINI
- Prov-GigaPath: GIGAPATH, GIGAPATH_SLIDE
- Google: GOOGLEPATH
- HistAI: HIBOU_L
- SophontAI: OPENMIDNIGHT
- GenBio AI: GENBIO_PATHFM
Public models (no token needed): RESNET50, CLIP, PHIKON, PHIKON_V2, MIDNIGHT12K, GPFM, KAIKO_VITS8, KAIKO_VITS16, KAIKO_VITB8, KAIKO_VITB16, KAIKO_VITL14, LUNIT_VITS8, LUNIT_VITS16
Local-checkpoint-only models: CTRANSPATH and CHIEF_SLIDE require manually downloaded checkpoints (no HuggingFace download). Pass the checkpoint path via model_path=.
Finally, you can generate features from a folder of pre-tiled images by specifying the
folder with the patch_path parameter.
extract_features \
slide_path=None \
patch_h5_path=None \
patch_path=<path to folder w/ tiles in image format (.tif, .png, .jpg, etc.)> \
output_h5_path=<path to output h5 file> \
output_pt_path=None

tessellate_extract_features runs tessellation and feature extraction in a single command.
It also supports batch processing of an entire directory of slides:
# Single slide
tessellate_extract_features \
slide_path=tests/testdata/948176.svs \
output_h5_path=948176_feat.h5 \
output_pt_path=948176_embed.pt \
model_type=OPTIMUS
# All slides in a directory (flat)
tessellate_extract_features \
wsi_dir=/data/slides \
output_h5_path=/data/features/{name}_feat.h5 \
output_pt_path=/data/features/{name}_embed.pt \
model_type=VIRCHOW2
# All slides in a directory tree (recursive)
tessellate_extract_features \
wsi_dir=/data/slides \
search_nested=true \
output_h5_path=/data/features/{name}_feat.h5 \
output_pt_path=/data/features/{name}_embed.pt \
model_type=VIRCHOW2

Supported WSI extensions discovered during directory scan: .svs, .ndpi, .tiff, .tif, .scn, .mrxs, .vms, .vmu, .bif, .qptiff, .czi.
All seg_config.* options (including seg_model=neural and slide_mpp_override) are
also available on this command; see the tessellate section above.
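The {name} placeholder in the output paths presumably expands to each slide's file stem. The directory scan can be pictured like this (hypothetical sketch, not Mussel's actual discovery logic):

```python
from pathlib import Path

# Extensions listed in the directory-scan note above.
WSI_EXTS = {".svs", ".ndpi", ".tiff", ".tif", ".scn", ".mrxs",
            ".vms", ".vmu", ".bif", ".qptiff", ".czi"}

def plan_outputs(wsi_dir: str, template: str, nested: bool = False) -> dict:
    """Map each discovered slide to its output path by filling {name}
    with the file stem (hypothetical sketch of the batch behaviour)."""
    pattern = "**/*" if nested else "*"  # nested mirrors search_nested=true
    return {
        p: Path(template.format(name=p.stem))
        for p in sorted(Path(wsi_dir).glob(pattern))
        if p.suffix.lower() in WSI_EXTS
    }
```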
You can generate embeddings for different tissue types using the QuiltNet OpenCLIP model, and use these to annotate a set of tiles for which you have OpenCLIP embeddings.
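This presumably follows the standard CLIP zero-shot scheme: each tile is assigned the class whose text embedding is most similar to the tile's image embedding. A pure-Python sketch of the idea (function names are illustrative, not the annotate internals):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def zero_shot_annotate(tile_embeddings, class_embeddings):
    """Assign each tile the label whose class embedding is most similar."""
    labels = list(class_embeddings)
    return [
        max(labels, key=lambda c: cosine(tile, class_embeddings[c]))
        for tile in tile_embeddings
    ]
```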
The tests/testdata/ folder includes embeddings generated for the following tissue
types:
- "carcinoma in situ"
- "invasive carcinoma with lymphocytes"
- "tumor infiltrating lymphocytes"
- "lymphocytes"
- "carcinoma in situ with lymphocytes"
- "tumor-associated stroma with lymphocytes"
You can apply these to the sample slide with the command
annotate \
features_pt_path=tests/testdata/948176.features.pt \
class_embedding_pt_path=tests/testdata/class_embedding.pt \
classes='["carcinoma in situ","invasive carcinoma","collagenous stroma","adipose","vessel","necrosis", "invasive adenocarcinoma","sarcoma"]' \
output_csv_path=948176.annotations.csv

You can also define your own classes with OpenCLIP! Any natural-language description works, and no training is required. For example:
create_class_embeddings \
classes='["carcinoma in situ","invasive carcinoma with lymphocytes","tumor infiltrating lymphocytes","lymphocytes","carcinoma in situ with lymphocytes","tumor-associated stroma with lymphocytes"]' \
output_pt_path=my_classes.pt
annotate \
features_pt_path=tests/testdata/948176.features.pt \
class_embedding_pt_path=my_classes.pt \
classes='["carcinoma in situ","invasive carcinoma with lymphocytes","tumor infiltrating lymphocytes","lymphocytes","carcinoma in situ with lymphocytes","tumor-associated stroma with lymphocytes"]' \
output_csv_path=948176.annotations-my-classes.csv

Use cache_tiles to generate a PyTorch (.pt) file for rapid access to tiles during I/O-intensive
operations such as training. This can be conditioned on tissue types: e.g. cache only the tiles
containing invasive carcinoma by setting limit_to_class. The patch_h5_path input file is
the output from tessellate.
cache_tiles \
slide_path=tests/testdata/948176.svs \
patch_h5_path=948176_coord.h5 \
annotation_csv_path=tests/testdata/948176.annotation.csv \
'limit_to_class=["carcinoma in situ", "invasive carcinoma with lymphocytes"]' \
output_pt_path=948176_cache.pt \
output_indices_json_path=948176_output_indices.json

This takes about ten seconds for an example slide.
You can download and save a foundation model locally with the save_model command.
save_model model_type=OPTIMUS output_path=optimus.pkl

convert converts whole-slide images to pyramidal TIFF format. It supports both
single-file and batch (directory) mode.
Single file:
convert \
input_path=slide.ndpi \
output_dir=converted/ \
mpp=0.25

Batch mode (directory of slides with an MPP CSV):
convert \
input_path=/data/slides/ \
output_dir=/data/converted/ \
mpp_csv=slides_mpp.csv \
num_workers=8

The CSV must have columns wsi (filename with extension) and mpp (microns-per-pixel).
Each input file <stem>.<ext> produces output_dir/<stem>.tiff. Pass
bigtiff=true for files larger than ~4 GB.
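A minimal mpp_csv for the batch command above might look like this (filenames and values illustrative):

```csv
wsi,mpp
case_001.ndpi,0.23
case_002.svs,0.50
```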
| Parameter | Default | Description |
|---|---|---|
| `input_path` | required | Path to a single slide file or a directory of slides. |
| `output_dir` | required | Directory for converted TIFF files (created if absent). |
| `mpp` | — | Microns-per-pixel of the source image. Required for single-file mode. |
| `mpp_csv` | — | CSV with wsi and mpp columns. Required for batch/directory mode. |
| `downscale_by` | 1 | Integer downsample factor (e.g. 2 converts a 40× slide to 20×). |
| `num_workers` | 1 | Parallel workers for batch mode (0 = all CPUs). |
| `bigtiff` | false | Write BigTIFF format (required for files > ~4 GB). |

