Mussel commands

This document describes the main command-line tools provided by Mussel, with examples.

Commands

Mussel provides a set of CLI tools for tiling whole-slide images, working with tiled slides, and generating feature embeddings with pathology foundation models.

  • tessellate - tiling and foreground detection of whole-slide images
  • tessellate_extract_features - combined tiling + feature-extraction pipeline; supports batch processing from a directory
  • extract_features - extract features from whole-slide images (WSIs) using a foundation model
  • create_class_embeddings - generate tissue-type embeddings for classifying tiles
  • annotate - annotate tiles with tissue types
  • cache_tiles - save tile information in an efficient form for training
  • export_tiles - export tiles as individual .png files using an HDF5 tile-coordinate manifest
  • filter_features - filter features using a classifier model
  • merge_annotation_features - merge tile features with annotations from a BMP file
  • linear_probe_benchmark - benchmark a linear-probe classifier on features extracted from a slide
  • save_model - download and save a foundation model locally
  • convert - convert whole-slide images to pyramidal TIFF format (single file or batch)

Each of these commands is configurable with a number of parameters. To list the parameters and their default values for a given tool, run <command> --help.

Examples

The example commands below use the test data provided in the tests/testdata folder.

tessellate

Tessellate tiles a whole-slide image. The tile coordinates and other metadata necessary for downstream steps are written to an HDF5 (.h5) file.

Mussel reads tiles from the slide at the resolution specified by seg_config.mpp (default 0.5 µm/px, roughly 20×). The slide's native MPP is determined automatically from the file metadata; see the MPP resolution section below.
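To make the target-MPP behaviour concrete: the region read at the slide's native resolution for one output tile scales with the ratio of target to native MPP. A minimal sketch (the function name and rounding strategy are illustrative, not Mussel's actual implementation):

```python
def native_patch_size(patch_size: int, target_mpp: float, native_mpp: float) -> int:
    """Pixels to read at native resolution to produce one tile of
    `patch_size` pixels at `target_mpp` microns per pixel."""
    return round(patch_size * target_mpp / native_mpp)
```

For example, a 256 px tile at 0.5 µm/px read from a 0.25 µm/px (40×) slide covers a 512 px native region, which is then downsampled to 256 px.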

Example command (see defaults with tessellate --help):

tessellate \
    slide_path=tests/testdata/948176.svs \
    output_h5_path=948176_coord.h5 \
    seg_config.segment_threshold=0 \
    num_workers=1

Supported slide formats

Mussel uses tiffslide (backed by tifffile) to read whole-slide images.

| Format | Extension | Vendor | Tiffslide support |
| --- | --- | --- | --- |
| Aperio SVS | .svs | Leica/Aperio | ✅ Full |
| Leica SCN | .scn | Leica | ✅ Full |
| Generic / OME TIFF | .tif, .tiff | Various | ✅ Full |
| Hamamatsu NDPI | .ndpi | Hamamatsu | ⚠️ Partial — MPP from TIFF tags |
| Ventana BIF | .bif | Ventana/Roche | ⚠️ Partial — MPP from TIFF tags |
| MIRAX | .mrxs | 3DHistech | ⚠️ Generic TIFF; requires sidecar dir |
| Hamamatsu VMS/VMU | .vms, .vmu | Hamamatsu | ⚠️ Generic TIFF |
| PerkinElmer QPTIFF | .qptiff | PerkinElmer | ⚠️ Generic TIFF; first channel only |
| Zeiss CZI | .czi | Zeiss | ⚠️ Generic TIFF; first series only |

Format limitations:

  • NDPI / BIF — tiffslide's vendor parsers are incomplete; MPP is derived from tiff.XResolution / tiff.ResolutionUnit tags (works for most files). Use seg_config.slide_mpp_override if MPP is incorrect.
  • MRXS — multi-file format: the .mrxs file and its sidecar directory (same name, no extension) must be in the same location. Moving the .mrxs alone will fail.
  • QPTIFF — multiplex/multi-channel files are tiled using the first channel only.
  • CZI — multi-series files (multiple acquisitions) use series 0 only.
  • VMS / VMU — uncommon on modern scanners; validate before production use.

MPP resolution

Mussel determines the slide's native microns-per-pixel (MPP) using the following fallback chain. The first value found is used:

  1. seg_config.slide_mpp_override — explicit CLI override; bypasses all metadata reading
  2. tiffslide.mpp-x — standard property populated by tiffslide for all supported formats
  3. aperio.MPP / openslide.mpp-x — legacy vendor properties
  4. tiff.XResolution + tiff.ResolutionUnit — raw TIFF resolution tags converted to µm/px (INCH, CENTIMETER, MILLIMETER, MICROMETER supported); tiffslide exposes these for partially-supported formats (NDPI, BIF, MRXS, QPTIFF, CZI) even when it cannot normalize them to tiffslide.mpp-x
  5. Magnification estimate — derived from objective-power metadata as 10.0 / magnification
  6. Default 0.5 µm/px — used as last resort with a warning logged
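The fallback chain can be sketched as a small pure function. This is an illustration of the documented order, not Mussel's actual code; the property keys follow tiffslide/OpenSlide naming conventions, and the resolve_mpp name and the "objective-power" key are assumptions:

```python
def resolve_mpp(props, override=None, default=0.5):
    """Return microns-per-pixel using the documented fallback order."""
    # 1. Explicit CLI override bypasses all metadata reading.
    if override is not None:
        return float(override)
    # 2. Standard tiffslide property.
    if props.get("tiffslide.mpp-x"):
        return float(props["tiffslide.mpp-x"])
    # 3. Legacy vendor properties.
    for key in ("aperio.MPP", "openslide.mpp-x"):
        if props.get(key):
            return float(props[key])
    # 4. Raw TIFF resolution tags: XResolution is pixels per unit,
    #    so um/px = (um per unit) / XResolution.
    unit_um = {"INCH": 25400.0, "CENTIMETER": 10000.0,
               "MILLIMETER": 1000.0, "MICROMETER": 1.0}
    xres = props.get("tiff.XResolution")
    unit = props.get("tiff.ResolutionUnit")
    if xres and unit in unit_um:
        return unit_um[unit] / float(xres)
    # 5. Rough estimate from objective power (e.g. 40x -> 0.25 um/px).
    if props.get("objective-power"):
        return 10.0 / float(props["objective-power"])
    # 6. Last resort, with a warning logged in the real pipeline.
    return default
```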

If the slide has missing or corrupt MPP metadata, use the override:

tessellate slide_path=slide.svs seg_config.slide_mpp_override=0.5 ...
tessellate_extract_features slide_path=slide.svs seg_config.slide_mpp_override=0.25 ...
export_tiles slide_path=slide.svs slide_mpp_override=0.5 ...

Segmentation and patching options

| Parameter | Default | Description |
| --- | --- | --- |
| seg_config.mpp | 0.5 | Target resolution for tile extraction (µm/px). |
| seg_config.patch_size | 256 | Tile size in pixels at the target MPP. |
| seg_config.overlap | 0 | Patch overlap in absolute pixels. Sets step_size = patch_size - overlap. |
| seg_config.min_tissue_proportion | 0.0 | Discard patches whose tissue fraction is below this value (0.0–1.0). |
| seg_config.remove_artifacts | false | Enable artifact removal (requires artifact_remover_fn hook). |
| seg_config.remove_penmarks | false | Enable pen-mark removal (requires artifact_remover_fn hook). |
| seg_config.seg_model | "classic" | Segmentation backend: "classic" (HSV + fixed threshold), "otsu" (HSV + Otsu automatic threshold), or "neural" (deep learning; see below). The old seg_config.use_otsu=true flag is deprecated; use seg_model=otsu instead. |
| seg_config.slide_mpp_override | null | Override the slide's native MPP; useful when metadata is missing or wrong. |

Example with 50% overlap and tissue filtering:

tessellate \
    slide_path=tests/testdata/948176.svs \
    output_h5_path=948176_coord.h5 \
    seg_config.overlap=128 \
    seg_config.min_tissue_proportion=0.5
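The overlap parameter translates directly into the grid step: step_size = patch_size - overlap, so overlap=128 with the default patch_size=256 gives 50% overlap. A simplified sketch of the resulting coordinate grid (illustrative only; the real pipeline additionally filters coordinates by tissue content):

```python
def tile_grid(width, height, patch_size=256, overlap=0):
    """Top-left (x, y) coordinates of every full tile in a
    width x height region, stepping by patch_size - overlap."""
    step = patch_size - overlap
    return [(x, y)
            for y in range(0, height - patch_size + 1, step)
            for x in range(0, width - patch_size + 1, step)]
```

A 512×512 region yields 4 tiles with no overlap and 9 tiles with overlap=128.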

Neural tissue segmentation (seg_model="neural")

By default Mussel uses a classic HSV thresholding pipeline (seg_model="classic"; seg_model="otsu" substitutes an automatic Otsu threshold). Setting seg_model="neural" switches to a deep-learning segmenter that is more robust on challenging slides (stain variation, artifacts, pale tissue).

The neural segmenter uses a DeepLabV3-ResNet50 model (2-class: tissue vs background) trained on histopathology slides as part of the HEST project at the Mahmood Lab, Harvard Medical School. The pre-trained checkpoint is hosted on HuggingFace at MahmoodLab/hest-tissue-seg and is downloaded automatically on first use (no account or token required).

Reference: Chan et al., "A Pathology Foundation Model for Cancer Diagnosis and Prognosis Prediction", Nature 2025. [paper] [GitHub] [HuggingFace]

The neural segmenter operates at 1 µm/px resolution (≈10×); images are auto-resampled before inference and the mask is rescaled back to the slide's native resolution. A CUDA GPU is recommended for practical performance but CPU inference is supported.

No extra packages are required — neural segmentation works with any torch-gpu or torch-cpu install:

uv sync --extra torch-gpu   # or torch-cpu

To use it:

tessellate \
    slide_path=tests/testdata/948176.svs \
    output_h5_path=948176_coord.h5 \
    seg_config.seg_model=neural

tessellate_extract_features \
    slide_path=tests/testdata/948176.svs \
    output_h5_path=948176_feat.h5 \
    output_pt_path=948176_embed.pt \
    model_type=UNI2 \
    seg_config.seg_model=neural

extract_features

Use a pathology foundation model to calculate feature embeddings for a slide tiled with the tessellate command described above. This generates both an HDF5 (.h5) file and a PyTorch (.pt) file containing embeddings for each tile.

The following models are currently supported:

| Model | model_type | Access | Reference |
| --- | --- | --- | --- |
| ResNet-50 | RESNET50 | public | https://huggingface.co/microsoft/resnet-50 |
| TransPath | CTRANSPATH | local ckpt | https://github.com/Xiyue-Wang/TransPath |
| Prov-GigaPath | GIGAPATH | 🔒 gated | https://huggingface.co/prov-gigapath/prov-gigapath |
| Virchow | VIRCHOW | 🔒 gated | https://huggingface.co/paige-ai/Virchow |
| Virchow2 | VIRCHOW2 | 🔒 gated | https://huggingface.co/paige-ai/Virchow2 |
| H-Optimus-0 | OPTIMUS | 🔒 gated | https://huggingface.co/bioptimus/H-optimus-0 |
| H-Optimus-1 | H_OPTIMUS_1 | 🔒 gated | https://huggingface.co/bioptimus/H-optimus-1 |
| H0-mini | H0_MINI | 🔒 gated | https://huggingface.co/bioptimus/H0-mini |
| Phikon | PHIKON | public | https://huggingface.co/owkin/phikon |
| Phikon-v2 | PHIKON_V2 | public | https://huggingface.co/owkin/phikon-v2 |
| Midnight-12k | MIDNIGHT12K | public | https://huggingface.co/kaiko-ai/midnight |
| GPFM | GPFM | public | https://huggingface.co/majiabo/GPFM |
| Hibou-L | HIBOU_L | 🔒 gated | https://huggingface.co/histai/hibou-L |
| UNI | UNI | 🔒 gated | https://huggingface.co/MahmoodLab/UNI |
| UNI2 | UNI2 | 🔒 gated | https://huggingface.co/MahmoodLab/UNI2-h |
| OpenCLIP | CLIP | public | https://github.com/mlfoundations/open_clip |
| GooglePath | GOOGLEPATH | 🔒 gated | https://huggingface.co/google/path-foundation |
| Conch v1.5 | CONCH1_5 | 🔒 gated | https://huggingface.co/MahmoodLab/TITAN |
| CONCH v1.0 | CONCH_V1 | 🔒 gated | https://huggingface.co/MahmoodLab/CONCH |
| Kaiko ViT-S/8 | KAIKO_VITS8 | public | https://huggingface.co/1aurent/vit_small_patch8_224.kaiko_ai_towards_large_pathology_fms |
| Kaiko ViT-S/16 | KAIKO_VITS16 | public | https://huggingface.co/1aurent/vit_small_patch16_224.kaiko_ai_towards_large_pathology_fms |
| Kaiko ViT-B/8 | KAIKO_VITB8 | public | https://huggingface.co/1aurent/vit_base_patch8_224.kaiko_ai_towards_large_pathology_fms |
| Kaiko ViT-B/16 | KAIKO_VITB16 | public | https://huggingface.co/1aurent/vit_base_patch16_224.kaiko_ai_towards_large_pathology_fms |
| Kaiko ViT-L/14 | KAIKO_VITL14 | public | https://huggingface.co/1aurent/vit_large_patch14_reg4_224.kaiko_ai_towards_large_pathology_fms |
| Lunit ViT-S/8 | LUNIT_VITS8 | public | https://huggingface.co/1aurent/vit_small_patch8_224.lunit_dino |
| Lunit ViT-S/16 | LUNIT_VITS16 | public | https://huggingface.co/1aurent/vit_small_patch16_224.lunit_dino |
| OpenMidnight | OPENMIDNIGHT | 🔒 gated | https://huggingface.co/SophontAI/OpenMidnight |
| GenBio-PathFM | GENBIO_PATHFM | 🔒 gated | https://huggingface.co/genbio-ai/genbio-pathfm |

Slide encoders (require patch-level features as input):

| Model | model_type | Patch encoder required | Access |
| --- | --- | --- | --- |
| Prov-GigaPath | GIGAPATH_SLIDE | GIGAPATH | 🔒 gated |
| TITAN | TITAN_SLIDE | CONCH1_5 | 🔒 gated |
| PRISM | PRISM_SLIDE | VIRCHOW | 🔒 gated |
| FEATHER | FEATHER_SLIDE | CONCH1_5 | 🔒 gated |
| MADELEINE | MADELEINE_SLIDE | CONCH1_5 | 🔒 gated |
| CHIEF | CHIEF_SLIDE | CTRANSPATH | local ckpt |

OpenCLIP is used by default, with QuiltNet-B-16-PMB as the default model. Use the model_type parameter to select a different model. For example, to use H-Optimus-0:

extract_features \
    slide_path=tests/testdata/948176.svs \
    patch_h5_path=tests/testdata/948176.patch.h5 \
    model_type=OPTIMUS \
    output_h5_path=948176_feat.h5 \
    output_pt_path=948176_embed.pt

Most models download automatically from HuggingFace. 🔒 Gated models require you to visit the model page, sign the access agreement, and set your HuggingFace token:

export HF_TOKEN=hf_...

Gated models — visit the link in the table above to request access:

  • Mahmood Lab (MahmoodLab): UNI, UNI2, CONCH_V1, CONCH1_5, TITAN_SLIDE, FEATHER_SLIDE, MADELEINE_SLIDE
  • Paige AI (paige-ai): VIRCHOW, VIRCHOW2, PRISM_SLIDE
  • Bioptimus (bioptimus): OPTIMUS, H_OPTIMUS_1, H0_MINI
  • Prov-GigaPath: GIGAPATH, GIGAPATH_SLIDE
  • Google: GOOGLEPATH
  • HistAI: HIBOU_L
  • SophontAI: OPENMIDNIGHT
  • GenBio AI: GENBIO_PATHFM

Public models (no token needed): RESNET50, CLIP, PHIKON, PHIKON_V2, MIDNIGHT12K, GPFM, KAIKO_VITS8, KAIKO_VITS16, KAIKO_VITB8, KAIKO_VITB16, KAIKO_VITL14, LUNIT_VITS8, LUNIT_VITS16

Local-checkpoint-only models: CTRANSPATH and CHIEF_SLIDE require manually downloaded checkpoints (no HuggingFace download). Pass the checkpoint path via model_path=.

Finally, you can generate features from a folder of pre-tiled images by specifying the folder with the patch_path parameter.

extract_features \
    slide_path=None \
    patch_h5_path=None \
    patch_path=<path to folder w/ tiles in image format (.tif, .png, .jpg, etc.)> \
    output_h5_path=<path to output h5 file> \
    output_pt_path=None

tessellate_extract_features

tessellate_extract_features runs tessellation and feature extraction in a single command. It also supports batch processing of an entire directory of slides:

# Single slide
tessellate_extract_features \
    slide_path=tests/testdata/948176.svs \
    output_h5_path=948176_feat.h5 \
    output_pt_path=948176_embed.pt \
    model_type=OPTIMUS

# All slides in a directory (flat)
tessellate_extract_features \
    wsi_dir=/data/slides \
    output_h5_path=/data/features/{name}_feat.h5 \
    output_pt_path=/data/features/{name}_embed.pt \
    model_type=VIRCHOW2

# All slides in a directory tree (recursive)
tessellate_extract_features \
    wsi_dir=/data/slides \
    search_nested=true \
    output_h5_path=/data/features/{name}_feat.h5 \
    output_pt_path=/data/features/{name}_embed.pt \
    model_type=VIRCHOW2

Supported WSI extensions discovered during directory scan: .svs, .ndpi, .tiff, .tif, .scn, .mrxs, .vms, .vmu, .bif, .qptiff, .czi. All seg_config.* options (including seg_model=neural and slide_mpp_override) are also available on this command; see the tessellate section above.
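The directory scan described above can be approximated in a few lines of Python. The extension set is taken from the list in this section, while the find_slides name and the exact matching rules are assumptions, not Mussel's actual implementation:

```python
from pathlib import Path

# Extensions listed above for the wsi_dir scan.
WSI_EXTS = {".svs", ".ndpi", ".tiff", ".tif", ".scn", ".mrxs",
            ".vms", ".vmu", ".bif", ".qptiff", ".czi"}

def find_slides(wsi_dir, nested=False):
    """Slide files under wsi_dir; nested=True mirrors search_nested=true."""
    pattern = "**/*" if nested else "*"
    return sorted(p for p in Path(wsi_dir).glob(pattern)
                  if p.is_file() and p.suffix.lower() in WSI_EXTS)
```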

annotate

You can generate embeddings for different tissue types using the QuiltNet OpenCLIP model, then use them to annotate a set of tiles for which you have OpenCLIP embeddings.

The tests/testdata/ folder includes embeddings generated for the following tissue types:

  • "carcinoma in situ"
  • "invasive carcinoma with lymphocytes"
  • "tumor infiltrating lymphocytes"
  • "lymphocytes"
  • "carcinoma in situ with lymphocytes"
  • "tumor-associated stroma with lymphocytes"

You can apply these to the sample slide with the command:

annotate \
    features_pt_path=tests/testdata/948176.features.pt \
    class_embedding_pt_path=tests/testdata/class_embedding.pt \
    classes='["carcinoma in situ","invasive carcinoma","collagenous stroma","adipose","vessel","necrosis", "invasive adenocarcinoma","sarcoma"]' \
    output_csv_path=948176.annotations.csv 
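Zero-shot annotation of this kind is typically a cosine-similarity argmax between each tile's image embedding and the class text embeddings. A minimal sketch of that idea in plain Python (the function names are illustrative, not Mussel's actual implementation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def annotate_tiles(tile_feats, class_embeds, classes):
    """Label each tile with the class whose embedding is most similar."""
    labels = []
    for feat in tile_feats:
        scores = [cosine(feat, emb) for emb in class_embeds]
        labels.append(classes[scores.index(max(scores))])
    return labels
```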

create_class_embeddings

You can also define your own classes with OpenCLIP! Any natural language works, and no training is required. For example,

create_class_embeddings \
    classes='["carcinoma in situ","invasive carcinoma with lymphocytes","tumor infiltrating lymphocytes","lymphocytes","carcinoma in situ with lymphocytes","tumor-associated stroma with lymphocytes"]' \
    output_pt_path=my_classes.pt

annotate \
    features_pt_path=tests/testdata/948176.features.pt \
    class_embedding_pt_path=my_classes.pt \
    classes='["carcinoma in situ","invasive carcinoma with lymphocytes","tumor infiltrating lymphocytes","lymphocytes","carcinoma in situ with lymphocytes","tumor-associated stroma with lymphocytes"]' \
    output_csv_path=948176.annotations-my-classes.csv

cache_tiles

Use cache_tiles to generate a PyTorch (.pt) file for rapid access to tiles during I/O-intensive operations such as training. Caching can be restricted to particular tissue types: for example, cache only the tiles containing invasive carcinoma by setting limit_to_class. The patch_h5_path input file is the output of tessellate.

cache_tiles \
    slide_path=tests/testdata/948176.svs \
    patch_h5_path=948176_coord.h5 \
    annotation_csv_path=tests/testdata/948176.annotation.csv \
    'limit_to_class=["carcinoma in situ", "invasive carcinoma with lymphocytes"]' \
    output_pt_path=948176_cache.pt \
    output_indices_json_path=948176_output_indices.json

This takes about ten seconds for an example slide.
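The limit_to_class filter amounts to keeping only the rows of the annotation CSV whose class is in the allowed set. A stdlib sketch under assumed column names (the real annotation CSV's columns may differ; `label` here is a placeholder):

```python
import csv
import io

def indices_for_classes(csv_text, keep, label_col="label"):
    """Row indices whose annotation label is in `keep`.
    `label_col` is an assumed column name, not Mussel's actual schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [i for i, row in enumerate(reader) if row[label_col] in keep]
```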

save_model

You can download and save a foundation model locally with the save_model command.

save_model model_type=OPTIMUS output_path=optimus.pkl

convert

convert converts whole-slide images to pyramidal TIFF format, in either single-file or batch (directory) mode.

Single file:

convert \
    input_path=slide.ndpi \
    output_dir=converted/ \
    mpp=0.25

Batch mode (directory of slides with an MPP CSV):

convert \
    input_path=/data/slides/ \
    output_dir=/data/converted/ \
    mpp_csv=slides_mpp.csv \
    num_workers=8

The CSV must have columns wsi (filename with extension) and mpp (microns-per-pixel). Each input file <stem>.<ext> produces output_dir/<stem>.tiff. Pass bigtiff=true for files larger than ~4 GB.
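The batch-mode CSV is easy to generate programmatically. A stdlib sketch (the filenames below are examples, not files shipped with Mussel):

```python
import csv
import io

def make_mpp_csv(rows):
    """Render (filename, mpp) pairs as the two-column CSV
    expected by convert's batch mode."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["wsi", "mpp"])
    writer.writerows(rows)
    return buf.getvalue()
```

Write the returned string to slides_mpp.csv and pass it via mpp_csv=.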

| Parameter | Default | Description |
| --- | --- | --- |
| input_path | required | Path to a single slide file or a directory of slides. |
| output_dir | required | Directory for converted TIFF files (created if absent). |
| mpp | (none) | Microns-per-pixel of the source image. Required for single-file mode. |
| mpp_csv | (none) | CSV with wsi and mpp columns. Required for batch/directory mode. |
| downscale_by | 1 | Integer downsample factor (e.g. 2 converts a 40× slide to 20×). |
| num_workers | 1 | Parallel workers for batch mode (0 = all CPUs). |
| bigtiff | false | Write BigTIFF format (required for files > ~4 GB). |