This file provides guidance for Claude Code when working with the Mussel codebase.
Mussel is a computational pathology toolkit for processing whole-slide images (WSI). It provides CLI tools for tiling, feature extraction, annotation, and aggregation using various foundation models (OpenCLIP, ResNet-50, TransPath, Virchow, Virchow2, Prov-GigaPath, H-Optimus-0, GooglePath, CONCH).
- Language: Python 3.10-3.11
- Package Manager: uv
- Build System: setuptools (via pyproject.toml)
- License: GPL-3.0
mussel/
    cli/ # CLI entry points (tessellate, extract_features, annotate, etc.)
    models/ # Foundation model implementations and factory
    datasets/ # Data loading (HDF5, tile coords, flat images)
    utils/ # Feature extraction, segmentation, file I/O, ML utilities
tests/
    testdata/ # Test slide images and fixtures
    mussel/ # Mirrors main package structure
presets/ # Preset configurations (biopsy, resection, TCGA)
uv sync --extra torch-gpu # PyTorch with CUDA
uv sync --extra torch-cpu # PyTorch CPU-only
uv sync --extra tensorflow-gpu # TensorFlow with CUDA

uv run pytest tests
uv run pytest tests/mussel/cli/test_tessellate.py
uv run pytest tests/mussel/cli/test_tessellate.py::test_function_name

- Formatter: black
- Import sorting: isort
- Type checking: mypy
- Logging: Standard logging module (not loguru)
- Type hints are used throughout; follow existing patterns
- Models use the factory pattern (ModelFactory.create())
- Dataset processing uses the strategy pattern (get_dataset_processor())
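
The two patterns named above can be sketched as follows. This is a minimal, self-contained illustration and not Mussel's actual code: DummyModel, the registry, the create(model_type) signature, and the two processor functions are hypothetical stand-ins, so check mussel/models/model_factory.py for the real interface.

```python
from typing import Callable, Dict


class DummyModel:
    """Hypothetical stand-in for a foundation model wrapper."""

    def __init__(self, name: str) -> None:
        self.name = name


class ModelFactory:
    """Factory pattern: callers request a model by key, not by class."""

    # Hypothetical registry mapping a model key to a constructor.
    _registry: Dict[str, Callable[[], DummyModel]] = {
        "CLIP": lambda: DummyModel("CLIP"),
        "VIRCHOW": lambda: DummyModel("VIRCHOW"),
    }

    @classmethod
    def create(cls, model_type: str) -> DummyModel:
        try:
            return cls._registry[model_type]()
        except KeyError:
            raise ValueError(f"Unsupported model type: {model_type}") from None


def process_h5(path: str) -> str:
    """Hypothetical strategy for HDF5 inputs."""
    return f"h5:{path}"


def process_flat_images(path: str) -> str:
    """Hypothetical strategy for a directory of flat images."""
    return f"images:{path}"


def get_dataset_processor(path: str) -> Callable[[str], str]:
    """Strategy pattern: choose the processing function from the input."""
    return process_h5 if path.endswith(".h5") else process_flat_images
```

Under these assumptions, adding a model means adding one registry entry, and adding an input type means adding one strategy function; call sites stay unchanged.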
All imports belong at the top of the file. Do not place imports inside functions or methods unless one of these specific exceptions applies:
- Optional / guarded dependency: the import is inside a try/except ImportError block because the package may not be installed (e.g. fsspec, flash_attn, tensorflow, gigapath).
- Platform-conditional import: the import only makes sense on certain OSes (e.g. fcntl on Linux, msvcrt on Windows).
- Circular-import workaround: moving the import to the top would create a circular dependency.
Everything else — stdlib modules (os, tempfile, warnings, traceback, collections, multiprocessing, functools), third-party packages that are always installed (numpy, torch, omegaconf), and local modules — must be at the top.
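
A minimal sketch of the first two exceptions, assuming a module that optionally uses fsspec and needs OS-specific file locking. The helper open_maybe_remote is hypothetical and only illustrates how the guarded name is used afterwards; it is not taken from the Mussel codebase.

```python
import logging
import sys

logger = logging.getLogger(__name__)

# Optional dependency: guard the import so the module loads without it.
try:
    import fsspec
except ImportError:
    fsspec = None

# Platform-conditional import: fcntl does not exist on Windows,
# and msvcrt exists only there.
if sys.platform == "win32":
    import msvcrt
    fcntl = None
else:
    import fcntl
    msvcrt = None


def open_maybe_remote(path: str, mode: str = "rb"):
    """Hypothetical helper: use fsspec when installed, else builtin open()."""
    if fsspec is not None:
        return fsspec.open(path, mode).open()
    logger.debug("fsspec not installed; falling back to builtin open()")
    return open(path, mode)
```

Callers test the guarded name against None rather than re-importing inside the function, which keeps every import visible at module level.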
All 13 CLI commands use Hydra with structured configs only (no YAML files). The pattern is:
- Define a @dataclass for the command's config with typed fields and defaults
- Register it with ConfigStore
- Decorate main() with @hydra.main(version_base=None, config_path=".", config_name="...")
from dataclasses import dataclass
from typing import Optional

import hydra
from hydra.core.config_store import ConfigStore

# ModelType is Mussel's model enum; its import is omitted here.

@dataclass
class ExtractFeaturesConfig:
    slide_path: Optional[str] = None
    batch_size: int = 64
    model_type: ModelType = ModelType.CLIP

# Register the structured config under the name that @hydra.main refers to.
cs = ConfigStore.instance()
cs.store(name="extract_features_config", node=ExtractFeaturesConfig)

@hydra.main(version_base=None, config_path=".", config_name="extract_features_config")
def main(cfg: ExtractFeaturesConfig):
    ...

Users pass config via Hydra command-line overrides (not --flag style):

extract_features slide_path=slide.svs model_type=VIRCHOW batch_size=128
tessellate slide_path=slide.svs seg_config=biopsy # config group preset

Nested config groups are used for segmentation presets (seg_config=default|biopsy|resection|tcga), defined as dataclass inheritance (e.g., BiopsySegConfig(SegConfig)).
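
The preset-by-inheritance idea can be sketched with plain dataclasses. The field names and default values below are hypothetical (the real SegConfig fields may differ), and the PRESETS dictionary only stands in for Hydra's config-group lookup.

```python
from dataclasses import asdict, dataclass


@dataclass
class SegConfig:
    # Hypothetical fields and defaults, for illustration only.
    threshold: int = 8
    median_filter_size: int = 7
    use_otsu: bool = False


@dataclass
class BiopsySegConfig(SegConfig):
    # A preset overrides only the defaults that differ from the base.
    threshold: int = 15
    use_otsu: bool = True


# Stand-in for the seg_config group: preset name -> config class.
PRESETS = {"default": SegConfig, "biopsy": BiopsySegConfig}


def resolve_seg_config(name: str) -> SegConfig:
    """Return a fresh config for the preset named by seg_config=<name>."""
    return PRESETS[name]()


if __name__ == "__main__":
    print(asdict(resolve_seg_config("biopsy")))
```

With Hydra itself, presets like these are typically registered through ConfigStore.store(group=..., name=..., node=...); whether Mussel registers them exactly this way is an assumption.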
Common OmegaConf patterns in the codebase:
- OmegaConf.to_container(cfg.nested) to unpack nested configs as dicts for **kwargs
- OmegaConf.structured(cfg) to copy configs
- OmegaConf.create(cfg) in tests to create configs programmatically
- CLI modules in mussel/cli/ each expose a main() function registered as a console script in pyproject.toml
- Model loading goes through mussel/models/model_factory.py, which handles all supported foundation models
- Feature extraction core logic lives in mussel/utils/feature_extract.py
- Tissue segmentation is in mussel/utils/segment.py
- File I/O with remote/cloud support is in mussel/utils/file.py (fsspec-based)
- Model caching and downloading is handled by mussel/utils/model_cache.py
- PyTorch and TensorFlow are mutually exclusive install extras (see [tool.uv] conflicts in pyproject.toml)
- Custom packages transpath and timm_ctranspath come from MSK Mind GitHub repos
- The fastattn extra pins specific torch/xformers/flash-attn versions for GigaPath support
- transformers<4.46 is required for model compatibility
- Neural tissue segmentation (seg_model="neural") is built into Mussel and requires only torch-gpu or torch-cpu; weights are auto-downloaded from MahmoodLab/hest-tissue-seg on HuggingFace