CLAUDE.md

This file provides guidance for Claude Code when working with the Mussel codebase.

Project Overview

Mussel is a computational pathology toolkit for processing whole-slide images (WSI). It provides CLI tools for tiling, feature extraction, annotation, and aggregation using various foundation models (OpenCLIP, ResNet-50, TransPath, Virchow, Virchow2, Prov-GigaPath, H-Optimus-0, GooglePath, CONCH).

Language: Python 3.10-3.11
Package Manager: uv
Build System: setuptools (via pyproject.toml)
License: GPL-3.0

Repository Structure

mussel/
  cli/          # CLI entry points (tessellate, extract_features, annotate, etc.)
  models/       # Foundation model implementations and factory
  datasets/     # Data loading (HDF5, tile coords, flat images)
  utils/        # Feature extraction, segmentation, file I/O, ML utilities
tests/
  testdata/     # Test slide images and fixtures
  mussel/       # Mirrors main package structure
presets/        # Preset configurations (biopsy, resection, TCGA)

Common Commands

Install dependencies

uv sync --extra torch-gpu        # PyTorch with CUDA
uv sync --extra torch-cpu        # PyTorch CPU-only
uv sync --extra tensorflow-gpu   # TensorFlow with CUDA

Run tests

uv run pytest tests

Run a specific test file

uv run pytest tests/mussel/cli/test_tessellate.py

Run a specific test

uv run pytest tests/mussel/cli/test_tessellate.py::test_function_name

Code Style & Conventions

Formatter: black
Import sorting: isort
Type checking: mypy
Logging: Standard logging module (not loguru)
Type hints are used throughout; follow existing patterns
Models use the factory pattern (ModelFactory.create())
Dataset processing uses the strategy pattern (get_dataset_processor())
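
As a rough illustration of the two patterns named above, the sketch below shows a registry-backed factory and a strategy lookup. Every name in it (ModelSketch, ModelFactorySketch, get_dataset_processor_sketch, the "model_a"/"hdf5" keys) is hypothetical and chosen for illustration; it does not reproduce the signatures of Mussel's ModelFactory.create() or get_dataset_processor().

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelSketch:
    """Hypothetical stand-in for a foundation model wrapper."""
    name: str
    embedding_dim: int


class ModelFactorySketch:
    """Illustrative factory: callers ask for a model by key, not by class."""

    _builders: Dict[str, Callable[[], ModelSketch]] = {}

    @classmethod
    def register(cls, key: str, builder: Callable[[], ModelSketch]) -> None:
        cls._builders[key] = builder

    @classmethod
    def create(cls, key: str) -> ModelSketch:
        if key not in cls._builders:
            raise ValueError(f"Unknown model type: {key!r}")
        return cls._builders[key]()


# Placeholder registrations; the keys and dimensions are made up.
ModelFactorySketch.register("model_a", lambda: ModelSketch("model_a", 8))
ModelFactorySketch.register("model_b", lambda: ModelSketch("model_b", 16))


def _process_hdf5(path: str) -> str:
    return f"hdf5:{path}"


def _process_flat(path: str) -> str:
    return f"flat:{path}"


# Strategy pattern: pick the processing function that matches the dataset kind.
_PROCESSORS: Dict[str, Callable[[str], str]] = {
    "hdf5": _process_hdf5,
    "flat": _process_flat,
}


def get_dataset_processor_sketch(kind: str) -> Callable[[str], str]:
    try:
        return _PROCESSORS[kind]
    except KeyError:
        raise ValueError(f"Unknown dataset kind: {kind!r}") from None
```

The point of both patterns is that new models or dataset types are added by registering one more entry, without touching the call sites.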

Import style

All imports belong at the top of the file. Do not place imports inside functions or methods unless one of these specific exceptions applies:

Optional / guarded dependency — the import is inside a try/except ImportError block because the package may not be installed (e.g. fsspec, flash_attn, tensorflow, gigapath).
Platform-conditional import — the import only makes sense on certain OSes (e.g. fcntl on Linux, msvcrt on Windows).
Circular-import workaround — moving the import to the top would create a circular dependency.

Everything else — stdlib modules (os, tempfile, warnings, traceback, collections, multiprocessing, functools), third-party packages that are always installed (numpy, torch, omegaconf), and local modules — must be at the top.
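
The rules above could look like this in a module header. This is a minimal sketch: the helper functions has_remote_io() and lock_module_name() are hypothetical and exist only to show how the guarded names might be used later in the file.

```python
# Always-available imports go at the top of the module.
import sys

# Exception 1: optional / guarded dependency. The package may not be installed,
# so the import is wrapped and the name falls back to None.
try:
    import fsspec
except ImportError:
    fsspec = None

# Exception 2: platform-conditional import. fcntl exists on Unix-like systems,
# msvcrt only on Windows.
if sys.platform == "win32":
    import msvcrt
    fcntl = None
else:
    import fcntl
    msvcrt = None


def has_remote_io() -> bool:
    """Report whether the optional fsspec dependency was importable."""
    return fsspec is not None


def lock_module_name() -> str:
    """Return the name of the file-locking module available on this platform."""
    return "msvcrt" if msvcrt is not None else "fcntl"
```

Even the exceptions stay at module level here; only a circular-import workaround needs an import inside a function body.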

Hydra Configuration System

All 13 CLI commands use Hydra with structured configs only (no YAML files). The pattern is:

Define a @dataclass for the command's config with typed fields and defaults
Register it with ConfigStore
Decorate main() with @hydra.main(version_base=None, config_path=".", config_name="...")

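from dataclasses import dataclass
from typing import Optional

import hydra
from hydra.core.config_store import ConfigStore

# ModelType is Mussel's model enum; its import is omitted in this excerpt.

@dataclass
class ExtractFeaturesConfig:
    slide_path: Optional[str] = None
    batch_size: int = 64
    model_type: ModelType = ModelType.CLIP

# Register the dataclass so Hydra can resolve it by name.
cs = ConfigStore.instance()
cs.store(name="extract_features_config", node=ExtractFeaturesConfig)

@hydra.main(version_base=None, config_path=".", config_name="extract_features_config")
def main(cfg: ExtractFeaturesConfig):
    ...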

Users pass config via Hydra command-line overrides (not --flag style):

extract_features slide_path=slide.svs model_type=VIRCHOW batch_size=128
tessellate slide_path=slide.svs seg_config=biopsy   # config group preset

Nested config groups are used for segmentation presets (seg_config=default|biopsy|resection|tcga), defined as dataclass inheritance (e.g., BiopsySegConfig(SegConfig)).
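
A preset defined through dataclass inheritance might look like the sketch below. The class names carry a Sketch suffix and the fields (min_tissue_area, use_otsu) and their values are placeholders, not Mussel's real segmentation parameters. The PRESETS dict stands in for Hydra's config group: in a Hydra application each class would instead be registered with cs.store(group="seg_config", name="biopsy", node=...), and seg_config=biopsy on the command line would select it.

```python
from dataclasses import dataclass
from typing import Dict, Type


@dataclass
class SegConfigSketch:
    # Placeholder fields and defaults, for illustration only.
    min_tissue_area: int = 100
    use_otsu: bool = False


@dataclass
class BiopsySegConfigSketch(SegConfigSketch):
    # A preset overrides only the defaults that differ from the base config.
    min_tissue_area: int = 25


# Stand-in for a Hydra config group: preset name -> config class.
PRESETS: Dict[str, Type[SegConfigSketch]] = {
    "default": SegConfigSketch,
    "biopsy": BiopsySegConfigSketch,
}


def resolve_preset(name: str) -> SegConfigSketch:
    """Instantiate the preset registered under the given name."""
    try:
        return PRESETS[name]()
    except KeyError:
        raise ValueError(f"Unknown seg_config preset: {name!r}") from None
```

Because each preset is a subclass, any field it does not override keeps the base default, so presets stay short and cannot drift out of sync with the base schema.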

Common OmegaConf patterns in the codebase:

OmegaConf.to_container(cfg.nested) to unpack nested configs as dicts for **kwargs
OmegaConf.structured(cfg) to copy configs
OmegaConf.create(cfg) in tests to create configs programmatically

Key Architecture Patterns

CLI modules in mussel/cli/ each expose a main() function registered as a console script in pyproject.toml
Model loading goes through mussel/models/model_factory.py which handles all supported foundation models
Feature extraction core logic lives in mussel/utils/feature_extract.py
Tissue segmentation is in mussel/utils/segment.py
File I/O with remote/cloud support is in mussel/utils/file.py (fsspec-based)
Model caching and downloading is handled by mussel/utils/model_cache.py

Dependency Notes

PyTorch and TensorFlow are mutually exclusive install extras (see [tool.uv] conflicts in pyproject.toml)
Custom packages transpath and timm_ctranspath come from MSK Mind GitHub repos
The fastattn extra pins specific torch/xformers/flash-attn versions for GigaPath support
transformers<4.46 is required for model compatibility
Neural tissue segmentation (seg_model="neural") is built into Mussel and requires only torch-gpu or torch-cpu; weights are auto-downloaded from MahmoodLab/hest-tissue-seg on HuggingFace
