Commit 9beea07

Merge pull request #106 from pathology-data-mining/feature/cli-improvements
feat: add new CLI commands and refactor existing ones
2 parents a1bf072 + 8655a54 commit 9beea07

40 files changed

Lines changed: 8508 additions & 1562 deletions

CLAUDE.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# CLAUDE.md

This file provides guidance for Claude Code when working with the Mussel codebase.

## Project Overview

Mussel is a computational pathology toolkit for processing whole-slide images (WSI). It provides CLI tools for tiling, feature extraction, annotation, and aggregation using various foundation models (OpenCLIP, ResNet-50, TransPath, Virchow, Virchow2, Prov-GigaPath, H-Optimus-0, GooglePath, CONCH).

- **Language**: Python 3.10-3.11
- **Package Manager**: `uv`
- **Build System**: setuptools (via `pyproject.toml`)
- **License**: GPL-3.0

## Repository Structure

```
mussel/
  cli/        # CLI entry points (tessellate, extract_features, annotate, etc.)
  models/     # Foundation model implementations and factory
  datasets/   # Data loading (HDF5, tile coords, flat images)
  utils/      # Feature extraction, segmentation, file I/O, ML utilities
tests/
  testdata/   # Test slide images and fixtures
  mussel/     # Mirrors main package structure
presets/      # Preset configurations (biopsy, resection, TCGA)
```

## Common Commands

### Install dependencies
```bash
uv sync --extra torch-gpu        # PyTorch with CUDA
uv sync --extra torch-cpu        # PyTorch CPU-only
uv sync --extra tensorflow-gpu   # TensorFlow with CUDA
```

### Run tests
```bash
uv run pytest tests
```

### Run a specific test file
```bash
uv run pytest tests/mussel/cli/test_tessellate.py
```

### Run a specific test
```bash
uv run pytest tests/mussel/cli/test_tessellate.py::test_function_name
```

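Because `tests/mussel/` mirrors the main package, the test file for a module can usually be derived from the module path. The sketch below assumes the `test_<module>.py` naming seen in `tests/mussel/cli/test_tessellate.py` holds for every module; the helper names `mirrored_test_path` and `pytest_command` are hypothetical and not part of Mussel.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def mirrored_test_path(module_path: str) -> str:
    """Map a source module path to the mirrored test file (assumed convention)."""
    src = PurePosixPath(module_path)
    # tests/ repeats the package tree; the file name gains a test_ prefix.
    return str(PurePosixPath("tests") / src.parent / f"test_{src.name}")


def pytest_command(module_path: str, test_name: Optional[str] = None) -> List[str]:
    """Build the `uv run pytest ...` argument list for one file or one test."""
    target = mirrored_test_path(module_path)
    if test_name:
        # pytest selects a single test with the file::name node id syntax.
        target = f"{target}::{test_name}"
    return ["uv", "run", "pytest", target]


if __name__ == "__main__":
    print(" ".join(pytest_command("mussel/cli/tessellate.py", "test_function_name")))
```

The returned list can be passed to `subprocess.run` unchanged if the command should be executed rather than printed.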
## Code Style & Conventions

- **Formatter**: `black`
- **Import sorting**: `isort`
- **Type checking**: `mypy`
- **Logging**: Standard `logging` module (not loguru)
- Type hints are used throughout; follow existing patterns
- Models use the factory pattern (`ModelFactory.create()`)
- Dataset processing uses the strategy pattern (`get_dataset_processor()`)

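For readers unfamiliar with the factory pattern named above, here is a minimal, generic sketch. It is not Mussel's `ModelFactory`: the class `SketchModelFactory`, the `register` decorator, and `DummyModel` are all hypothetical names chosen for illustration, and the real factory's signature may differ.

```python
from typing import Callable, Dict


class DummyModel:
    """Stand-in for a foundation model wrapper (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name


class SketchModelFactory:
    """Generic factory: callers ask for a model by key, not by class."""

    _registry: Dict[str, Callable[[], DummyModel]] = {}

    @classmethod
    def register(cls, key: str) -> Callable:
        def decorator(builder: Callable[[], DummyModel]) -> Callable[[], DummyModel]:
            # Keys are stored upper-cased so lookups are case-insensitive.
            cls._registry[key.upper()] = builder
            return builder

        return decorator

    @classmethod
    def create(cls, key: str) -> DummyModel:
        try:
            builder = cls._registry[key.upper()]
        except KeyError:
            raise ValueError(f"Unknown model type: {key!r}") from None
        return builder()


@SketchModelFactory.register("CLIP")
def _build_clip() -> DummyModel:
    return DummyModel("clip")


if __name__ == "__main__":
    print(SketchModelFactory.create("clip").name)
```

The point of the pattern is that adding a model means registering one builder, while call sites that only know the key stay unchanged.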
### Import style

All imports belong at the top of the file. **Do not place imports inside functions or methods** unless one of these specific exceptions applies:

1. **Optional / guarded dependency** — the import is inside a `try/except ImportError` block because the package may not be installed (e.g. `fsspec`, `flash_attn`, `tensorflow`, `gigapath`).
2. **Platform-conditional import** — the import only makes sense on certain OSes (e.g. `fcntl` on Linux, `msvcrt` on Windows).
3. **Circular-import workaround** — moving the import to the top would create a circular dependency.

Everything else — stdlib modules (`os`, `tempfile`, `warnings`, `traceback`, `collections`, `multiprocessing`, `functools`), third-party packages that are always installed (`numpy`, `torch`, `omegaconf`), and local modules — must be at the top.

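The first exception, a guarded optional dependency, might look like the sketch below. The helpers `optional_import` and `require` are hypothetical and do not come from the Mussel codebase; they only show the shape of an import kept inside a function because the package may be absent.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None."""
    try:
        # Deferred on purpose: the package may not be installed.
        return importlib.import_module(name)
    except ImportError:
        return None


def require(name: str) -> ModuleType:
    """Like optional_import, but fail with a readable message."""
    module = optional_import(name)
    if module is None:
        raise RuntimeError(f"Optional dependency {name!r} is not installed")
    return module


if __name__ == "__main__":
    # json is in the stdlib, so this always succeeds.
    print(optional_import("json") is not None)
```

Keeping the guard in one small helper makes the exception easy to spot in review, while every other import stays at the top of the file.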
## Hydra Configuration System

All 13 CLI commands use Hydra with **structured configs only** (no YAML files). The pattern is:

1. Define a `@dataclass` for the command's config with typed fields and defaults
2. Register it with `ConfigStore`
3. Decorate `main()` with `@hydra.main(version_base=None, config_path=".", config_name="...")`

```python
@dataclass
class ExtractFeaturesConfig:
    slide_path: Optional[str] = None
    batch_size: int = 64
    model_type: ModelType = ModelType.CLIP

cs = ConfigStore.instance()
cs.store(name="extract_features_config", node=ExtractFeaturesConfig)

@hydra.main(version_base=None, config_path=".", config_name="extract_features_config")
def main(cfg: ExtractFeaturesConfig):
    ...
```

**Users pass config via Hydra command-line overrides** (not `--flag` style):
```bash
extract_features slide_path=slide.svs model_type=VIRCHOW batch_size=128
tessellate slide_path=slide.svs seg_config=biopsy  # config group preset
```

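To make the `key=value` override style concrete, the following stdlib-only sketch maps override tokens onto a typed dataclass. It is a simplified stand-in, not how Hydra is implemented: `SketchConfig`, `SketchModelType` and its values, and `apply_overrides` are hypothetical, and only `str`, `int`, and enum fields are handled.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum
from typing import List, Optional


class SketchModelType(Enum):
    # Hypothetical members; the real ModelType enum may differ.
    CLIP = "clip"
    VIRCHOW = "virchow"


@dataclass
class SketchConfig:
    slide_path: Optional[str] = None
    batch_size: int = 64
    model_type: SketchModelType = SketchModelType.CLIP


def apply_overrides(cfg: SketchConfig, overrides: List[str]) -> SketchConfig:
    """Return a copy of cfg with key=value overrides applied."""
    known = {f.name for f in fields(cfg)}
    updates = {}
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or key not in known:
            # Rejects --flag style arguments and unknown keys alike.
            raise ValueError(f"Bad override: {item!r}")
        current = getattr(cfg, key)
        if isinstance(current, Enum):
            updates[key] = type(current)[raw]  # look up the enum member by name
        elif current is None:
            updates[key] = raw  # no default to infer a type from; keep the string
        else:
            updates[key] = type(current)(raw)  # e.g. int("128")
    return replace(cfg, **updates)


if __name__ == "__main__":
    print(apply_overrides(SketchConfig(), ["slide_path=slide.svs", "batch_size=128"]))
```

Hydra itself performs this validation against the structured config, which is why mistyped keys or wrongly typed values fail at startup rather than deep inside a command.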
**Nested config groups** are used for segmentation presets (`seg_config=default|biopsy|resection|tcga`), defined as dataclass inheritance (e.g., `BiopsySegConfig(SegConfig)`).

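Presets built by dataclass inheritance can be sketched as follows. The field names and default values here (`threshold`, `min_area`) are invented for illustration and are not Mussel's actual segmentation parameters; only the idea of a subclass overriding defaults comes from the text above.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Type


@dataclass
class SketchSegConfig:
    threshold: int = 8    # hypothetical field and default
    min_area: int = 100   # hypothetical field and default


@dataclass
class SketchBiopsySegConfig(SketchSegConfig):
    # A preset overrides only the defaults that differ from the base.
    min_area: int = 20


PRESETS: Dict[str, Type[SketchSegConfig]] = {
    "default": SketchSegConfig,
    "biopsy": SketchBiopsySegConfig,
}


def load_preset(name: str) -> SketchSegConfig:
    """Instantiate the preset selected by a seg_config=<name> style key."""
    try:
        return PRESETS[name]()
    except KeyError:
        raise ValueError(f"Unknown preset: {name!r}") from None


if __name__ == "__main__":
    # asdict gives a plain dict suitable for passing on as **kwargs.
    print(asdict(load_preset("biopsy")))
```

Because each preset is a subclass, any code typed against the base config accepts every preset without changes.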
**Common OmegaConf patterns in the codebase**:
- `OmegaConf.to_container(cfg.nested)` to unpack nested configs as dicts for `**kwargs`
- `OmegaConf.structured(cfg)` to copy configs
- `OmegaConf.create(cfg)` in tests to create configs programmatically

## Key Architecture Patterns

- **CLI modules** in `mussel/cli/` each expose a `main()` function registered as a console script in `pyproject.toml`
- **Model loading** goes through `mussel/models/model_factory.py` which handles all supported foundation models
- **Feature extraction** core logic lives in `mussel/utils/feature_extract.py`
- **Tissue segmentation** is in `mussel/utils/segment.py`
- **File I/O** with remote/cloud support is in `mussel/utils/file.py` (fsspec-based)
- **Model caching** and downloading is handled by `mussel/utils/model_cache.py`

## Dependencies Notes

- PyTorch and TensorFlow are mutually exclusive install extras (see `[tool.uv]` conflicts in `pyproject.toml`)
- Custom packages `transpath` and `timm_ctranspath` come from MSK Mind GitHub repos
- The `fastattn` extra pins specific torch/xformers/flash-attn versions for GigaPath support
- `numpy<2` is required for compatibility
- `transformers<4.46` is required for model compatibility
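
Upper-bound pins such as `numpy<2` and `transformers<4.46` can be checked against the active environment with a small script. This is a rough sketch with hypothetical helper names; it compares only the leading numeric release segments, so the `packaging` library is the more robust choice when exact PEP 440 semantics matter.

```python
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '1.26.4' -> (1, 26, 4)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def below(version: str, upper: str) -> bool:
    """True if version is strictly below the upper bound (simplified compare)."""
    return release_tuple(version) < release_tuple(upper)


def check_installed(package: str, upper: str) -> Optional[bool]:
    """Check an installed package against an upper bound; None if not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return below(installed, upper)


if __name__ == "__main__":
    for name, bound in [("numpy", "2"), ("transformers", "4.46")]:
        print(name, f"<{bound}", check_installed(name, bound))
```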

Dockerfile

Lines changed: 96 additions & 20 deletions
@@ -1,39 +1,115 @@
-FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
+# Stage 1: Builder
+FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 AS builder
+
+# Prevent interactive prompts
+ENV DEBIAN_FRONTEND=noninteractive
+ENV TZ=America/New_York
+
+# Install Python 3.11
+RUN apt-get update && apt-get install -y \
+    software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa \
+    && apt-get update && apt-get install -y \
+    python3.11 \
+    python3.11-dev \
+    python3.11-distutils \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set Python 3.11 as default
+RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 \
+    && update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
+
+# Install uv
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
 
 ARG BACKEND=torch-gpu
 ENV BACKEND=$BACKEND
+ENV UV_SYSTEM_PYTHON=1
 
-ENV DEBIAN_FRONTEND=noninteractive
-
-RUN apt-get update && apt-get install \
+# Install build dependencies
+RUN apt-get update && apt-get install -y \
     build-essential \
     libgdal-dev \
     liblapack-dev \
     libblas-dev \
     gfortran \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+# Install Python dependencies
+RUN --mount=type=cache,target=/root/.cache/uv \
+    --mount=type=bind,source=uv.lock,target=uv.lock \
+    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
+    uv sync --frozen --no-install-project --extra $BACKEND --extra distributed
+
+# Copy and install the project
+COPY . /app
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv pip install --system . --no-deps --force-reinstall
+
+# Stage 2: Runtime
+FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install Python 3.11
+RUN apt-get update && apt-get install -y \
+    software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa \
+    && apt-get update && apt-get install -y \
+    python3.11 \
+    python3.11-distutils \
+    curl \
+    libgdal30 \
+    liblapack3 \
+    libblas3 \
+    libgfortran5 \
     libgl1 \
-    libgl1-mesa-dev \
     ffmpeg \
     libsm6 \
     libxext6 \
-    curl \
-    zip \
-    git -y
+    sudo \
+    unzip \
+    rsync \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set Python 3.11 as default
+RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 \
+    && update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
+
+# Install AWS CLI (slim version)
+RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
+    unzip awscliv2.zip && \
+    ./aws/install && \
+    rm -rf awscliv2.zip aws
+
+# Install Azure CLI
+RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash
 
-# Download the latest installer
-ADD https://astral.sh/uv/0.6.10/install.sh /uv-installer.sh
+# Install gosu
+RUN curl -fsSL "https://github.com/tianon/gosu/releases/download/1.17/gosu-$(dpkg --print-architecture)" -o /usr/local/bin/gosu && \
+    chmod +x /usr/local/bin/gosu && \
+    gosu nobody true
 
-# Run the installer then remove it
-RUN sh /uv-installer.sh && rm /uv-installer.sh
+# Copy Python packages from builder (Ubuntu packages install to /usr/lib)
+COPY --from=builder /usr/lib/python3.11 /usr/lib/python3.11
+COPY --from=builder /usr/local/lib/python3.11 /usr/local/lib/python3.11
+COPY --from=builder /usr/local/bin /usr/local/bin
 
-# Ensure the installed binary is on the `PATH`
-ENV PATH="/root/.local/bin/:$PATH"
+# Copy only necessary application files (not the entire /app directory)
+WORKDIR /app
+# Copy only the installed package from site-packages, not all source files
+RUN mkdir -p /app
 
-RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
-RUN unzip awscliv2.zip
-RUN ./aws/install
+# Copy scripts directory for Azure Batch
+COPY scripts /app/scripts
+RUN chmod +x /app/scripts/azure_batch/*.sh
 
-ADD . /code/mussel
-WORKDIR /code/mussel
+COPY entrypoint.sh /usr/local/bin/entrypoint.sh
+RUN chmod +x /usr/local/bin/entrypoint.sh
 
-RUN uv sync --frozen --extra $BACKEND
+ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
+CMD ["python", "-c", "print('Mussel container ready. Use mussel-docker <command>')"]
