| 1 | +# AGENTS.md |
| 2 | + |
| 3 | +This file provides guidance to coding agents working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +NexusLIMS is an electron microscopy Laboratory Information Management System (LIMS) originally developed at NIST, now maintained by Datasophos. It automatically generates experimental records by extracting metadata from microscopy data files and harvesting information from reservation calendar systems like NEMO. |
| 8 | + |
| 9 | +This is the backend repository. The frontend is at <https://github.com/datasophos/NexusLIMS-CDCS>. |
| 10 | + |
| 11 | +## Development Commands |
| 12 | + |
| 13 | +### Package Management |
| 14 | + |
| 15 | +This project uses `uv` for package management. |
| 16 | + |
| 17 | +```bash |
| 18 | +# Install dependencies |
| 19 | +uv sync |
| 20 | + |
| 21 | +# Add a dependency |
| 22 | +uv add <package-name> |
| 23 | + |
| 24 | +# Add a dev dependency |
| 25 | +uv add --dev <package-name> |
| 26 | +``` |
| 27 | + |
| 28 | +### Testing |
| 29 | + |
| 30 | +Always run tests with `pytest-mpl` image comparison enabled (the `--mpl` flag). |
| 31 | + |
| 32 | +```bash |
| 33 | +# Run all tests with coverage (recommended) |
| 34 | +./scripts/run_tests.sh |
| 35 | + |
| 36 | +# Run a specific test file |
| 37 | +uv run pytest --mpl --mpl-baseline-path=tests/files/figs tests/test_extractors.py |
| 38 | + |
| 39 | +# Run a specific test |
| 40 | +uv run pytest --mpl --mpl-baseline-path=tests/files/figs tests/test_extractors.py::TestClassName::test_method_name |
| 41 | + |
| 42 | +# Generate matplotlib baseline figures for image comparison tests |
| 43 | +./scripts/generate_mpl_baseline.sh |
| 44 | +``` |
| 45 | + |
| 46 | +### Linting and Formatting |
| 47 | + |
| 48 | +```bash |
| 49 | +# Run all linting and formatting checks (recommended) |
| 50 | +./scripts/run_lint.sh |
| 51 | + |
| 52 | +# Or run individually: |
| 53 | +uv run ruff format . --check |
| 54 | +uv run ruff check nexusLIMS tests |
| 55 | + |
| 56 | +# Auto-format code |
| 57 | +uv run ruff format . |
| 58 | + |
| 59 | +# Type checking |
| 60 | +pyright |
| 61 | +``` |
| 62 | + |
| 63 | +### Documentation |
| 64 | + |
| 65 | +Always use `--skip-tui-demos` when building docs locally. TUI demo generation is slow and unnecessary for checking content. |
| 66 | + |
| 67 | +```bash |
| 68 | +# Build documentation (local) |
| 69 | +./scripts/build_docs.sh --skip-tui-demos |
| 70 | + |
| 71 | +# Build with strict mode (used in CI) |
| 72 | +./scripts/build_docs.sh --strict --skip-tui-demos |
| 73 | + |
| 74 | +# Watch mode for auto-rebuild during development |
| 75 | +./scripts/build_docs.sh --watch --skip-tui-demos |
| 76 | +``` |
| 77 | + |
| 78 | +Documentation will be written to `./_build`. |
| 79 | + |
| 80 | +### Running the Record Builder |
| 81 | + |
| 82 | +```bash |
| 83 | +# Run the record builder with full orchestration |
| 84 | +nexuslims build-records |
| 85 | + |
| 86 | +# Or using the module directly: |
| 87 | +uv run python -m nexusLIMS.cli.process_records |
| 88 | + |
| 89 | +# Run in dry-run mode |
| 90 | +nexuslims build-records -n |
| 91 | + |
| 92 | +# Run with verbose output |
| 93 | +nexuslims build-records -vv |
| 94 | + |
| 95 | +# Run the core record builder directly |
| 96 | +uv run python -m nexusLIMS.builder.record_builder |
| 97 | +``` |
| 98 | + |
| 99 | +## Architecture Overview |
| 100 | + |
| 101 | +### Core Components |
| 102 | + |
| 103 | +1. **Database Layer** (`nexusLIMS/db/`) |
| 104 | + - SQLite database tracks instruments and session logs; its schema is managed through Alembic migrations |
| 105 | + - Main tables: `instruments` and `session_log` |
| 106 | + - `models.py` defines SQLModel ORM classes `Instrument` and `SessionLog` |
| 107 | + - `enums.py` defines enums `EventType` and `RecordStatus` |
| 108 | + - `session_handler.py` provides higher-level session utilities |
| 109 | + |
| 110 | +2. **Harvesters** (`nexusLIMS/harvesters/`) |
| 111 | + - Extract reservation and usage data from external systems |
| 112 | + - Primary harvester is NEMO in `nemo/` |
| 113 | + - SharePoint calendar support is deprecated |
| 114 | + |
| 115 | +3. **Extractors** (`nexusLIMS/extractors/`) |
| 116 | + - Plugin-based metadata extraction |
| 117 | + - Plugins live in `extractors/plugins/` |
| 118 | + - Instrument profiles live in `extractors/plugins/profiles/` |
| 119 | + - Preview generators live in `extractors/plugins/preview_generators/` |
| 120 | + - Extractors return a dict with an `nx_meta` key for NexusLIMS-specific metadata |
| 121 | + |
| 122 | +4. **Record Builder** (`nexusLIMS/builder/record_builder.py`) |
| 123 | + - Main orchestration entry point is `process_new_records()` |
| 124 | + - `build_record()` creates XML records conforming to the Nexus Experiment schema |
| 125 | + |
| 126 | +5. **Schemas** (`nexusLIMS/schemas/`) |
| 127 | + - `activity.py` contains `AcquisitionActivity` and file clustering logic |
| 128 | + - XML schema validation is performed against `nexus-experiment.xsd` |
| 129 | + |
| 130 | +6. **CDCS Integration** (`cdcs.py`) |
| 131 | + - Uploads records to the NexusLIMS CDCS frontend |
| 132 | + - Uses credentials and configuration from environment-driven app config |
| 133 | + |
| 134 | +### Key Workflows |
| 135 | + |
| 136 | +**Record Building Process** |
| 137 | +1. NEMO harvester polls for new or ended reservations |
| 138 | +2. Harvester creates `session_log` entries |
| 139 | +3. Record builder finds sessions that are ready to build |
| 140 | +4. Files are found using GNU `find` |
| 141 | +5. Files are clustered into Acquisition Activities |
| 142 | +6. Metadata is extracted |
| 143 | +7. XML is built and validated |
| 144 | +8. Record is uploaded to CDCS |
| 145 | + |
| 146 | +**File Finding Strategy** |
| 147 | +- Controlled by `NX_FILE_STRATEGY` |
| 148 | +- `exclusive`: only files with known extractors |
| 149 | +- `inclusive`: all files, with basic metadata for unknowns |
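
The difference between the two strategies can be illustrated with a minimal sketch. The function name `select_files`, the `KNOWN_EXTENSIONS` set, and the file names are hypothetical and are not taken from the NexusLIMS code base; the real builder may decide this differently.

```python
from pathlib import Path

# Hypothetical set of extensions that have a dedicated extractor.
KNOWN_EXTENSIONS = {".dm3", ".dm4", ".tif", ".ser"}


def select_files(paths, strategy, known_extensions=KNOWN_EXTENSIONS):
    """Filter candidate files according to the file finding strategy."""
    if strategy == "inclusive":
        # Keep everything; unknown types would only get basic metadata.
        return list(paths)
    if strategy == "exclusive":
        # Keep only files whose extension has a known extractor.
        return [p for p in paths if Path(p).suffix.lower() in known_extensions]
    raise ValueError(f"unknown strategy: {strategy!r}")


files = ["a.dm3", "notes.txt", "b.TIF"]
exclusive_files = select_files(files, "exclusive")
inclusive_files = select_files(files, "inclusive")
```

With `exclusive`, `notes.txt` is dropped because no extractor claims it; with `inclusive`, it is kept.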
| 150 | + |
| 151 | +## Configuration |
| 152 | + |
| 153 | +Environment variables are loaded from a `.env` file. See `.env.example`. |
| 154 | + |
| 155 | +Critical paths: |
| 156 | +- `NX_INSTRUMENT_DATA_PATH`: read-only mount of centralized instrument data |
| 157 | +- `NX_DATA_PATH`: writable parallel directory for metadata and previews |
| 158 | +- `NX_DB_PATH`: SQLite database path |
| 159 | +- `NX_LOG_PATH`: optional directory for logs, defaults under `NX_DATA_PATH` |
| 160 | +- `NX_RECORDS_PATH`: optional directory for XML records, defaults under `NX_DATA_PATH` |
| 161 | +- `NX_LOCAL_PROFILES_PATH`: optional directory for site-specific instrument profiles |
| 162 | + |
| 163 | +NEMO integration: |
| 164 | +- Supports multiple NEMO instances via `NX_NEMO_ADDRESS_N` and `NX_NEMO_TOKEN_N` |
| 165 | +- Optional timezone and datetime format overrides may be set per instance |
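
As a rough illustration of the numbered-variable convention, the sketch below groups `NX_NEMO_ADDRESS_N` and `NX_NEMO_TOKEN_N` pairs by their suffix. The helper `collect_nemo_instances` and the example URLs are assumptions for illustration only. It takes a plain mapping rather than reading the environment; in this repository, any real environment access belongs in `nexusLIMS/config.py`.

```python
import re


def collect_nemo_instances(env):
    """Group NX_NEMO_ADDRESS_N / NX_NEMO_TOKEN_N pairs by their suffix N."""
    pattern = re.compile(r"^NX_NEMO_(ADDRESS|TOKEN)_(\d+)$")
    instances = {}
    for key, value in env.items():
        match = pattern.match(key)
        if match:
            field = match.group(1).lower()
            index = int(match.group(2))
            instances.setdefault(index, {})[field] = value
    # Keep only instances that define both an address and a token.
    return {
        index: inst
        for index, inst in sorted(instances.items())
        if "address" in inst and "token" in inst
    }


example_env = {
    "NX_NEMO_ADDRESS_1": "https://nemo.example.com/api/",
    "NX_NEMO_TOKEN_1": "token-one",
    "NX_NEMO_ADDRESS_2": "https://nemo2.example.com/api/",  # no token: skipped
    "NX_DATA_PATH": "/data",
}
nemo_instances = collect_nemo_instances(example_env)
```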
| 166 | + |
| 167 | +CDCS authentication: |
| 168 | +- `NX_CDCS_TOKEN` |
| 169 | +- `NX_CDCS_URL` |
| 170 | + |
| 171 | +## Important Implementation Details |
| 172 | + |
| 173 | +### Database Session States |
| 174 | + |
| 175 | +Each session's state is stored in `session_log.record_status`, which takes one of these values: |
| 176 | +- `WAITING_FOR_END` |
| 177 | +- `TO_BE_BUILT` |
| 178 | +- `COMPLETED` |
| 179 | +- `ERROR` |
| 180 | +- `NO_FILES_FOUND` |
| 181 | +- `NO_CONSENT` |
| 182 | +- `NO_RESERVATION` |
| 183 | + |
| 184 | +### File Delay Mechanism |
| 185 | + |
| 186 | +`NX_FILE_DELAY_DAYS` controls the retry window for `NO_FILES_FOUND` sessions. |
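
One way to picture the retry window is the sketch below. It assumes the window is measured from the end of the session and that a session stays `TO_BE_BUILT` while the window is open; both are assumptions, and the function name `status_when_no_files` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def status_when_no_files(session_end, now, file_delay_days):
    """Decide the next status for a session in which no files were found."""
    retry_until = session_end + timedelta(days=file_delay_days)
    if now < retry_until:
        # Still inside the retry window: leave the session queued.
        return "TO_BE_BUILT"
    # The window has passed: stop retrying.
    return "NO_FILES_FOUND"


end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
within_window = status_when_no_files(end, end + timedelta(days=1), 2)
after_window = status_when_no_files(end, end + timedelta(days=3), 2)
```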
| 187 | + |
| 188 | +### Instrument Database Requirements |
| 189 | + |
| 190 | +Each instrument in `instruments` must specify: |
| 191 | +- `harvester`: `nemo` or `sharepoint` |
| 192 | +- `filestore_path`: relative to `NX_INSTRUMENT_DATA_PATH` |
| 193 | +- `timezone` |
| 194 | +- For NEMO-backed instruments, `api_url` matching NEMO tool names |
| 195 | + |
| 196 | +### Testing Infrastructure |
| 197 | + |
| 198 | +- Uses `pytest` with `pytest-mpl` for image comparison tests |
| 199 | +- Test fixtures set up mock databases and environments |
| 200 | +- Many test files are `.tar.gz` archives extracted during test setup |
| 201 | +- Coverage reports are generated in `tests/coverage/` |
| 202 | + |
| 203 | +### Code Style |
| 204 | + |
| 205 | +- Ruff is used for formatting and linting |
| 206 | +- Pyright is configured for type checking |
| 207 | +- NumPy-style docstrings are preferred |
| 208 | + |
| 209 | +### Changelog Management |
| 210 | + |
| 211 | +- Changelog content is managed by `towncrier` |
| 212 | +- When adding a feature or making a significant change, create a changelog blurb in `docs/changes` |
| 213 | +- Follow the instructions in `docs/changes/README.rst` |
| 214 | +- When preparing or cutting a release in Codex, use the `nexuslims-release` skill |
| 215 | + |
| 216 | +### Configuration Management Rule |
| 217 | + |
| 218 | +Never use `os.getenv()` or `os.environ` directly for application configuration access outside `nexusLIMS/config.py`. |
| 219 | + |
| 220 | +```python |
| 221 | +# Wrong |
| 222 | +import os |
| 223 | +path = os.getenv("NX_DATA_PATH") |
| 224 | + |
| 225 | +# Correct |
| 226 | +from nexusLIMS import config |
| 227 | +path = config.NX_DATA_PATH |
| 228 | +``` |
| 229 | + |
| 230 | +Why this rule exists: |
| 231 | +- centralizes configuration management |
| 232 | +- provides validation and defaults |
| 233 | +- makes testing easier |
| 234 | +- keeps configuration access consistent |
| 235 | + |
| 236 | +The only exception is `nexusLIMS/config.py`, which is responsible for reading environment variables and exposing validated module-level attributes. |
| 237 | + |
| 238 | +## Technical Notes |
| 239 | + |
| 240 | +- See `docs/reference/textual_testing_reference.md` for Textual testing patterns used in this repo |
| 241 | +- See `.claude/notes/zeroing-compressed-tiff-files.md` for the TIFF zeroing workflow referenced by past work in this repo |
| 242 | +- When creating archive files on macOS, use `COPYFILE_DISABLE=1` so macOS metadata files are not included |
| 243 | + |
| 244 | +## Python Version Support |
| 245 | + |
| 246 | +Supports Python 3.11 and 3.12 only, as defined in `pyproject.toml`. |
| 247 | + |
| 248 | +## Development Notes |
| 249 | + |
| 250 | +- This is a fork maintained by Datasophos, not affiliated with NIST |
| 251 | +- Original NIST documentation may be outdated: <https://pages.nist.gov/NexusLIMS> |
| 252 | +- When adding new file format support, create an extractor plugin in `nexusLIMS/extractors/plugins/` |
| 253 | +- When customizing instrument behavior, create an `InstrumentProfile` in `extractors/plugins/profiles/` or in the directory pointed to by `NX_LOCAL_PROFILES_PATH` |
| 254 | +- HyperSpy is used extensively for reading and processing microscopy data |
| 255 | +- The directory layout under `NX_DATA_PATH` mirrors the layout under `NX_INSTRUMENT_DATA_PATH` |
| 256 | + |
| 257 | +### Developing Extractor Plugins |
| 258 | + |
| 259 | +See `docs/writing_extractor_plugins.md` for detailed guidance. |
| 260 | + |
| 261 | +Quick reference: |
| 262 | +1. Create a class in `nexusLIMS/extractors/plugins/` with: |
| 263 | + - `name` |
| 264 | + - `priority` |
| 265 | + - `supported_extensions` |
| 266 | + - `supports(context: ExtractionContext) -> bool` |
| 267 | + - `extract(context: ExtractionContext) -> dict[str, Any]` |
| 268 | +2. Return a dict with an `nx_meta` key containing: |
| 269 | + - `DatasetType` |
| 270 | + - `Data Type` |
| 271 | + - `Creation Time` |
| 272 | +3. The registry auto-discovers plugins on first use |
| 273 | + |
| 274 | +Key patterns: |
| 275 | +- use priority-based selection |
| 276 | +- use `supports()` for content sniffing beyond extension checks |
| 277 | +- check `context.instrument` for instrument-specific behavior |
| 278 | +- handle missing or corrupted files gracefully |
| 279 | +- add tests under `tests/unit/test_extractors/` |
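
The quick reference above could translate into a plugin shaped roughly like the sketch below. It is not the project's actual base class or context type: the `ExtractionContext` stand-in, its `file_path` field, the `.exm` format, the `EXM1` header, and the metadata values are all hypothetical. See `docs/writing_extractor_plugins.md` for the real interface.

```python
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


@dataclass
class ExtractionContext:
    """Stand-in for the real context class; these fields are assumptions."""

    file_path: Path
    instrument: Any = None


class ExampleExtractor:
    """Hypothetical plugin for a made-up `.exm` file format."""

    name = "example_extractor"
    priority = 100
    supported_extensions = {"exm"}

    def supports(self, context: ExtractionContext) -> bool:
        path = context.file_path
        if path.suffix.lower().lstrip(".") not in self.supported_extensions:
            return False
        try:
            # Content sniffing beyond the extension check.
            with path.open("rb") as handle:
                return handle.read(4) == b"EXM1"
        except OSError:
            return False

    def extract(self, context: ExtractionContext) -> dict[str, Any]:
        path = context.file_path
        try:
            created = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        except OSError:
            # Missing or unreadable file: fall back instead of raising.
            created = datetime.now(tz=timezone.utc)
        return {
            "nx_meta": {
                "DatasetType": "Unknown",  # placeholder value
                "Data Type": "Unknown",  # placeholder value
                "Creation Time": created.isoformat(),
            }
        }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.exm"
    sample.write_bytes(b"EXM1 payload")
    ctx = ExtractionContext(file_path=sample)
    plugin = ExampleExtractor()
    is_supported = plugin.supports(ctx)
    metadata = plugin.extract(ctx)
```

Both methods catch `OSError` so that a missing or unreadable file produces a fallback result rather than aborting the build.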