646 changes: 646 additions & 0 deletions .claude/plans/issue-15-tescan-tiff-extractor-implementation.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions .gitignore
@@ -27,6 +27,7 @@ tests/unit/files/*.tif
!quanta_just_modded_mdata.tif
!quanta_bad_metadata.tif
!test_STEM_image.dm3
!pfib-tescan.tif
tests/unit/files/*.hdf5
tests/unit/files/**/*.json
tests/integration/**/*.json
2 changes: 2 additions & 0 deletions CLAUDE.md
@@ -265,6 +265,8 @@ path = config.NX_DATA_PATH
Additional technical documentation for specific tasks:

- **[Zeroing Compressed TIFF Files](.claude/notes/zeroing-compressed-tiff-files.md)**: Binary patching method for zeroing out LZW-compressed TIFF image data while preserving all metadata and file structure. Use when you need to create test fixtures or anonymized data files.
- **Creating archive files**: When creating an archive of test files (or for any other purpose), ensure that macOS hidden files (like `.DS_Store`), macOS resource forks, and similar artifacts do not end up in the archive. Always set `COPYFILE_DISABLE=1` when creating archives on macOS.
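
When an archive is built from a script rather than with the `tar` command, the same rule can be enforced with a member filter. The following is a minimal sketch, not part of NexusLIMS: the helper name `make_clean_archive` and the exclusion list are assumptions made for illustration. It uses only the standard-library `tarfile` module and drops any path component that looks like macOS debris.

```python
from __future__ import annotations

import tarfile
from pathlib import Path

# Assumed exclusion rules: Finder metadata and zip-style resource-fork folders.
_EXCLUDED_NAMES = {".DS_Store", "__MACOSX"}


def _skip_macos_debris(info: tarfile.TarInfo) -> tarfile.TarInfo | None:
    """Return None (exclude the member) for macOS hidden files, else keep it."""
    parts = Path(info.name).parts
    # AppleDouble resource-fork files are named "._<original name>".
    if any(p in _EXCLUDED_NAMES or p.startswith("._") for p in parts):
        return None
    return info


def make_clean_archive(source_dir: Path, archive_path: Path) -> list[str]:
    """Create a gzipped tar of source_dir and return the member names written."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name, filter=_skip_macos_debris)
    # Re-open the archive to report what actually ended up inside it.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Returning the member list makes it easy to check in a test that nothing unexpected was packed. The archive path should live outside `source_dir` so the archive does not try to include itself.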


## Python Version Support

1 change: 1 addition & 0 deletions docs/changes/15.feature.md
@@ -0,0 +1 @@
Added support for Tescan PFIB (Plasma FIB) TIFF files. The new extractor automatically detects and parses metadata from Tescan microscopy TIFF files, including imaging parameters, stage position, detector settings, and FIB-specific information.
149 changes: 138 additions & 11 deletions docs/extractors.md
@@ -8,17 +8,18 @@ each format.

## Quick Reference

-| **Extension** | **Support** | **Instrument/Software** | **Data Types** | **Key Features** |
-|---------------|-------------|-------------------------|----------------|------------------|
-| .dm3, .dm4 | ✅ Full | Gatan DigitalMicrograph | TEM/STEM Imaging, EELS, EDS, Diffraction, Spectrum Imaging | Comprehensive metadata, instrument-specific parsers, automatic type detection |
-| .tif | ✅ Full | FEI/Thermo Fisher SEM/FIB | SEM Imaging | Beam settings, stage position, vacuum conditions, detector config |
-| .tif | ✅ Full | Zeiss Orion HIM / Fibics HIM | HIM Imaging | Helium ion beam settings, stage position, detector configuration, image metadata |
-| .ser, .emi | ✅ Full | FEI TIA Software | TEM/STEM Imaging, Diffraction, EELS/EDS Spectra & SI | Multi-file support, experimental conditions, acquisition parameters |
-| .spc | ✅ Full | EDAX (Genesis, TEAM) | EDS Spectrum | Detector angles, energy calibration, element identification |
-| .msa | ✅ Full | EDAX & others (standard) | EDS Spectrum | EMSA/MAS standard format, vendor extensions supported |
-| .png, .jpg, .tiff, .bmp, .gif | ⚠️ Preview | Various (exported images) | Unknown | Basic metadata, square thumbnail generation |
-| .txt | ⚠️ Preview | Various (logs, notes) | Unknown | Basic metadata, text-to-image preview |
-| *others* | ❌ Minimal | N/A | Unknown | Timestamp only, placeholder preview |
+| **Instrument/Software** | **Extension** | **Support** | **Data Types** | **Key Features** |
+|-------------------------|---------------|-------------|----------------|------------------|
+| [Gatan DigitalMicrograph](#digital-micrograph-files-dm3-dm4) | .dm3, .dm4 | ✅ Full | TEM/STEM Imaging, EELS, EDS, Diffraction, Spectrum Imaging | Comprehensive metadata, instrument-specific parsers, automatic type detection |
+| [FEI/Thermo Fisher SEM/FIB](#feithermo-fisher-tif-files-tif) | .tif | ✅ Full | SEM Imaging | Beam settings, stage position, vacuum conditions, detector config |
+| [Zeiss Orion HIM / Fibics HIM](#zeiss-orion-fibics-him-tif-files-tif) | .tif | ✅ Full | SEM/HIM Imaging | Helium ion beam settings, stage position, detector configuration, image metadata |
+| [Tescan (P)FIB/SEM](#tescan-pfibsem-tif-files-tif) | .tif | ✅ Full | SEM Imaging | High-voltage settings, stage position, detector gain/offset, scan parameters, stigmator values |
+| [FEI TIA Software](#fei-tia-files-ser-emi) | .ser, .emi | ✅ Full | TEM/STEM Imaging, Diffraction, EELS/EDS Spectra & SI | Multi-file support, experimental conditions, acquisition parameters |
+| [EDAX (Genesis, TEAM)](#edax-eds-files-spc-msa) | .spc | ✅ Full | EDS Spectrum | Detector angles, energy calibration, element identification |
+| [EDAX & others (standard)](#edax-eds-files-spc-msa) | .msa | ✅ Full | EDS Spectrum | EMSA/MAS standard format, vendor extensions supported |
+| [Various (exported images)](#image-formats) | .png, .jpg, .tiff, .bmp, .gif | ⚠️ Preview | Unknown | Basic metadata, square thumbnail generation |
+| [Various (logs, notes)](#text-files-txt) | .txt | ⚠️ Preview | Unknown | Basic metadata, text-to-image preview |
+| [Unknown Files](#unknown-files) | *others* | ❌ Minimal | Unknown | Timestamp only, placeholder preview |

**Legend**: ✅ Full = Comprehensive metadata extraction<br/>⚠️ Preview = Basic metadata + custom preview<br/>❌ Minimal = Timestamp only

@@ -35,6 +36,7 @@ Extraction is performed automatically during record building. Each file is ident

These formats have dedicated extractors that parse comprehensive metadata specific to their structure.

(digital-micrograph-files-dm3-dm4)=
### Digital Micrograph Files (.dm3, .dm4)

**Support Level**: ✅ Full
@@ -90,6 +92,7 @@ The extractor includes specialized parsers for specific instruments:
- For stacked images, metadata is extracted from the first plane
- Session info (Operator, Specimen, Detector) may be unreliable and is flagged in warnings

(feithermo-fisher-tif-files-tif)=
### FEI/Thermo Fisher TIF Files (.tif)

**Support Level**: ✅ Full
@@ -133,6 +136,7 @@ The extractor includes specialized parsers for specific instruments:
- Some instruments write duplicate metadata sections which are handled automatically
- Works with both older config-style metadata and newer XML-based metadata

(zeiss-orion-fibics-him-tif-files-tif)=
### Zeiss Orion / Fibics HIM TIF Files (.tif)

**Support Level**: ✅ Full
@@ -192,6 +196,91 @@ This content-based detection allows proper identification even when files use `.
- If XML metadata is missing or corrupted, the extractor gracefully falls back to basic file information
- Both Zeiss Orion and Fibics HIM variants store metadata as embedded XML, making extraction reliable across different software versions

(tescan-pfibsem-tif-files-tif)=
### Tescan PFIB/SEM TIF Files (.tif)

**Support Level**: ✅ Full

**Description**: TIFF images saved by Tescan PFIB (Plasma Focused Ion Beam) and SEM instruments (e.g., AMBER X), with INI-style metadata embedded in a custom TIFF tag or stored in a sidecar `.hdr` file.

**Extractor Module**: {py:mod}`nexusLIMS.extractors.plugins.tescan_tif`

**File Format Details**:

The extractor uses a three-tier strategy for metadata extraction:

1. **Primary**: Extracts metadata from embedded TIFF tag 50431 (custom Tescan metadata tag) containing INI-style metadata
2. **Fallback**: If embedded metadata fails or is incomplete, looks for a sidecar `.hdr` file with full metadata in INI format (`[MAIN]` and `[SEM]` sections)
3. **Supplementary**: Always extracts basic TIFF tags (tag 271 for Make, tag 305 for Software, tag 315 for Artist) to supplement or override other metadata

This multi-tier approach ensures complete metadata is available whether metadata is embedded in the TIFF or stored in a sidecar `.hdr` file.
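
The tiered lookup described above can be sketched with the standard-library `configparser`. This is an illustration under stated assumptions, not the code in `tescan_tif`: the function names, the rule that metadata counts as "complete" only when both `[MAIN]` and `[SEM]` are present, and the `TIFF` output key are all hypothetical. Reading tag 50431 from the file requires a TIFF library and is left to the caller, which passes the decoded text in as `embedded_text`.

```python
from __future__ import annotations

import configparser
from pathlib import Path

# Assumption: metadata is "complete" only if both sections are present.
REQUIRED_SECTIONS = ("MAIN", "SEM")


def parse_ini_metadata(text: str) -> dict[str, dict[str, str]]:
    """Parse INI-style text into {section: {key: value}}, preserving key case."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # do not lowercase keys
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def _try_parse(text: str | None) -> dict[str, dict[str, str]]:
    """Return parsed sections, or an empty dict if text is missing or malformed."""
    if not text:
        return {}
    try:
        return parse_ini_metadata(text)
    except configparser.Error:
        return {}


def load_tescan_metadata(
    embedded_text: str | None, tif_path: Path, basic_tags: dict[str, str]
) -> dict[str, dict[str, str]]:
    # Tier 1: metadata embedded in the custom TIFF tag (already decoded to text).
    metadata = _try_parse(embedded_text)

    # Tier 2: fall back to the sidecar .hdr file if anything required is missing.
    if not all(section in metadata for section in REQUIRED_SECTIONS):
        hdr_path = tif_path.with_suffix(".hdr")
        if hdr_path.is_file():
            sidecar = _try_parse(hdr_path.read_text(errors="replace"))
            for section, values in sidecar.items():
                metadata.setdefault(section, {}).update(values)

    # Tier 3: always attach the basic TIFF tags (Make, Software, Artist).
    metadata["TIFF"] = {key: value for key, value in basic_tags.items() if value}
    return metadata
```

Keeping the parser separate from the tier logic means the same function handles both the embedded text and the `.hdr` file, since both are described as INI-style.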

**Key Metadata Extracted**:

**From [MAIN] section**:
- Instrument identification (Device, Model, Serial Number)
- User information (Operator name)
- Acquisition timestamp (Date and Time)
- Magnification
- Software version

**From [SEM] section**:
- Beam parameters (High Voltage, Spot Size, Emission Current)
- Stage position (X, Y, Z coordinates and Rotation/Tilt angles)
- Scan settings (Dwell time, Scan mode, Rotation)
- Detector configuration (Name, Gain, Offset)
- Vacuum conditions (Chamber pressure)
- Stigmator values (X and Y corrections)
- Gun type configuration
- Working Distance
- Session ID for traceability

**Unit Conversions**:
- Magnification: Converted from raw values to kiloX (kX)
- Voltages: Converted from millivolts to kilovolts (kV)
- Distances: Converted from meters to millimeters (mm) or nanometers (nm) as appropriate
- Currents: Converted from amperes to microamperes (μA)
- Pressure: Converted to millipascals (mPa)
- Pixel sizes: Calculated from image dimensions and field of view, converted to nanometers (nm)
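
The conversions listed above amount to fixed scale factors. A minimal sketch follows; the function names are invented for illustration, and two points are assumptions rather than facts from the extractor: that "raw" magnification is a plain multiplier (so kX is the value divided by 1000), and that the source pressure is in pascals.

```python
def magnification_to_kx(magnification: float) -> float:
    """Raw magnification (assumed to be a plain multiplier) to kiloX."""
    return magnification / 1e3


def millivolts_to_kilovolts(millivolts: float) -> float:
    return millivolts / 1e6


def meters_to_millimeters(meters: float) -> float:
    return meters * 1e3


def meters_to_nanometers(meters: float) -> float:
    return meters * 1e9


def amperes_to_microamperes(amperes: float) -> float:
    return amperes * 1e6


def pascals_to_millipascals(pascals: float) -> float:
    """Assumes the source value is in pascals."""
    return pascals * 1e3


def pixel_size_nm(field_of_view_m: float, image_width_px: int) -> float:
    """Pixel size in nm from the field of view (meters) and image width (pixels)."""
    if image_width_px <= 0:
        raise ValueError("image_width_px must be positive")
    return meters_to_nanometers(field_of_view_m / image_width_px)
```

For example, a 100 µm field of view imaged at 1000 pixels across gives a pixel size of about 100 nm.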

**Data Types Detected**:

- SEM Imaging

**Special Features**:

- Priority 150 - Checked before generic TIFF extractors to properly identify Tescan files
- Content-based detection via custom TIFF tags even if `.hdr` file is missing
- Comprehensive stage position tracking (X, Y, Z, Rotation, Tilt)
- Detector settings extraction (Gain and Offset values)
- Automatic conversion of physics units to display-friendly formats
- Empty field exclusion - Fields with empty values are not included in output
- Session tracking with unique Session ID
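
Empty-field exclusion can be pictured as a recursive filter over the parsed metadata. The helper below is a hypothetical sketch (the name `drop_empty_fields` does not come from NexusLIMS) and assumes that "empty" means `None`, a blank string, or a section left with no fields; numeric zeros are kept because they are real values.

```python
def drop_empty_fields(metadata: dict) -> dict:
    """Return a copy of metadata without empty values or empty sub-sections."""
    cleaned = {}
    for key, value in metadata.items():
        if isinstance(value, dict):
            nested = drop_empty_fields(value)
            if nested:  # keep a section only if something survived
                cleaned[key] = nested
        elif value is None or (isinstance(value, str) and not value.strip()):
            continue  # skip None and blank strings
        else:
            cleaned[key] = value
    return cleaned
```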

**Preview Generation**:

- Converts image to square thumbnail (500×500 px default)
- Maintains aspect ratio with padding

**Warnings**:

The extractor flags the following fields as potentially unreliable:
- **Operator**: May reflect a logged-in user rather than the actual operator who collected the data

**Compatibility Notes**:

- **Tescan AMBER X**: Fully tested and verified
- **Other Tescan SEM/PFIB Instruments**: Likely compatible due to consistent INI metadata format, but not yet tested
- Both `.tif` and `.tiff` extensions are supported

**Notes**:

- If `.hdr` file is present but cannot be read, the extractor falls back to embedded TIFF tag metadata
- If both sidecar and embedded metadata are available, the sidecar is preferred (more reliable)
- The extractor gracefully handles missing or incomplete metadata sections
- Pixel size is calculated from magnification and field width when not directly available

(fei-tia-files-ser-emi)=
### FEI TIA Files (.ser, .emi)

**Support Level**: ✅ Full
@@ -235,6 +324,7 @@ This content-based detection allows proper identification even when files use `.
- Multiple signals in one `.emi` file are handled; metadata is extracted from the appropriate index
- Later signals in a multi-file series may have less metadata than the first

(edax-eds-files-spc-msa)=
### EDAX EDS Files (.spc, .msa)

**Support Level**: ✅ Full
@@ -289,6 +379,7 @@ This content-based detection allows proper identification even when files use `.

These formats receive basic metadata extraction and custom preview generation, but do not have dedicated metadata parsers.

(image-formats)=
### Image Formats

**Support Level**: ⚠️ Preview Only
@@ -314,6 +405,7 @@ These formats receive basic metadata extraction and custom preview generation, b
- These are typically auxiliary files (screenshots, exported images, etc.)
- Marked as `DatasetType: Unknown` in records

(text-files-txt)=
### Text Files (.txt)

**Support Level**: ⚠️ Preview Only
@@ -337,6 +429,7 @@ These formats receive basic metadata extraction and custom preview generation, b
- Common for log files, notes, and exported data
- Marked as `DatasetType: Unknown` in records

(unknown-files)=
## Unsupported Formats

**Support Level**: ❌ Minimal
@@ -428,12 +521,46 @@ See {doc}`writing_extractor_plugins` for instructions on how to write a new extr

## API Reference

### Extractor Registry Properties

The {py:class}`nexusLIMS.extractors.registry.ExtractorRegistry` class provides convenient properties for querying registered extractors:

**`extractors` Property**
: Returns a dictionary mapping file extensions to lists of extractor classes, sorted by priority (descending). This property automatically triggers plugin discovery if not already performed.

```python
from nexusLIMS.extractors.registry import get_registry

registry = get_registry()
extractors_by_ext = registry.extractors
# Returns: {
# 'dm3': [<class 'digital_micrograph.DM3Extractor'>],
# 'dm4': [<class 'digital_micrograph.DM3Extractor'>],
# 'msa': [<class 'edax.MsaExtractor'>],
# 'spc': [<class 'edax.SpcExtractor'>],
# ...
# }
```

**`extractor_names` Property**
: Returns a deduplicated, alphabetically-sorted list of all registered extractor class names. Includes both extension-specific and wildcard extractors. This property also triggers auto-discovery if needed.

```python
from nexusLIMS.extractors.registry import get_registry

registry = get_registry()
names = registry.extractor_names
# Returns: ["BasicFileInfoExtractor", "DM3Extractor", ..., "TescanTiffExtractor"]
```
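
To make the two properties concrete, here is a standalone sketch of the behavior they describe: per-extension lists ordered by descending priority, and a deduplicated, alphabetically sorted name list that also covers wildcard extractors. It does not import or reproduce `ExtractorRegistry`; the stand-in classes, the helper names, and every priority except the Tescan value of 150 are assumptions for illustration.

```python
from __future__ import annotations


# Stand-in classes; only the names taken from the docs above are real.
class BasicFileInfoExtractor:
    priority = 0  # assumed


class DM3Extractor:
    priority = 100  # assumed


class OtherTiffExtractor:  # hypothetical generic TIFF extractor
    priority = 100  # assumed


class TescanTiffExtractor:
    priority = 150  # stated in the Tescan section


def sort_by_priority(by_ext: dict[str, list[type]]) -> dict[str, list[type]]:
    """Order each extension's extractors from highest to lowest priority."""
    return {
        ext: sorted(classes, key=lambda cls: cls.priority, reverse=True)
        for ext, classes in by_ext.items()
    }


def collect_extractor_names(by_ext: dict[str, list[type]], wildcard: list[type]) -> list[str]:
    """Deduplicated, alphabetically sorted class names, including wildcards."""
    names = {cls.__name__ for classes in by_ext.values() for cls in classes}
    names.update(cls.__name__ for cls in wildcard)
    return sorted(names)
```

With `TescanTiffExtractor` registered for both `tif` and `tiff`, it appears once in the name list and sorts ahead of a priority-100 extractor in the `tif` list.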

### Extractor Modules

For complete API documentation of the extractor modules, see:

- {py:mod}`nexusLIMS.extractors` - Main extractor module
- {py:mod}`nexusLIMS.extractors.registry` - Extractor registry and auto-discovery
- {py:mod}`nexusLIMS.extractors.plugins.digital_micrograph` - DM3/DM4 file extractor
- {py:mod}`nexusLIMS.extractors.plugins.quanta_tif` - FEI/Thermo TIF file extractor
- {py:mod}`nexusLIMS.extractors.plugins.orion_HIM_tif` - Zeiss Orion / Fibics HIM TIF file extractor
- {py:mod}`nexusLIMS.extractors.plugins.tescan_tif` - Tescan PFIB/SEM TIF file extractor
- {py:mod}`nexusLIMS.extractors.plugins.fei_emi` - FEI TIA .ser/.emi file extractor
- {py:mod}`nexusLIMS.extractors.plugins.edax` - EDAX .spc/.msa file extractor
- {py:mod}`nexusLIMS.extractors.plugins.basic_metadata` - Basic metadata fallback extractor
9 changes: 8 additions & 1 deletion docs/index.md
@@ -45,7 +45,14 @@ Upgrading from v1.x? Step-by-step instructions for migrating to v2.0+.
Learn about the record building workflow and data taxonomy.
```

-```{grid-item-card} 🛠️ Developer Guide
+```{grid-item-card} 🛠️ Supported File Formats
+:link: extractors
+:link-type: doc
+
+Explore the comprehensive list of supported microscopy file formats and NexusLIMS's metadata extraction capabilities.
+```
+
+```{grid-item-card} ⌨️ Developer Guide
:link: dev_guide
:link-type: doc

4 changes: 2 additions & 2 deletions nexusLIMS/config.py
@@ -140,10 +140,10 @@ class Settings(BaseSettings):
),
)
NX_IGNORE_PATTERNS: list[str] = Field(
-["*.mib", "*.db", "*.emi"],
+["*.mib", "*.db", "*.emi", "*.hdr"],
description=(
"List of glob patterns to ignore when searching for experiment files. "
-"Default is `['*.mib','*.db','*.emi']`."
+"Default is `['*.mib','*.db','*.emi','*.hdr']`."
),
)
NX_INSTRUMENT_DATA_PATH: DirectoryPath = Field(