Add Tescan PFIB TIFF extractor plugin with comprehensive metadata extraction #20

Merged
jat255 merged 3 commits into main from feature/tescan-pfib-tiff-extractor
Dec 18, 2025

jat255 commented Dec 18, 2025

Summary

Implements a new extractor plugin for Tescan PFIB (Plasma Focused Ion Beam) and SEM microscopy files (Issue #15), following the plugin architecture established in PR #19 (Zeiss Orion/Fibics HIM extractor).

Key Features

  • Multi-tier metadata extraction strategy:
    • Primary: Embedded INI-style metadata from TIFF tag 50431
    • Fallback: Sidecar .hdr file parsing (INI format with [MAIN] and [SEM] sections)
    • Supplementary: Standard TIFF tags (Make, Software, Artist)
  • Comprehensive metadata coverage: Extracts 70+ fields including:
    • Beam parameters (voltage, current, dwell time, gun type)
    • Stage position (X/Y/Z coordinates, rotation, tilt)
    • Detector settings (name, gain, offset)
    • Chamber conditions (pressure)
    • Scan parameters (mode, speed, rotation)
    • Image dimensions and pixel sizes
  • Robust error handling: Graceful degradation when metadata sources are unavailable
  • 100% test coverage: 37 comprehensive tests covering all code paths and edge cases
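
A minimal sketch of the multi-tier strategy listed above, not the code in tescan_tif.py. It assumes the caller has already read the raw bytes of TIFF tag 50431 and the standard TIFF tags; the function names, the "source" field, the "TIFF" grouping key, and the sample keys (Device, HV) are hypothetical.

```python
import configparser
from pathlib import Path
from typing import Optional


def parse_ini(text: str) -> dict:
    """Parse INI text into {section: {key: value}}, keeping key case."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # do not lowercase option names
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def extract_metadata(
    embedded: Optional[bytes], sidecar: Optional[Path], tiff_tags: dict
) -> dict:
    """Try the embedded block, then the sidecar file, then add plain TIFF tags."""
    metadata: dict = {}
    source = "none"

    # Tier 1 (primary): INI-style text embedded in the TIFF.
    if embedded:
        try:
            metadata = parse_ini(embedded.decode("utf-8", errors="replace"))
        except configparser.Error:
            metadata = {}
        if metadata:
            source = "embedded"

    # Tier 2 (fallback): sidecar .hdr file, only if tier 1 produced nothing.
    if not metadata and sidecar is not None and sidecar.is_file():
        try:
            metadata = parse_ini(
                sidecar.read_text(encoding="utf-8", errors="replace")
            )
        except (configparser.Error, OSError):
            metadata = {}
        if metadata:
            source = "sidecar"

    # Tier 3 (supplementary): standard TIFF tags are added in every case.
    for tag in ("Make", "Software", "Artist"):
        if tag in tiff_tags:
            metadata.setdefault("TIFF", {})[tag] = tiff_tags[tag]
    if source == "none" and "TIFF" in metadata:
        source = "tiff_tags"

    return {"source": source, "metadata": metadata}
```

Each tier swallows only parsing and I/O errors, so an unreadable source degrades to the next tier instead of aborting the extraction.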

Implementation Details

Extractor Plugin: nexusLIMS/extractors/plugins/tescan_tif.py

  • Priority: 150 (higher than generic TIFF extractors to ensure proper detection)
  • Auto-discovered via existing plugin system
  • Uses configparser for INI parsing with case preservation
  • Defensive programming with comprehensive exception handling
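
For context on the case-preservation point: configparser lowercases option names by default, and assigning str to optionxform turns that off. The sample section content below is made up for illustration and is not taken from a real Tescan header.

```python
import configparser

# Hypothetical header fragment; real Tescan keys may differ.
SAMPLE = "[SEM]\nHV=5000.0\nWD=0.006\n"

# Default behaviour: option names are lowercased on read.
default_parser = configparser.ConfigParser()
default_parser.read_string(SAMPLE)

# Case-preserving behaviour: keep option names exactly as written.
preserving_parser = configparser.ConfigParser(interpolation=None)
preserving_parser.optionxform = str
preserving_parser.read_string(SAMPLE)

default_keys = list(default_parser["SEM"])
preserved_keys = list(preserving_parser["SEM"])
print(default_keys, preserved_keys)
```

Passing interpolation=None is a separate precaution so that a literal "%" in a value is not treated as interpolation syntax.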

Architecture Pattern:

  • Follows established ExtractorPlugin protocol from nexusLIMS/extractors/base.py
  • Content-based file detection in supports() method
  • Returns structured metadata with nested dictionaries for complex fields
  • NumPy-style docstrings throughout
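
The pattern can be sketched roughly as follows. This is not the actual ExtractorPlugin protocol from nexusLIMS/extractors/base.py: the class names, the extract() signature, the "[MAIN]" marker check, the generic extractor's priority of 100, and the pick_extractor helper are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional, Protocol, Sequence


class ExtractorLike(Protocol):
    """Hypothetical stand-in for the real plugin protocol."""

    name: str
    priority: int

    def supports(self, path: Path) -> bool: ...

    def extract(self, path: Path) -> dict: ...


class GenericTiffSketch:
    name = "generic_tiff_sketch"
    priority = 100  # assumed value, lower than the Tescan sketch

    def supports(self, path: Path) -> bool:
        # Extension-only check: accepts any TIFF.
        return path.suffix.lower() in {".tif", ".tiff"}

    def extract(self, path: Path) -> dict:
        return {"file": path.name}


class TescanSketch:
    name = "tescan_sketch"
    priority = 150  # checked before lower-priority extractors

    def supports(self, path: Path) -> bool:
        # Content-based check: look inside the file, not just at its name.
        if path.suffix.lower() not in {".tif", ".tiff"}:
            return False
        try:
            data = path.read_bytes()  # a real plugin would likely read less
        except OSError:
            return False
        return b"[MAIN]" in data  # assumed marker for Tescan metadata

    def extract(self, path: Path) -> dict:
        return {"file": path.name, "vendor": "Tescan"}


def pick_extractor(
    path: Path, candidates: Sequence[ExtractorLike]
) -> Optional[ExtractorLike]:
    """Return the highest-priority candidate whose supports() accepts path."""
    for candidate in sorted(candidates, key=lambda c: c.priority, reverse=True):
        if candidate.supports(path):
            return candidate
    return None
```

Because candidates are tried in descending priority, a Tescan file reaches the specialised extractor first, while an ordinary TIFF falls through to the generic one.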

Testing

Test Suite: tests/unit/test_extractors/test_tescan_tif.py

  • 37 tests with 100% line coverage (209/209 statements)
  • Test categories:
    • Attribute validation
    • File support detection
    • Metadata extraction (embedded, sidecar, fallback scenarios)
    • Real file validation
    • Edge cases (missing files, corrupted data, encoding issues)
    • Unit conversions

Test Data:

  • Real Tescan AMBER X test files: pfib-tescan.{tif,hdr}
  • Archived test fixtures: tescan-pfib_dataZeroed.tar.gz
  • Synthetic fixtures for isolated unit testing

Documentation

  • ✅ NumPy-style docstrings for all public APIs
  • ✅ Comprehensive module docstring with usage examples
  • ✅ Updated docs/extractors.md with Tescan PFIB section
  • ✅ Changelog entry: docs/changes/15.feature.md
  • ✅ Implementation plan preserved: .claude/plans/issue-15-tescan-tiff-extractor-implementation.md

Verification

# All tests passing
./scripts/run_tests.sh
# Result: 37/37 tests passed, 100% coverage

# Linting clean
./scripts/run_lint.sh
# Result: All checks passed

# Plugin auto-discovery working
uv run python -c "from nexusLIMS.extractors.registry import get_extractor_plugins; print([p.name for p in get_extractor_plugins()])"
# Result: tescan_tif_extractor included

Files Changed

New Files (3):

  • nexusLIMS/extractors/plugins/tescan_tif.py - Main extractor (209 lines)
  • tests/unit/test_extractors/test_tescan_tif.py - Test suite (1056 lines)
  • docs/changes/15.feature.md - Changelog entry

Modified Files (7):

  • docs/extractors.md - Added Tescan PFIB section
  • docs/index.md - Updated extractor count
  • .gitignore - Allow pfib-tescan.tif test file
  • CLAUDE.md - Added archive creation note
  • Test configuration files for new fixtures

Test Data:

  • tests/unit/files/tescan-pfib_dataZeroed.tar.gz - Anonymized test fixtures
  • tests/unit/files/tescan_pfib_example_{hdr,tiff}_meta.txt - Metadata samples

Known Limitations

  • Currently tested only with Tescan AMBER X model
  • Assumes the INI format is consistent across Tescan instruments (defensive parsing handles variations)
  • Embedded metadata requires section headers (automatically added if missing)
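
One way the automatic section header could work is sketched below. configparser raises MissingSectionHeaderError when text starts with bare key=value lines, so the sketch prepends a header first. The function name and the choice of "MAIN" as the default section are assumptions, not confirmed details of the plugin.

```python
import configparser


def parse_with_default_section(text: str, default_section: str = "MAIN") -> dict:
    """Parse INI text, adding a section header if the text starts without one."""
    stripped = text.lstrip()
    if not stripped.startswith("["):
        # Wrap leading bare key=value lines in an assumed default section.
        text = f"[{default_section}]\n{stripped}"
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names as written
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}
```

For example, parse_with_default_section("Device=AMBER X\n[SEM]\nHV=5000.0\n") places Device under "MAIN" and HV under "SEM". Text that begins with a comment line and later declares the same section explicitly would still raise a duplicate-section error, so a production version would need a more careful check.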

Future Enhancements

  • Support additional Tescan microscope models if format varies
  • Performance optimization for batch processing (if needed)
  • Additional field mappings as new metadata fields are discovered

Closes #15

… testing

- Implement TescanTiffExtractor plugin for TESCAN AMBER X microscope files
- Support embedded HDR metadata in TIFF tag 50431 (primary source)
- Fall back to sidecar .hdr files and standard TIFF tags for metadata
- Extract 70+ metadata fields from [MAIN] and [SEM] sections
- Implement multi-tier extraction strategy with graceful degradation
- Add robust content sniffing for reliable file detection
- Support both standalone .hdr files and .tif files with sidecars
- Proper unit conversions: m→nm, Pa→mPa, V→kV, A→μA, A→pA
- Nested dictionary structure for complex metadata (Stage Position)
- Set priority to 150 to ensure correct detection over generic QuantaTiff
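
The unit conversions and nested Stage Position structure listed above could look roughly like this. The conversion factors follow from the SI prefixes; the raw key names (HV, PixelSizeX, ChamberPressure, BeamCurrent, StageX/Y/Z) and the output labels are hypothetical, and stage values are left unconverted because the listing does not say which unit they use.

```python
from typing import Optional

# Multipliers for the conversions named in the commit message.
TO_TARGET = {
    ("m", "nm"): 1e9,
    ("Pa", "mPa"): 1e3,
    ("V", "kV"): 1e-3,
    ("A", "μA"): 1e6,
    ("A", "pA"): 1e12,
}

# Output label -> (raw key, source unit, target unit); all names are assumed.
FIELD_MAP = {
    "Voltage (kV)": ("HV", "V", "kV"),
    "Pixel Size X (nm)": ("PixelSizeX", "m", "nm"),
    "Chamber Pressure (mPa)": ("ChamberPressure", "Pa", "mPa"),
    "Beam Current (pA)": ("BeamCurrent", "A", "pA"),
}


def convert(raw: Optional[str], src: str, dst: str) -> Optional[float]:
    """Convert a raw string value; return None if it is missing or invalid."""
    if raw is None:
        return None
    try:
        return float(raw) * TO_TARGET[(src, dst)]
    except (ValueError, KeyError):
        return None


def shape_metadata(sem: dict) -> dict:
    """Build converted flat fields plus a nested 'Stage Position' dictionary."""
    shaped: dict = {}
    for label, (key, src, dst) in FIELD_MAP.items():
        value = convert(sem.get(key), src, dst)
        if value is not None:
            shaped[label] = value

    stage: dict = {}
    for axis in ("X", "Y", "Z"):
        raw = sem.get(f"Stage{axis}")
        try:
            stage[axis] = float(raw)
        except (TypeError, ValueError):
            continue  # skip missing or unparseable coordinates
    if stage:
        shaped["Stage Position"] = stage
    return shaped
```

Fields that are missing or fail to parse are omitted instead of raising, which matches the graceful-degradation goal stated in the description.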

Testing:
- 37 comprehensive unit tests covering all code paths
- 100% line coverage (209/209 statements)
- Edge case testing: missing .hdr, corrupted files, various encodings
- Fallback scenario validation
- Real file extraction testing with pfib-tescan.{tif,hdr}

Documentation:
- NumPy-style docstrings with usage examples
- Changelog entry created (docs/changes/15.feature.md)
- Comprehensive error handling with helpful logging

Code Quality:
- All linting checks passed (ruff)
- Code properly formatted
- No security issues or type safety concerns
- Defensive programming pattern throughout

Closes #15

codecov Bot commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

github-actions Bot commented Dec 18, 2025

📚 Documentation Preview

The documentation for this PR has been deployed to:

This preview will be updated on each push to this PR.

jat255 merged commit 29da96e into main on Dec 18, 2025
10 checks passed
github-actions Bot added a commit that referenced this pull request Dec 18, 2025
jat255 deleted the feature/tescan-pfib-tiff-extractor branch on December 18, 2025 19:36

Development

Successfully merging this pull request may close these issues.

Add Extractor Plugin for Tescan PFIB TIFF Images
