Multi-signal file support and NEOARM TEM enhancements #21
Merged
- Create tar.gz archives for quanta-fei_2_dataZeroed.tif (6.2K), neoarm-gatan_SI_dataZeroed.dm4 (71K), and neoarm-gatan_image_dataZeroed.dm4 (12K)
- Add QUANTA_FEI_2, NEOARM_GATAN_SI, and NEOARM_GATAN_IMAGE entries to tests/unit/utils.py tars dictionary
- Create pytest fixtures (quanta_fei_2_file, neoarm_gatan_si_file, neoarm_gatan_image_file) in tests/unit/conftest.py for automatic extraction and cleanup
- Test files use zeroed data for size optimization (metadata preserved), and metadata has been sanitized to remove potentially identifying information
- Add FieldDefinition NamedTuple to base.py for standardized field configuration
- Supports unit conversion, string/numeric handling, and zero-value suppression
- Refactor Quanta, Tescan, and Orion HIM extractors to use FieldDefinition
- Reduces code duplication and improves maintainability across TIFF extractors
- Update extractor tests and fixtures to reflect new extraction methodology
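The field-configuration idea in this commit can be illustrated with a minimal sketch. The field names (`source_key`, `output_key`, `factor`, `is_string`, `suppress_zero`) and the `extract_fields` helper are assumptions made for illustration; the actual `FieldDefinition` in base.py may be laid out differently.

```python
from typing import Any, NamedTuple


class FieldDefinition(NamedTuple):
    """Hypothetical layout; the real NamedTuple in base.py may differ."""

    source_key: str              # key in the raw TIFF metadata
    output_key: str              # key written to the extracted metadata
    factor: float = 1.0          # multiplicative unit conversion
    is_string: bool = False      # keep the value as text, skip conversion
    suppress_zero: bool = False  # drop numeric values equal to zero


def extract_fields(raw: dict[str, Any], fields: list[FieldDefinition]) -> dict[str, Any]:
    """Apply each field definition to the raw metadata (illustrative helper)."""
    out: dict[str, Any] = {}
    for field in fields:
        value = raw.get(field.source_key)
        if value is None:
            continue  # field absent from this file
        if field.is_string:
            out[field.output_key] = str(value)
            continue
        try:
            number = float(value) * field.factor
        except (TypeError, ValueError):
            continue  # not numeric, so skip rather than store garbage
        if field.suppress_zero and number == 0:
            continue
        out[field.output_key] = number
    return out


FIELDS = [
    FieldDefinition("HV", "Voltage (kV)", factor=1e-3),
    FieldDefinition("Spot", "Spot Size", suppress_zero=True),
    FieldDefinition("Operator", "Operator", is_string=True),
]
result = extract_fields({"HV": "5000", "Spot": "0", "Operator": "abc"}, FIELDS)
```

With one table of definitions per instrument, each extractor would only differ in its `FIELDS` list, which is one way the duplication mentioned above could be reduced.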
Replace 'not found' sentinel value with None for better type safety and Pythonic code. Updated all 58 call sites across extractors and utilities to use 'is None' and 'is not None' checks. Fixed exception handling in numeric field conversion to properly handle None values.
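As a rough illustration of why `None` is a safer sentinel than the string `'not found'`, here is a small sketch. `get_nested_value` is a hypothetical stand-in; the real signature of `try_getting_dict_value()` is not shown in this PR and may differ.

```python
from typing import Any, Optional, Sequence


def get_nested_value(data: Any, path: Sequence[str]) -> Optional[Any]:
    """Walk nested dicts along `path`; return None if any key is missing.

    Hypothetical stand-in for try_getting_dict_value().
    """
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


meta = {"Microscope Info": {"Voltage": 0, "Name": ""}}

voltage = get_nested_value(meta, ["Microscope Info", "Voltage"])
missing = get_nested_value(meta, ["Microscope Info", "Aperture"])

# `is not None` keeps legitimate falsy values such as 0 or "",
# which a plain truthiness check would discard.
found = {}
if voltage is not None:
    found["Voltage"] = voltage
if missing is not None:
    found["Aperture"] = missing
```

One limitation of this sketch is that a stored value of `None` cannot be told apart from a missing key.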
…on to DM3/DM4 extractor
Enhanced the Digital Micrograph extractor to capture three additional metadata fields from JEOL NEOARM TEM images and other DM3/DM4 files:
- Signal Name: Detector signal type (e.g., ADF, BF) from DataBar metadata
- Aperture settings: Condenser, Objective, and Selected Area aperture values from Microscope Info
- Sample Time: Pixel dwell time in microseconds from DigiScan metadata
Added comprehensive test coverage using neoarm_gatan_image_file fixture to verify all three fields are extracted correctly.
Fixes #14
… metadata dicts
BREAKING CHANGE: All extractors must now return list[dict] instead of dict
This change establishes the foundation for multi-signal file support by standardizing the extractor return type:
- extract() methods now return list[dict[str, Any]] instead of dict[str, Any]
- Single-signal files return a 1-element list for consistency
- Multi-signal files return one dict per signal/dataset
- Updated docstrings and examples to reflect new contract
This allows the Activity layer to automatically expand multi-signal files (e.g., DM3/DM4 with multiple signals) into separate datasets in the experimental record.
Related to multi-signal file handling initiative.
…eturn lists
Update all extractor plugins for single-signal file formats to conform to the new list-based return contract:
- BasicFileInfoExtractor: Returns [metadata_dict]
- QuantaTiffExtractor: Returns [metadata_dict]
- EdaxSpcExtractor & EdaxMsaExtractor: Returns [metadata_dict]
- SerEmiExtractor: Returns [metadata_dict]
- OrionTiffExtractor: Returns [metadata_dict]
- TescanTiffExtractor: Returns [metadata_dict]
For these formats, the list always contains exactly one element since each file represents a single dataset. This provides a consistent interface across all extractors and prepares for multi-signal handling in DM3/DM4 files. Updated docstrings to reflect the new return type.
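The list-based return contract can be sketched as follows. Both extractor classes, their metadata keys, and `count_datasets` are hypothetical and only illustrate the shape of the return value, not the actual plugin classes.

```python
from pathlib import Path
from typing import Any


class SingleSignalExtractor:
    """Hypothetical extractor for a format with one dataset per file."""

    def extract(self, path: Path) -> list[dict[str, Any]]:
        metadata = {"filename": path.name}
        return [metadata]  # always a 1-element list


class MultiSignalExtractor:
    """Hypothetical extractor for a format that can hold several signals."""

    def __init__(self, n_signals: int) -> None:
        self.n_signals = n_signals

    def extract(self, path: Path) -> list[dict[str, Any]]:
        # One metadata dict per signal in the file.
        return [
            {"filename": path.name, "signal_index": i}
            for i in range(self.n_signals)
        ]


def count_datasets(extractor: Any, path: Path) -> int:
    """Callers iterate over the list and never special-case a bare dict."""
    return len(extractor.extract(path))


single = count_datasets(SingleSignalExtractor(), Path("image.tif"))
multi = count_datasets(MultiSignalExtractor(4), Path("scan.dm4"))
```

Because every extractor returns a list, downstream code can use the same loop for both cases.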
Enable extraction of all signals from multi-signal DM3/DM4 files:
- get_dm3_metadata() now returns list of metadata dicts (one per signal)
- Previously returned only first signal, now returns all signals
- Single-signal files return 1-element list for consistency
- Multi-signal files (e.g., spectrum images with multiple channels) return one metadata dict per signal
This allows proper representation of complex DM3/DM4 files that contain multiple datasets, such as:
- Files with multiple image/spectrum signals
- Spectrum images with separate energy loss and thickness maps
- Combined STEM/EELS acquisition sessions
Each signal gets its own metadata extraction and can be displayed as a separate dataset in the experimental record.
…d preview generation
Update the extraction orchestration layer to handle multi-signal files:
parse_metadata() changes:
- Now processes list of metadata dicts returned by extractors
- For multi-signal files, writes separate JSON files with _signalN suffix
- Generates one preview per signal with _signalN.thumb.png naming
- Returns list of metadata dicts and list of preview paths
- Single-signal files maintain backward-compatible naming (no suffix)
create_preview() changes:
- Added signal_index parameter for multi-signal file preview generation
- Generates preview filename with _signalN suffix when index provided
- Passes signal_index through ExtractionContext to preview generators
HyperSpyPreviewGenerator changes:
- Handles multi-signal files by selecting appropriate signal via index
- Uses context.signal_index to choose which signal to preview
- Falls back to first signal for backward compatibility
This enables complete multi-signal workflow:
1. DM3/DM4 file with 4 signals → 4 JSON files + 4 preview PNGs
2. Each signal gets unique metadata and preview
3. Single-signal files unaffected (no _signal0 suffix for compatibility)
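The suffix rule described above might look roughly like this. `preview_name` and `preview_names` are hypothetical helpers, and two details are assumptions: that the suffix is appended to the full filename (as in the `file.dm4_signal0.thumb.png` example in the PR summary), and that the unsuffixed single-signal name is `<filename>.thumb.png`.

```python
from typing import Optional


def preview_name(filename: str, signal_index: Optional[int] = None) -> str:
    """Build a preview filename, adding _signalN only when an index is given."""
    suffix = "" if signal_index is None else f"_signal{signal_index}"
    return f"{filename}{suffix}.thumb.png"


def preview_names(filename: str, n_signals: int) -> list[str]:
    """Single-signal files keep the unsuffixed name; others get one per signal."""
    if n_signals == 1:
        return [preview_name(filename)]
    return [preview_name(filename, i) for i in range(n_signals)]


single = preview_names("image.dm3", 1)
multi = preview_names("file.dm4", 4)
```

Keeping the single-signal case free of a `_signal0` suffix means previews that already exist on disk would keep their names.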
…tivity records
Update AcquisitionActivity to handle multi-signal files by creating one dataset entry per signal:
add_file_by_path() changes:
- Processes list of metadata dicts returned by parse_metadata()
- For multi-signal files, adds one entry per signal to parallel lists
- Repeats filename for each signal but uses different preview paths
- Each signal gets its own metadata, preview, and warnings
_add_dataset_element() changes:
- Added preview_path parameter for explicit preview file specification
- Added signal_index and total_signals for multi-signal naming
- Dataset names include signal index: "filename.ext (X of Y)"
- Uses provided preview_path instead of computing from filename
as_xml() changes:
- Tracks file occurrence counts to identify multi-signal files
- Passes signal index and preview path to _add_dataset_element()
- Each signal becomes a separate <dataset> element in XML
Example: A DM3 file with 4 signals creates:
- 4 <dataset> elements with names "file.dm3 (1 of 4)", "file.dm3 (2 of 4)", etc.
- 4 unique preview paths: file_signal0.thumb.png, file_signal1.thumb.png, etc.
- 4 metadata entries with signal-specific information
- All share the same source file location
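The occurrence-count approach to dataset naming can be sketched as below. `dataset_names` is a hypothetical helper operating on a parallel list of filenames; it is not the code in as_xml(), only an illustration of the "(X of Y)" rule.

```python
from collections import Counter


def dataset_names(filenames: list[str]) -> list[str]:
    """Name each entry; files that appear more than once get "(X of Y)"."""
    totals = Counter(filenames)  # how many signals each file contributed
    seen: Counter = Counter()    # running 1-based index per file
    names = []
    for name in filenames:
        if totals[name] == 1:
            names.append(name)   # single-signal file: name unchanged
        else:
            seen[name] += 1
            names.append(f"{name} ({seen[name]} of {totals[name]})")
    return names


# Parallel list as it might look after expanding a 2-signal DM3 file.
names = dataset_names(["image.tif", "file.dm3", "file.dm3"])
```

Counting repeats in the filename list means no separate flag is needed to mark a file as multi-signal.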
…return
Update unit tests to handle the new list-based return format from extractors:
All extractor plugin tests updated:
- test_basic_metadata.py: Assert metadata is list, access first element
- test_digital_micrograph.py: Handle multi-signal DM3/DM4 test cases
- test_edax.py: Update SPC/MSA extractor tests for list returns
- test_fei_emi.py: Update SER/EMI extractor tests
- test_orion_HIM.py: Update Orion TIFF extractor tests
- test_quanta_tif.py: Update Quanta TIFF extractor tests
- test_tescan_tif.py: Update Tescan TIFF extractor tests
test_extractor_module.py extensive updates:
- parse_metadata() now returns (list[dict], list[Path])
- Added multi-signal specific tests for signal_index handling
- test_parse_metadata_multi_signal_no_preview(): Verify [None] list returned
- test_create_preview_multi_signal_list_with_index(): Test signal selection
- test_create_preview_multi_signal_list_without_index(): Test legacy mode
- Updated cleanup helpers to handle list of preview paths
test_plugins.py updates:
- Mock extractors return lists for compatibility
- Registry tests verify list-based returns
test_thumbnail_generator.py updates:
- Added multi-signal preview generation tests
All tests verify:
1. Extractors return lists (even for single-signal files)
2. First element accessed for single-signal validation
3. Multi-signal files properly handled with multiple elements
4. Backward compatibility maintained
Update record builder tests to verify multi-signal file handling:
test_activity.py changes:
- Added test_activity_multi_signal_file() to verify expansion behavior
- Verifies that multi-signal files create multiple dataset entries
- Confirms dataset names include signal indices
- Validates parallel lists contain repeated filenames with unique metadata
test_record_builder.py changes:
- Updated XML validation tests for multi-signal datasets
- Mock parse_metadata returns list format
- Verified dataset naming with signal indices in XML output
- Confirmed preview path handling for multi-signal files
These tests ensure:
1. Multi-signal files expand into N dataset elements
2. Each dataset has unique name with (X of Y) format
3. All datasets share same file location
4. Each dataset has its own preview path and metadata
5. XML structure remains valid with multi-signal datasets
Add comprehensive integration tests for multi-signal file handling:
test_end_to_end_workflow.py additions:
- test_multi_signal_record_generation_and_structure(): Verifies complete multi-signal workflow from file discovery through XML generation
- Tests DM4 file with 4 signals and DM3 file with 2 signals
- Validates dataset naming: "filename.ext (X of Y)" format
- Confirms each signal gets unique preview but shares file location
- Verifies XML schema compliance with multi-signal datasets
test_nemo_integration.py updates:
- Updated for list-based metadata returns
tests/integration/conftest.py additions:
- Added multi_signal_integration_record fixture
- Helper functions for URL validation and metadata verification
- Seed data for multi-signal test files
tests/integration/docker/nemo/fixtures/seed_data.json:
- Added test data for multi-signal integration tests
tests/conftest.py updates:
- Fixture updates for multi-signal test support
Test fixtures added:
- multi_signal_test_files.py: Fixture definitions for multi-signal test files
- test_hyperspy_preview_generator_multi_signal_no_index.png: Baseline image for multi-signal preview generation tests
Other test updates:
- test_nemo_api.py: Updated for list-based returns
- test_utils.py: Minor updates for compatibility
These integration tests verify the complete pipeline:
1. Multi-signal files discovered in session
2. Metadata extracted for each signal
3. Previews generated for each signal
4. XML record created with separate datasets
5. Record uploaded to CDCS successfully
- Ensure that PR runs use a PR-tagged docker image
- Collect Docker logs from all services (nemo, cdcs, mongo, postgres, redis, mailpit, caddy) after tests complete
- Detect which compose file was used (ci.yml or docker-compose.yml) to ensure correct log collection
- Display collected logs in action output when tests fail for immediate visibility
- Upload full logs as artifacts for all runs (success and failure)
- Provides comprehensive troubleshooting information without requiring artifact download
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary
This PR implements multi-signal file support and enhances the Digital Micrograph extractor to improve metadata extraction from DM3/DM4 files, particularly JEOL NEOARM TEM images.
Key Changes
Multi-signal file support: Files containing multiple signals (like DM3/DM4) are now automatically expanded into separate datasets in experimental records. Each signal gets its own metadata extraction, preview image, and XML dataset element.
NEOARM TEM enhancements: Enhanced Digital Micrograph extractor now captures JEOL-specific metadata including signal names (ADF, BF), aperture settings (condenser, objective, selected area), and pixel dwell time measurements.
Preview naming: Multi-signal files generate indexed previews (e.g., file.dm4_signal0.thumb.png). Single-signal files use traditional naming for backward compatibility.
Type safety improvement: Refactored try_getting_dict_value() to return None instead of the magic string "not found" for better type safety and more idiomatic Python.
TIFF extractors refactoring: Standardized TIFF-based extractors to use consistent FieldDefinition configuration, reducing code duplication.
Documentation updates: Comprehensive updates to extractor documentation reflecting plugin architecture and multi-signal support.
Architecture Changes
The implementation maintains backward compatibility through:
Transparent expansion: Multi-signal files are expanded at the Activity layer, maintaining the 1:1 mapping between metadata/preview/warnings lists internally while supporting multiple datasets per file.
Fallback behavior: Single-signal files work exactly as before, using traditional preview naming (no signal suffix).
Signal detection: Detection of multi-signal structure happens at the extractor level via the "nx_meta_list" key vs. the traditional "nx_meta" key.
Testing
All changes are validated with comprehensive test coverage:
neoarm_gatan_si_file fixture
Frontend Requirements
Note: Proper display and download of multi-signal records in the CDCS frontend requires the updated XSLT stylesheet from NexusLIMS-CDCS commit 240a7f9.
Files Modified
Core extractors:
nexusLIMS/extractors/plugins/digital_micrograph.py - Multi-signal extraction and NEOARM support
nexusLIMS/extractors/__init__.py - Signal-indexed preview generation
nexusLIMS/extractors/plugins/preview_generators/ - Enhanced preview handling
nexusLIMS/extractors/plugins/{quanta_tif,tescan_tif,orion_HIM_tif}.py - Standardized field definitions
Activity/record building:
nexusLIMS/schemas/activity.py - Multi-signal file handling in AcquisitionActivity
nexusLIMS/builder/record_builder.py - End-to-end integration
Utilities:
nexusLIMS/utils.py - Type-safe try_getting_dict_value()
Documentation:
docs/extractors.md - Architecture and plugin system documentation
docs/writing_extractor_plugins.md - Plugin development guide
CLAUDE.md - Project guidelines
Changelog Entries
14.feature.md - NEOARM TEM metadata extraction enhancement
14.feature.2.md - Multi-signal file support
21.misc.md - TIFF extractor refactoring
+1.misc.md - Type safety improvement
Backward Compatibility
✅ All existing single-signal file tests pass
✅ Existing preview naming preserved for single signals
✅ New multi-signal structure transparent to downstream code
✅ Fallback extractors work identically
Performance
No performance regressions expected:
Related Issue
Fixes #14 - Multi-signal file support for improved microscopy data organization