Description:
Current Behavior
The evaluation step in run_clean_pipeline.py loads all CWT files from data/processed/ regardless of the runs parameter specified in the config file:
# Line 440 in run_clean_pipeline.py
cwt_files = list(processed_dir.glob("*.npy")) # Loads everything
The manifest is only used to determine segment_type (signal vs. noise); it does not filter by observing run. This means:
- The config setting signals.runs: [O4a, O4b] only affects downloading
- Evaluation uses whatever is in data/processed/, including old O1/O2/O3 data from previous runs
- Users can inadvertently test on multi-run data even with an O4-only config
Problem
This caused confusion for a user who had specified only O4 runs in their config but was still testing on O1-O4 signals, because processed data from previous runs remained in the directory.
Their workaround was to manually delete old processed data and modify the downloader code.
Proposed Enhancement
Add optional run filtering to the evaluation code that respects the config's runs parameter:
- Store observing-run metadata in the manifest during download
- During evaluation, filter signal_files by the runs specified in the config
- Make this behavior optional (e.g., pipeline.filter_by_config_runs: true)
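A minimal sketch of how the three bullets above could fit together. It assumes a hypothetical in-memory manifest schema ({filename: {"segment_type": ..., "run": ...}}) and a config loaded as a nested dict; the helper names record_segment and select_signal_files are illustrative, not existing functions in run_clean_pipeline.py.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def record_segment(manifest, filename, segment_type, run):
    """Download side (hypothetical helper): store the observing run next to segment_type."""
    manifest[filename] = {"segment_type": segment_type, "run": run}


def select_signal_files(cwt_files, manifest, config):
    """Evaluation side (hypothetical helper): pick signal files, optionally filtered by run.

    cwt_files would be e.g. list(processed_dir.glob("*.npy")).
    Assumed config shape:
      {"signals": {"runs": [...]}, "pipeline": {"filter_by_config_runs": bool}}
    """
    # Current behavior: the manifest only decides signal vs. noise.
    signal_files = [
        f for f in cwt_files
        if manifest.get(Path(f).name, {}).get("segment_type") == "signal"
    ]

    # Opt-in flag; defaulting to False keeps today's behavior unchanged.
    if not config.get("pipeline", {}).get("filter_by_config_runs", False):
        return signal_files

    allowed_runs = set(config.get("signals", {}).get("runs", []))
    kept, dropped = [], []
    for f in signal_files:
        # Entries written before run metadata existed have no "run" key -> dropped.
        run = manifest[Path(f).name].get("run")
        (kept if run in allowed_runs else dropped).append(f)

    if dropped:
        logger.warning(
            "Skipping %d signal file(s) from runs outside %s",
            len(dropped), sorted(allowed_runs),
        )
    return kept
```

With filter_by_config_runs: true and signals.runs: [O4a, O4b], a leftover O3b file in data/processed/ would be skipped (with a warning) instead of silently evaluated. Dropping files that lack run metadata is a deliberate choice in this sketch: it makes stale data from older downloads visible rather than letting it leak into an O4-only evaluation.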
Workaround (Current)
Users must manually clear data/processed/ before downloading with a new run configuration:
rm -rf data/processed/*
Priority
Low - the workaround is simple and the project achieved its research goals. This would be a quality-of-life improvement for future users.