@@ -18,6 +18,7 @@ Extract MS2 format files (compatible with MS-GF+, Comet, etc.):
1818``` bash
1919ms2-extractor /path/to/sample.d
2020ms2-extractor /path/to/sample.d --output custom_output.ms2 --min-intensity 100 --min-charge 2
21+ ms2-extractor /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
2122```
2223
2324### MGF Extraction
@@ -26,16 +27,34 @@ Extract MGF format files (compatible with Mascot, MaxQuant, etc.):
2627``` bash
2728mgf-extractor /path/to/sample.d
2829mgf-extractor /path/to/sample.d --casanovo # Optimized for Casanovo de novo sequencing
30+ mgf-extractor /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
2931```
3032
33+ ## Output Options
34+
35+ Both extractors support flexible output options:
36+
37+ 1. **No output specified**: Files are created within each .D folder with auto-generated names
38+ 2. **Specific file path**: Use `-o filename.ms2` or `-o filename.mgf` for single .D folder processing
39+ 3. **Output directory**: Use `-o /path/to/output_dir` for batch processing multiple .D folders
40+ 4. **Overwrite protection**: Use `--overwrite` to replace existing output files
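
The first three output modes can be illustrated with a small path-resolution sketch. This is not the extractors' actual code: `resolve_output` is a hypothetical helper, and the rule it uses to tell a file path from a directory (a matching `.ms2`/`.mgf` suffix means "file") is an assumption made only for illustration.

```python
from pathlib import Path


def resolve_output(analysis_dir, output=None, extension=".ms2"):
    """Return the output file path for one .D folder (illustrative only).

    Assumption: an --output value ending in `extension` is a file path;
    any other value is treated as an output directory.
    """
    analysis_dir = Path(analysis_dir)
    default_name = analysis_dir.stem + extension  # sample.d -> sample.ms2

    if output is None:
        # Mode 1: no output given, write inside the .D folder itself.
        return analysis_dir / default_name

    output = Path(output)
    if output.suffix.lower() == extension:
        # Mode 2: explicit file path for a single .D folder.
        return output

    # Mode 3: output directory, file named after the .D folder.
    return output / default_name
```

For example, `resolve_output("/path/to/sample.d", "/path/to/output_directory")` yields `/path/to/output_directory/sample.ms2` under these assumptions.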
41+
42+ ### Batch Processing
43+
44+ When processing multiple .D folders, the extractors will:
45+ - Automatically find all .D folders in the specified directory
46+ - Create output files with names matching the .D folder names
47+ - Skip existing files unless ` --overwrite ` is specified
48+ - Create the output directory if it doesn't exist
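
The batch behaviour listed above can be sketched as follows. This is a minimal illustration, not the extractors' implementation: `plan_batch_outputs` is a hypothetical helper, and it assumes that only direct children of the input directory are scanned and that the `.d` suffix is matched case-insensitively.

```python
from pathlib import Path


def plan_batch_outputs(input_dir, output_dir, extension=".ms2", overwrite=False):
    """Pair each .D folder in input_dir with an output file in output_dir.

    Returns (to_process, skipped), both lists of (source_folder, target_file).
    Hypothetical helper for illustration; not part of the extractors' API.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    # Create the output directory if it doesn't exist.
    output_dir.mkdir(parents=True, exist_ok=True)

    to_process, skipped = [], []
    for folder in sorted(input_dir.iterdir()):
        # Assumption: only direct child directories ending in .d/.D count.
        if not (folder.is_dir() and folder.suffix.lower() == ".d"):
            continue
        # Output name matches the .D folder name: sample1.d -> sample1.ms2
        target = output_dir / (folder.stem + extension)
        if target.exists() and not overwrite:
            skipped.append((folder, target))  # keep the existing file
        else:
            to_process.append((folder, target))
    return to_process, skipped
```

Calling `plan_batch_outputs("/path/to/directory_with_multiple_d_folders", "/path/to/output_directory")` returns the folders that would be extracted and those that would be skipped because their output already exists.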
49+
3150## Command Line Arguments
3251
3352### MS2 Extractor Arguments
3453
3554| Argument | Type | Default | Description |
3655| ----------| ------| ---------| -------------|
37- | ` analysis_dir ` | str | - | Path to the .D analysis directory |
38- | ` -o, --output ` | str | ` <analysis_dir_name>.ms2 ` | Output MS2 file path |
56+ | ` analysis_dir ` | str | - | Path to the .D analysis directory or directory containing .D folders |
57+ | ` -o, --output ` | str | ` <analysis_dir_name>.ms2 ` | Output MS2 file path or directory |
3958| ` --remove-precursor ` | flag | False | Remove precursor peaks from MS/MS spectra |
4059| ` --precursor-peak-width ` | float | 2.0 | Width around precursor m/z to remove (Da) |
4160| ` --batch-size ` | int | 100 | Batch size for processing spectra |
@@ -49,14 +68,15 @@ mgf-extractor /path/to/sample.d --casanovo # Optimized for Casanovo de novo seq
4968| ` --max-rt ` | float | None | Maximum retention time filter (seconds) |
5069| ` --min-ccs ` | float | None | Minimum CCS filter |
5170| ` --max-ccs ` | float | None | Maximum CCS filter |
71+ | ` --overwrite ` | flag | False | Overwrite existing output files |
5272| ` -v, --verbose ` | flag | False | Enable verbose logging |
5373
5474### MGF Extractor Arguments
5575
5676| Argument | Type | Default | Description |
5777| ----------| ------| ---------| -------------|
58- | ` analysis_dir ` | str | - | Path to the .D analysis directory |
59- | ` -o, --output ` | str | ` <analysis_dir_name>.mgf ` | Output MGF file path |
78+ | ` analysis_dir ` | str | - | Path to the .D analysis directory or directory containing .D folders |
79+ | ` -o, --output ` | str | ` <analysis_dir_name>.mgf ` | Output MGF file path or directory |
6080| ` --remove-precursor ` | flag | False | Remove precursor peaks from MS/MS spectra |
6181| ` --precursor-peak-width ` | float | 2.0 | Width around precursor m/z to remove (Da) |
6282| ` --batch-size ` | int | 100 | Batch size for processing spectra |
@@ -70,5 +90,25 @@ mgf-extractor /path/to/sample.d --casanovo # Optimized for Casanovo de novo seq
7090| ` --max-rt ` | float | None | Maximum retention time filter (seconds) |
7191| ` --min-ccs ` | float | None | Minimum CCS filter |
7292| ` --max-ccs ` | float | None | Maximum CCS filter |
93+ | ` --overwrite ` | flag | False | Overwrite existing output files |
7394| ` -v, --verbose ` | flag | False | Enable verbose logging |
7495| ` --casanovo ` | flag | False | Preset for Casanovo: enables precursor removal, top-150 peaks, min intensity 0.01, m/z 50-2500 |
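
To make the `--casanovo` preset concrete, the sketch below applies the four settings from the table to a list of `(mz, intensity)` peaks. It is not the extractor's code. `casanovo_like_filter` is a hypothetical name, and several details are assumptions: the precursor width is treated as a total window centred on the precursor m/z, the minimum intensity is treated as an absolute threshold, and the filters are applied before the top-150 selection.

```python
def casanovo_like_filter(peaks, precursor_mz, precursor_width=2.0,
                         min_mz=50.0, max_mz=2500.0,
                         min_intensity=0.01, top_n=150):
    """Filter (mz, intensity) peaks roughly as the preset describes.

    Assumptions (not confirmed by the README): the window is
    precursor_mz +/- precursor_width / 2, and min_intensity is absolute.
    """
    half_window = precursor_width / 2.0
    kept = [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - precursor_mz) > half_window   # precursor removal
        and min_mz <= mz <= max_mz                # m/z 50-2500
        and intensity >= min_intensity            # min intensity 0.01
    ]
    # Keep the top_n most intense peaks, then restore ascending m/z order.
    kept = sorted(kept, key=lambda peak: peak[1], reverse=True)[:top_n]
    return sorted(kept)
```

The same structure would apply to the individual flags (`--remove-precursor`, `--precursor-peak-width`, and the m/z and intensity filters) when they are set by hand rather than through the preset.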
96+
97+ ## Features
98+
99+ - **Multiple format support**: Export to MS2 and MGF formats
100+ - **Flexible output options**: Single files, batch processing, custom directories
101+ - **Flexible filtering**: Filter by charge state, m/z range, retention time, CCS, and intensity
102+ - **Batch processing**: Process multiple .D folders at once
103+ - **Precursor removal**: Option to remove precursor peaks from spectra
104+ - **Peak selection**: Keep only the most intense peaks per spectrum
105+ - **DDA and PRM support**: Works with both Data-Dependent Acquisition and Parallel Reaction Monitoring data
106+ - **Overwrite protection**: Prevents accidental file overwrites unless explicitly requested
107+
108+ ## Requirements
109+
110+ - Python ≥ 3.8
111+ - tdfpy ≥ 0.1.7
112+ - serenipy ≥ 0.2.6
113+ - tqdm
114+ - pandas