Commit e50c65d

readme
1 parent 16ba4cf commit e50c65d

2 files changed

Lines changed: 45 additions & 5 deletions

README.md

Lines changed: 44 additions & 4 deletions
@@ -18,6 +18,7 @@ Extract MS2 format files (compatible with MS-GF+, Comet, etc.):
 ```bash
 ms2-extractor /path/to/sample.d
 ms2-extractor /path/to/sample.d --output custom_output.ms2 --min-intensity 100 --min-charge 2
+ms2-extractor /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
 ```
 
 ### MGF Extraction
@@ -26,16 +27,34 @@ Extract MGF format files (compatible with Mascot, MaxQuant, etc.):
 ```bash
 mgf-extractor /path/to/sample.d
 mgf-extractor /path/to/sample.d --casanovo # Optimized for Casanovo de novo sequencing
+mgf-extractor /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
 ```
 
+## Output Options
+
+Both extractors support flexible output options:
+
+1. **No output specified**: Files are created within each .D folder with auto-generated names
+2. **Specific file path**: Use `-o filename.ms2` or `-o filename.mgf` for single .D folder processing
+3. **Output directory**: Use `-o /path/to/output_dir` for batch processing multiple .D folders
+4. **Overwrite protection**: Use `--overwrite` to replace existing output files
+
+### Batch Processing
+
+When processing multiple .D folders, the extractors will:
+- Automatically find all .D folders in the specified directory
+- Create output files with names matching the .D folder names
+- Skip existing files unless `--overwrite` is specified
+- Create the output directory if it doesn't exist
+
 ## Command Line Arguments
 
 ### MS2 Extractor Arguments
 
 | Argument | Type | Default | Description |
 |----------|------|---------|-------------|
-| `analysis_dir` | str | - | Path to the .D analysis directory |
-| `-o, --output` | str | `<analysis_dir_name>.ms2` | Output MS2 file path |
+| `analysis_dir` | str | - | Path to the .D analysis directory or directory containing .D folders |
+| `-o, --output` | str | `<analysis_dir_name>.ms2` | Output MS2 file path or directory |
 | `--remove-precursor` | flag | False | Remove precursor peaks from MS/MS spectra |
 | `--precursor-peak-width` | float | 2.0 | Width around precursor m/z to remove (Da) |
 | `--batch-size` | int | 100 | Batch size for processing spectra |
@@ -49,14 +68,15 @@ mgf-extractor /path/to/sample.d --casanovo # Optimized for Casanovo de novo seq
 | `--max-rt` | float | None | Maximum retention time filter (seconds) |
 | `--min-ccs` | float | None | Minimum CCS filter |
 | `--max-ccs` | float | None | Maximum CCS filter |
+| `--overwrite` | flag | False | Overwrite existing output files |
 | `-v, --verbose` | flag | False | Enable verbose logging |
 
 ### MGF Extractor Arguments
 
 | Argument | Type | Default | Description |
 |----------|------|---------|-------------|
-| `analysis_dir` | str | - | Path to the .D analysis directory |
-| `-o, --output` | str | `<analysis_dir_name>.mgf` | Output MGF file path |
+| `analysis_dir` | str | - | Path to the .D analysis directory or directory containing .D folders |
+| `-o, --output` | str | `<analysis_dir_name>.mgf` | Output MGF file path or directory |
 | `--remove-precursor` | flag | False | Remove precursor peaks from MS/MS spectra |
 | `--precursor-peak-width` | float | 2.0 | Width around precursor m/z to remove (Da) |
 | `--batch-size` | int | 100 | Batch size for processing spectra |
@@ -70,5 +90,25 @@ mgf-extractor /path/to/sample.d --casanovo # Optimized for Casanovo de novo seq
 | `--max-rt` | float | None | Maximum retention time filter (seconds) |
 | `--min-ccs` | float | None | Minimum CCS filter |
 | `--max-ccs` | float | None | Maximum CCS filter |
+| `--overwrite` | flag | False | Overwrite existing output files |
 | `-v, --verbose` | flag | False | Enable verbose logging |
 | `--casanovo` | flag | False | Preset for Casanovo: enables precursor removal, top-150 peaks, min intensity 0.01, m/z 50-2500 |
+
+## Features
+
+- **Multiple format support**: Export to MS2 and MGF formats
+- **Flexible output options**: Single files, batch processing, custom directories
+- **Flexible filtering**: Filter by charge state, m/z range, retention time, CCS, and intensity
+- **Batch processing**: Process multiple .D folders at once
+- **Precursor removal**: Option to remove precursor peaks from spectra
+- **Peak selection**: Keep only the most intense peaks per spectrum
+- **DDA and PRM support**: Works with both Data-Dependent Acquisition and Parallel Reaction Monitoring data
+- **Overwrite protection**: Prevents accidental file overwrites unless explicitly requested
+
+## Requirements
+
+- Python ≥ 3.8
+- tdfpy ≥ 0.1.7
+- serenipy ≥ 0.2.6
+- tqdm
+- pandas
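
Review note: the output rules added to the README (no output given, a specific file, an output directory, and skip-unless-`--overwrite`) can be sketched as below. This is only an illustration of the documented behavior, not code from `tdfextractor`: the names `find_d_folders` and `plan_outputs` are hypothetical, and the auto-generated file name is assumed to be `<folder name without .d>.<ext>`, which the README does not spell out.

```python
from pathlib import Path
from typing import List, Optional, Tuple, Union

PathLike = Union[str, Path]


def find_d_folders(path: Path) -> List[Path]:
    """Return [path] if it is itself a .d folder, else its direct .d subfolders."""
    if path.suffix.lower() == ".d":
        return [path]
    return sorted(
        p for p in path.iterdir() if p.is_dir() and p.suffix.lower() == ".d"
    )


def plan_outputs(
    analysis_dir: PathLike,
    output: Optional[PathLike] = None,
    ext: str = "ms2",
    overwrite: bool = False,
) -> List[Tuple[Path, Path, bool]]:
    """Map each .d folder to (folder, target_file, skip).

    Hypothetical helper: `skip` is True when the target already exists
    and overwrite was not requested.
    """
    folders = find_d_folders(Path(analysis_dir))
    out = Path(output) if output is not None else None
    # Assumption: -o names a file only for a single .d folder and only when
    # it carries the expected extension; otherwise it is treated as a directory.
    is_single_file = (
        out is not None and len(folders) == 1 and out.suffix.lower() == f".{ext}"
    )

    plan = []
    for folder in folders:
        default_name = f"{folder.stem}.{ext}"
        if out is None:
            target = folder / default_name   # rule 1: write inside the .d folder
        elif is_single_file:
            target = out                     # rule 2: explicit file path
        else:
            target = out / default_name      # rule 3: output directory
        skip = target.exists() and not overwrite  # rule 4: overwrite protection
        plan.append((folder, target, skip))
    return plan
```

A caller would then iterate over the plan, run `target.parent.mkdir(parents=True, exist_ok=True)` for each entry that is not skipped, and write the file. Keeping the planning step free of side effects makes the skip and naming rules easy to unit-test.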

src/tdfextractor/ms2_extractor.py

Lines changed: 1 addition & 1 deletion
@@ -472,7 +472,7 @@ def main():
 
     parser.add_argument(
         "--overwrite",
-        action="store_true",
+        action="store_true",
         help="Overwrite existing output file if it exists",
     )
