feat(Assignment): Long read support with pbmm2 mapper (#247)
* chore(master): release 0.6.0
* Add long-read barcode assignment via pbmm2
* Add pbmm2 support and update QC report logic for long-read assignments
* Add pbmm2 and pysam support in Dockerfile with new conda environment
* Refactor pbmm2 rules to ensure consistent conda environment usage and streamline parameter definitions
* using linker instead of pattern
* Add pyproject.toml for snakefmt configuration and update Snakemake rules for improved logging and parameter handling
* updating docs
* Enable summary saving for Super Linter in GitHub Actions workflow
* snakefmt
* add strand_sensitive
* test
* Add required field for enable in alignment tool and adjust strand_sensitive requirement
* Set default value for strand_sensitive.enable to false in config schema
* fas
* fasdfsda
* fasdfasd
* fasf
* Add resource configuration for assignment_mapping_pbmm2_align
* Remove unused output generation for counts in all_experiments rule
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: BennyKrup <krupkinbenyamin@gmail.com>
+* enhance trimming functionality and update config schema for adapter specifications ([798cebb](https://github.com/kircherlab/MPRAsnakeflow/commit/798cebb8ad24b6d2b38816522fe86072d1b0df04))
+* experiment adapter trimming and option to do BC (also UMI if available) selection from end of the read (FWD only) ([#238](https://github.com/kircherlab/MPRAsnakeflow/issues/238)) ([04dd683](https://github.com/kircherlab/MPRAsnakeflow/commit/04dd6831d243bf22508b390b7c1926f744eb4759))
+* fastq-join as option for merging reads (assignment workflow) ([#243](https://github.com/kircherlab/MPRAsnakeflow/issues/243)) ([093e288](https://github.com/kircherlab/MPRAsnakeflow/commit/093e288fd9a6f38df2be2f7e25ae18ecba0d3f7a))
+* implement adapter trimming functionality in experiment rules ([9fd32ce](https://github.com/kircherlab/MPRAsnakeflow/commit/9fd32cee97dc69217f88f994249ea92ed0dd5b5e))
+
+
+### Bug Fixes
+
+* correct parameter name in check_version function ([7cd50a5](https://github.com/kircherlab/MPRAsnakeflow/commit/7cd50a52f3eaa8baae4d2d6937219b58c492d8d7))
+* snakemake reverted default value handling ([#236](https://github.com/kircherlab/MPRAsnakeflow/issues/236)) ([fa5109b](https://github.com/kircherlab/MPRAsnakeflow/commit/fa5109baacc8252c9f407c9cc54e080ca72e32e4))
+
+
+### Code Refactoring
+
+* renaming output files to use dots instead as undersocres a file separators ([#239](https://github.com/kircherlab/MPRAsnakeflow/issues/239)) ([0546082](https://github.com/kircherlab/MPRAsnakeflow/commit/0546082a2edca83566dcb2283db61c423533524f))
docs/1_getting_started/config.rst (9 additions, 5 deletions)
@@ -45,13 +45,13 @@ For each assignment you want to process, you must give it a name like :code:`exa
 :split_number:
 To parallelize mapping for assignment, the reads are split into :code:`split_number` files. For example, setting it to 300 means that the reads are split into 300 files, and each file is mapped in parallel. This is only useful when using a cluster. When running the workflow on a single machine, the default value should be used. The default is set to :code:`1`. (For technical reasons, when multiple assignments are defined, all will be set to the maximum defined in the config.)
 :tool:
-Alignment tool that is used. Currently, :code:`bbmap`, :code:`bwa`, :code:`bwa-additional-filtering`, and :code:`exact` are supported. Default is :code:`bbmap`.
+Alignment tool that is used. Currently, :code:`bbmap`, :code:`bwa`, :code:`bwa-additional-filtering`, :code:`exact`, and :code:`pbmm2` are supported. Default is :code:`bbmap`.
 :configs:
 Configurations of the alignment tool selected.
 
-:sequence_length (exact, bbmap):
+:sequence_length (exact, bbmap, pbmm2):
 Defines the :code:`sequence_length`, which is the length of a sequence alignment to an oligo in the design file. Only one length design is supported.
-:alignment_start (exact, bbmap):
+:alignment_start (exact, bbmap, pbmm2):
 Defines the start of the alignment in an oligo. When using adapters, you must set the length of the adapter. Otherwise, 1 will be the choice for most cases.
 :sequence_length (bwa, bwa-additional-filtering):
 Defines the :code:`min` and :code:`max` of a :code:`sequence_length` specification. :code:`sequence_length` is the length of a sequence alignment to an oligo in the design file. Because there can be insertions and deletions, we recommend varying it slightly around the exact length (e.g., ±5). This option enables designs with multiple sequence lengths.
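To make the :code:`min`/:code:`max` window for :code:`bwa` and :code:`bwa-additional-filtering` concrete, here is a minimal Python sketch. The helper names (`length_window`, `within_window`), the 200 bp design length, and the inclusive bounds are assumptions for illustration and are not taken from MPRAsnakeflow; only the ±5 recommendation comes from the documentation text.

```python
from typing import Tuple


def length_window(design_length: int, tolerance: int = 5) -> Tuple[int, int]:
    """Return a (min, max) sequence_length window around the design length."""
    return design_length - tolerance, design_length + tolerance


def within_window(aligned_length: int, window: Tuple[int, int]) -> bool:
    """Check whether an aligned length lies inside the window (bounds assumed inclusive)."""
    minimum, maximum = window
    return minimum <= aligned_length <= maximum


# Hypothetical design with 200 bp oligos and the suggested tolerance of 5.
window = length_window(200)
print(window)                      # (195, 205)
print(within_window(198, window))  # True: small deletion, still accepted
print(within_window(190, window))  # False: too short for this window
```

A design with several oligo lengths would simply choose :code:`min` and :code:`max` so that the window spans all of them.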
@@ -69,15 +69,19 @@ For each assignment you want to process, you must give it a name like :code:`exa
 (Optional) Threshold of mismatches we investigate if we should try to rescue. Default is :code:`3`.
 :verbose (bwa-additional-filtering):
 (Optional) Print which alignments were rescued and which could not be rescued. Default is :code:`false`.
+:preset (pbmm2):
+(Optional) Preset for pbmm2 alignment. Default is :code:`SUBREAD`.
+:min_concordance (pbmm2):
+(Optional) Minimum concordance for pbmm2 alignment. Default is :code:`0.9`.
 
 :bc_length:
 Length of the barcode. Must match the length of :code:`BC`.
 :BC_rev_comp:
 (Optional) If set to :code:`true`, the barcode is reverse complemented. Default is :code:`false`.
 :linker_length:
-(Optional) Length of the linker. Only needed if you don't have a barcode read and the barcode is in the forward read with the structure: BC+Linker+Insert. The fixed length is used for the linker after a fixed length of BC. The recommended option is :code:`linker` by defining the exact linker sequence and using cutadapt for trimming.
+(Optional) Length of the linker. O nly needed if you don't have a barcode read and the barcode is in the forward read with the structure: BC+Linker+Insert. The fixed length is used for the linker after a fixed length of BC. The recommended option is :code:`linker` by defining the exact linker sequence and using cutadapt for trimming.
 :linker:
-(Optional) Length of the linker. Only needed if you don't have a barcode read and the barcode is in the forward read with the structure: BC+Linker+Insert. Uses cutadapt to trim the linker to get the barcode as well as the start of the insert.
+(Required for long read, otherwise optional) The exact linker between BC and oligo. *Short read data:* Only needed if you don't have a barcode read and the barcode is in the forward read with the structure: BC+Linker+Insert. Uses cutadapt to trim the linker to get the barcode as well as the start of the insert. *Long read data:* Required! BC will be taken after the linker.
 :FWD:
 List of forward-read files in gzipped fastq format. The full or relative path to the files should be used. The same order in FWD, BC, and REV is important.
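The long-read note on :code:`linker` ("BC will be taken after the linker"), together with :code:`bc_length` and :code:`BC_rev_comp`, can be sketched in Python as follows. This is not the code of the :code:`assignment_mapping_pbmm2_getBCs` rule: the function names, the exact (mismatch-free) linker search, the assumed read orientation, and the toy sequences are all hypothetical.

```python
from typing import Optional

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def barcode_after_linker(
    read: str, linker: str, bc_length: int, bc_rev_comp: bool = False
) -> Optional[str]:
    """Take bc_length bases directly after the first exact linker match.

    Returns None if the linker is absent or the read ends before a
    full-length barcode (assumption: such reads are simply skipped).
    """
    pos = read.find(linker)
    if pos == -1:
        return None
    start = pos + len(linker)
    barcode = read[start:start + bc_length]
    if len(barcode) < bc_length:
        return None
    return reverse_complement(barcode) if bc_rev_comp else barcode


# Toy read: 6 bp of insert, 8 bp linker, 15 bp barcode, 3 bp tail.
read = "TTGACC" + "AGCTAGCT" + "ACGTACGTACGTACG" + "GGG"
print(barcode_after_linker(read, "AGCTAGCT", 15))        # ACGTACGTACGTACG
print(barcode_after_linker(read, "AGCTAGCT", 15, True))  # CGTACGTACGTACGT
```

Requiring a full-length barcode mirrors the documented constraint that :code:`bc_length` must match the length of the barcode.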
-If you want to use the strand sensitivity option (e.g., testing enhancers in both directions), you can add the following to the config file: :code:`strand_sensitive: {enable: true}`. Otherwise, MPRAsnakeflow will give you an error because it cannot handle the same sequences in both sense and antisense directions. This is an issue with the mappers because they do not consider the strand and will always call your read ambiguous due to multiple matches.
+Example of an assignment file using long read data with pbmm2 mapping:
+If you want to use the strand sensitivity option (e.g., testing enhancers in both directions), you can add the following to the config file: :code:`strand_sensitive: {enable: true}`. Otherwise, MPRAsnakeflow will give you an error because it cannot handle the same sequences in both sense and antisense directions. This is an issue with the mappers because they do not consider the strand and will always call your read ambiguous due to multiple matches. **Not available for long read data.**
 
 Snakemake
 ============================
@@ -118,6 +123,9 @@ Rules run by Snakemake in the assignment utility:
 - **assignment_mapping_bwa_ref**: Create mapping reference for BWA from design file.
 - **assignment_mapping_exact**: Map the reads to the reference and sort using exact match.
 - **assignment_mapping_exact_reference**: Create reference to map the exact design
+- **assignment_mapping_pbmm2_align**: Align long reads (BAM or FASTA) to reference using pbmm2.
+- **assignment_mapping_pbmm2_getBCs**: Extract barcodes from aligned long reads. Produces the standard barcode TSV for downstream collection and filtering.
+- **assignment_mapping_pbmm2_index**: Create pbmm2 index from design reference.
 - **assignment_merge_NGmerge**: Merge the FWD, REV and BC fastq files into one using NGmerge.
 - **assignment_merge_fastqjoin**: Merge the FWD, REV and BC fastq files into one using fastq-join.
 - **assignment_preprocessing_adapter_remove**: Remove adapter sequence from the reads (3' or 5'). Uses cutadapt to trim adapters based on the primer direction.
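The optional pbmm2 settings documented in the config.rst changes (:code:`preset` defaulting to :code:`SUBREAD`, :code:`min_concordance` defaulting to :code:`0.9`) could be resolved against a user config roughly as in the Python sketch below. The function name `resolve_pbmm2_configs` and the example values (171, 1, CCS) are hypothetical; in the workflow itself the defaults are presumably applied through the config schema, and how the values are handed to the pbmm2 command line is not shown in this diff.

```python
# Defaults as stated in the documentation text for pbmm2.
PBMM2_DEFAULTS = {"preset": "SUBREAD", "min_concordance": 0.9}


def resolve_pbmm2_configs(user_configs: dict) -> dict:
    """Merge user-supplied pbmm2 configs over the documented defaults."""
    # Later keys win, so any user value overrides the default.
    return {**PBMM2_DEFAULTS, **user_configs}


# Hypothetical user section: overrides the preset, keeps the default concordance.
resolved = resolve_pbmm2_configs(
    {"sequence_length": 171, "alignment_start": 1, "preset": "CCS"}
)
print(resolved["preset"])           # CCS
print(resolved["min_concordance"])  # 0.9
```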