Commit 6f1e00a

Merge branch 'CW-2769' into 'dev'
CW-2769 remove denovo

Closes CW-2769

See merge request epi2melabs/workflows/wf-transcriptomes!132
2 parents 5d16332 + f683d44 commit 6f1e00a

14 files changed

Lines changed: 36 additions & 598 deletions

.gitlab-ci.yml

Lines changed: 1 addition & 5 deletions
```diff
@@ -45,7 +45,7 @@ docker-run:
 - MATRIX_NAME: [
 "fusions", "differential_expression", "isoforms",
 "only_differential_expression", "differential_expression_gff3",
-"ncbi_gzip", "denovo", "ncbi_no_gene_id", "ensembl_with_versions",
+"ncbi_gzip", "ncbi_no_gene_id", "ensembl_with_versions",
 "differential_expression_mouse"
 ]
 rules:
@@ -60,10 +60,6 @@ docker-run:
 NF_WORKFLOW_OPTS: "--fastq ERR6053095_chr20.fastq --transcriptome-source reference-guided \
 --ref_genome chr20/hg38_chr20.fa --ref_annotation chr20/gencode.v22.annotation.chr20.gtf"
 NF_IGNORE_PROCESSES: preprocess_reads,merge_transcriptomes,decompress_annotation,decompress_ref,decompress_transcriptome,preprocess_ref_transcriptome
-- if: $MATRIX_NAME == "denovo"
-variables:
-NF_WORKFLOW_OPTS: "--fastq test_data/fastq/SIRV_E0_PCS109_50.fq.gz --transcriptome_source denovo"
-NF_IGNORE_PROCESSES: preprocess_reads,merge_transcriptomes,decompress_annotation,decompress_ref,build_minimap_index,decompress_transcriptome,preprocess_ref_transcriptome
 - if: $MATRIX_NAME == "fusions"
 variables:
 NF_BEFORE_SCRIPT: wget -O test_data.tar.gz https://ont-exd-int-s3-euwst1-epi2me-labs.s3.amazonaws.com/wf-isoforms/wf-isoforms_test_data.tar.gz && tar -xzvf test_data.tar.gz
```

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -5,7 +5,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [unreleased]
+### Fixed
 - Remove dead links from README
+### Removed
+- Denovo `--transcriptome_source` option.`
 
 ## [v0.3.1]
 ### Added
```

README.md

Lines changed: 2 additions & 28 deletions
````diff
@@ -5,8 +5,6 @@ for assembly and annotation of transcripts from Oxford Nanopore cDNA or direct R
 
 
 
-
-
 ## Introduction
 
 This workflow identifies RNA isoforms using either cDNA or direct RNA (dRNA)
@@ -27,16 +25,6 @@ in long read mode (with or without a guide reference annotation) to generate the
 * The annotation generated by the pipeline is compared to the reference annotation.
 using [gffcompare](http://ccb.jhu.edu/software/stringtie/gffcompare.shtml)
 
-#### de novo-based transcript assembly (experimental!)
-* Sequence clusters are generated using [isONclust2](https://github.com/nanoporetech/isONclust2)
-* If a reference genome is supplied, cluster quality metrics are determined by comparing
-with clusters generated from a minimap2 alignment.
-* A consensus sequence for each cluster is generated using [spoa](https://github.com/rvaser/spoa)
-* Three rounds of polishing using racon and minimap2 to give a final polished CDS for each gene.
-* Full-length reads are then mapped to these polished CDS.
-* Transcripts are assembled by stringtie as for the reference-based approach.
-* __Note__: This approach is currently not supported with direct RNA reads.
-
 ### Fusion gene detection
 Fusion gene detection is performed using [JAFFA](https://github.com/Oshlack/JAFFA), with the JAFFAL extension for use
 with ONT long reads.
@@ -134,25 +122,14 @@ nextflow run epi2me-labs/wf-transcriptomes \
 --out_dir outdir -w workspace_dir
 ```
 
-**Example workflow for denovo transcript assembly**
-```
-OUTPUT=~/output
-nextflow run epi2me-labs/wf-transcriptomes \
---fastq test_data/fastq \
---transcriptome_source denovo \
---out_dir ${OUTPUT} \
--w ${OUTPUT}/workspace \
---sample sample_id
-```
 A full list of options can be seen in nextflow_schema.json.
 Parameters can be specified either in a config like `parameter = value` or on the command line like `--parameter value`.
 Below are some commonly used parameters in the format used in config files.
 
 Select how the transcriptome used for analysis should be prepared:
 
 - To create a reference transcriptome using an existing reference genome `--transcriptome_source reference-guided` (default)
-- Use a a supplied transcriptome `--transcriptome_source precomputed"`
-- Gnerate transcriptome via the denovo pipeline `--transcriptome_source denovo"`
+- Use a supplied transcriptome `--transcriptome_source precomputed"`
 
 
 To run the workflow with direct RNA reads `--direct_rna true` (this just skips the pychopper step).
@@ -297,7 +274,4 @@ in `${out_dir}/jaffal_output_${sample_id}` you will find:
 * [nextflow](https://www.nextflow.io/)
 * [docker](https://www.docker.com/products/docker-desktop)
 * [Singularity](https://sylabs.io/singularity/)
-* [racon](https://github.com/isovic/racon)
-* [spoa](https://github.com/rvaser/spoa)
-* [inONclust](https://github.com/ksahlin/isONclust)
-* [isONclust2](https://github.com/nanoporetech/isONclust2)
+* [racon](https://github.com/isovic/racon)
````

bin/workflow_glue/report.py

Lines changed: 10 additions & 17 deletions
```diff
@@ -74,7 +74,6 @@ def argparser():
     parser.add_argument(
         "--de_stats", required=False, type=str, default=None, nargs='*',
         help="Differential expression report optional")
-    parser.add_argument('--denovo', dest='denovo', action='store_true')
 
     return parser
 
@@ -699,21 +698,16 @@ def transcript_table(report, df_tmaps, max_rows):
     section.table(df, index=False)
 
 
-def transcriptome_summary(report, gffs, sample_ids, denovo=False):
+def transcriptome_summary(report, gffs, sample_ids):
     """
     Plot transcriptome summaries.
 
     Some of this data is available via gffcompare output, but the de novo
     pipeline skips that, so we do it all here.
 
-    We do not report exon number for the denovo assembly yet. This is because
-    in this case, the gff annotation is generated by aligning to the CDS not
-    the genome.
-
     :param report: aplanat WFReport
     :param gffs: list of paths to gff transcriptome annotations
     :param sample_ids: list of sample ids
-    :param denovo: whether annotation was generated by de novo pipeline or not
     """
     # test.db gets written to the git repo.
     section = report.add_section()
@@ -771,17 +765,16 @@ def transcriptome_summary(report, gffs, sample_ids, denovo=False):
         title='transcript lengths')
     plots.append(box)
 
-    if not denovo:
-        x, y = zip(*sorted(exons_per_transcript.items()))
+    x, y = zip(*sorted(exons_per_transcript.items()))
 
-        fig = figure(title="Exons per transcript")
-        fig.vbar(
-            x, top=list(y), color=Colors.cerulean)
-        fig.xaxis.axis_label = 'Num. exons'
-        fig.yaxis.axis_label = 'Num. genes'
+    fig = figure(title="Exons per transcript")
+    fig.vbar(
+        x, top=list(y), color=Colors.cerulean)
+    fig.xaxis.axis_label = 'Num. exons'
+    fig.yaxis.axis_label = 'Num. genes'
 
-        fig.xaxis.major_label_orientation = math.pi / 2.8
-        plots.append(fig)
+    fig.xaxis.major_label_orientation = math.pi / 2.8
+    plots.append(fig)
 
     df_sum = pd.DataFrame.from_dict(
         {'Total genes': [num_genes],
@@ -929,7 +922,7 @@ def main(args):
     # Results
     if args.gff_annotation is not None:
         transcriptome_summary(
-            report, args.gff_annotation, sample_ids, denovo=args.denovo)
+            report, args.gff_annotation, sample_ids)
 
     if args.gffcompare_dir is not None:
         df_tmaps = gff_compare_plots(
```
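Now that the `denovo` flag is gone, `transcriptome_summary` always builds the exons-per-transcript plot with `x, y = zip(*sorted(exons_per_transcript.items()))`. The Python sketch below illustrates that unpacking idiom on a toy input. It is not the report.py implementation. The helper name `exon_count_histogram`, the GTF-style parsing and the choice to count transcripts (rather than genes) per exon count are all assumptions made for illustration. The sketch also guards the empty case. When `zip()` runs over an empty mapping it yields nothing, so the two-name unpacking would raise `ValueError`.

```python
from collections import Counter


def exon_count_histogram(gtf_lines):
    """Return (x, y): distinct exon counts and transcripts per count.

    Hypothetical helper for illustration only; assumes GTF-style,
    tab-separated lines whose 9th column holds a transcript_id "..." pair.
    """
    exons_per_tx = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        # Find the transcript_id attribute in the attribute column.
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("transcript_id "):
                tx_id = attr.split(" ", 1)[1].strip('"')
                exons_per_tx[tx_id] += 1
                break

    # Histogram: number of exons -> number of transcripts with that many.
    histogram = Counter(exons_per_tx.values())
    if not histogram:
        # zip(*[]) yields nothing, so "x, y = ..." would raise ValueError.
        return (), ()
    x, y = zip(*sorted(histogram.items()))
    return x, y
```

Suppose the input has three transcripts carrying 2, 1 and 1 exons. The function then returns `((1, 2), (2, 1))`: two transcripts have one exon and one transcript has two.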

bin/workflow_glue/run_isonclust2.py

Lines changed: 0 additions & 138 deletions
This file was deleted.

docs/header.md

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,5 +1,4 @@
 # wf-transcriptomes
 
 This repository contains a [nextflow](https://www.nextflow.io/) workflow
-for assembly and annotation of transcripts from Oxford Nanopore cDNA or direct RNA reads.
-
+for assembly and annotation of transcripts from Oxford Nanopore cDNA or direct RNA reads.
```

docs/intro.md

Lines changed: 0 additions & 10 deletions
```diff
@@ -18,16 +18,6 @@ in long read mode (with or without a guide reference annotation) to generate the
 * The annotation generated by the pipeline is compared to the reference annotation.
 using [gffcompare](http://ccb.jhu.edu/software/stringtie/gffcompare.shtml)
 
-#### de novo-based transcript assembly (experimental!)
-* Sequence clusters are generated using [isONclust2](https://github.com/nanoporetech/isONclust2)
-* If a reference genome is supplied, cluster quality metrics are determined by comparing
-with clusters generated from a minimap2 alignment.
-* A consensus sequence for each cluster is generated using [spoa](https://github.com/rvaser/spoa)
-* Three rounds of polishing using racon and minimap2 to give a final polished CDS for each gene.
-* Full-length reads are then mapped to these polished CDS.
-* Transcripts are assembled by stringtie as for the reference-based approach.
-* __Note__: This approach is currently not supported with direct RNA reads.
-
 ### Fusion gene detection
 Fusion gene detection is performed using [JAFFA](https://github.com/Oshlack/JAFFA), with the JAFFAL extension for use
 with ONT long reads.
```

docs/links.md

Lines changed: 1 addition & 4 deletions
```diff
@@ -3,7 +3,4 @@
 * [nextflow](https://www.nextflow.io/)
 * [docker](https://www.docker.com/products/docker-desktop)
 * [Singularity](https://sylabs.io/singularity/)
-* [racon](https://github.com/isovic/racon)
-* [spoa](https://github.com/rvaser/spoa)
-* [inONclust](https://github.com/ksahlin/isONclust)
-* [isONclust2](https://github.com/nanoporetech/isONclust2)
+* [racon](https://github.com/isovic/racon)
```

docs/quickstart.md

Lines changed: 1 addition & 12 deletions
````diff
@@ -47,25 +47,14 @@ nextflow run epi2me-labs/wf-transcriptomes \
 --out_dir outdir -w workspace_dir
 ```
 
-**Example workflow for denovo transcript assembly**
-```
-OUTPUT=~/output
-nextflow run epi2me-labs/wf-transcriptomes \
---fastq test_data/fastq \
---transcriptome_source denovo \
---out_dir ${OUTPUT} \
--w ${OUTPUT}/workspace \
---sample sample_id
-```
 A full list of options can be seen in nextflow_schema.json.
 Parameters can be specified either in a config like `parameter = value` or on the command line like `--parameter value`.
 Below are some commonly used parameters in the format used in config files.
 
 Select how the transcriptome used for analysis should be prepared:
 
 - To create a reference transcriptome using an existing reference genome `--transcriptome_source reference-guided` (default)
-- Use a a supplied transcriptome `--transcriptome_source precomputed"`
-- Gnerate transcriptome via the denovo pipeline `--transcriptome_source denovo"`
+- Use a supplied transcriptome `--transcriptome_source precomputed"`
 
 
 To run the workflow with direct RNA reads `--direct_rna true` (this just skips the pychopper step).
````
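After this change, only two values of `--transcriptome_source` remain documented: `reference-guided` (the default) and `precomputed`. The workflow itself validates its parameters through nextflow_schema.json, and the Python sketch below is only an analogy. It uses a hypothetical `check_params` parser with argparse `choices` to show how a restricted choice list turns a leftover `--transcriptome_source denovo` into an immediate argument error.

```python
import argparse
import contextlib
import io

# Hypothetical: the two transcriptome_source values still documented after
# this commit. The real workflow validates parameters via nextflow_schema.json.
TRANSCRIPTOME_SOURCES = ("reference-guided", "precomputed")


def build_parser():
    """Build an illustrative parser restricted to the remaining choices."""
    parser = argparse.ArgumentParser(prog="check_params")
    parser.add_argument(
        "--transcriptome_source",
        choices=TRANSCRIPTOME_SOURCES,
        default="reference-guided",
        help="How the transcriptome used for analysis is prepared.")
    return parser


def is_valid(argv):
    """Return True if argv parses, False if argparse rejects it."""
    # argparse prints errors to stderr and raises SystemExit; capture both.
    with contextlib.redirect_stderr(io.StringIO()):
        try:
            build_parser().parse_args(argv)
        except SystemExit:
            return False
    return True


if __name__ == "__main__":
    for source in ("reference-guided", "precomputed", "denovo"):
        print(source, is_valid(["--transcriptome_source", source]))
```

Run as a script, it prints `True` for `reference-guided` and `precomputed`, and `False` for `denovo`.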

evaluation/tests.sh

Lines changed: 0 additions & 12 deletions
```diff
@@ -26,18 +26,6 @@ multisampledir="test_data/demultiplexed_fastq"
 #"--minimap2_opts '-uf --splice-flank=no'"
 results=()
 
-OUTPUT=$1/denovo_multi_sample_no_ref_genome;
-nextflow run . --fastq $multisampledir $config --denovo --ref_genome test_data/SIRV_150601a.fasta -profile local --out_dir ${OUTPUT} -w ${OUTPUT}/workspace \
---sample_sheet test_data/sample_sheet -resume;
-r=$?
-results+=("$(basename $OUTPUT): $r")
-
-OUTPUT=$1/denovo_single;
-nextflow run . --fastq $singledir $config --denovo --ref_genome test_data/SIRV_150601a.fasta -profile local --out_dir ${OUTPUT} -w ${OUTPUT}/workspace \
---sample_sheet test_data/sample_sheet -resume;
-r=$?
-results+=("$(basename $OUTPUT): $r")
-
 # Reference based tests
 OUTPUT=$1/reference_single_dir;
 nextflow run . --fastq $singledir $config --ref_genome test_data/SIRV_150601a.fasta --minimap2_opts '-uf --splice-flank=no' \
```
