Describe the bug
The pipeline fails at the peak-merging step in chip.idr_ppr because there are too few peaks. The respective pseudoreplicate files contain only 20 and 6 peaks. As I understand it, the pseudoreplicates are subsamples of the true replicates, so it is unclear to me why there are so few peaks even before merging. I tried setting a seed for the pseudoreplicates, but the failure persists. Other samples do not hit this error.
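For anyone reproducing the check, the peak counts can be obtained roughly as follows. This is a minimal sketch, not part of the pipeline: the helper names count_peaks and files_below_minimum are hypothetical, it assumes each non-empty line in a gzipped regionPeak file is one peak, and the threshold of 20 is taken from the IDR error message, which applies to the merged peak set rather than to each input file.

```python
import gzip


def count_peaks(path):
    """Count non-empty lines in a gzipped peak file (assumed: one peak per line)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.strip())


def files_below_minimum(paths, minimum=20):
    """Return {path: peak_count} for every file with fewer than `minimum` peaks.

    The default of 20 mirrors the IDR error message; IDR itself checks the
    merged peak set, so this is only a rough pre-check on the inputs.
    """
    counts = {str(p): count_peaks(p) for p in paths}
    return {p: n for p, n in counts.items() if n < minimum}
```

Calling files_below_minimum with the two pseudoreplicate regionPeak.gz paths passed to idr would list whichever input is already under the threshold before merging.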
OS/Platform
- OS/Platform: Linux 4.18.0-372.9.1.el8.x86_64
- Pipeline version: v2.2.2
- Caper version: v2.3.2
Caper configuration file
Paste contents of ~/.caper/default.conf.
backend=local
# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=/home/cbi/grn_inference/database/raw/as_tf/ctcf/chipseq/chip-seq_encode_pipeline/pipeline_data
cromwell=/home/nursyahi001/.caper/cromwell_jar/cromwell-82.jar
womtool=/home/nursyahi001/.caper/womtool_jar/womtool-82.jar
Input JSON file
Paste contents of your input JSON file.
{
"chip.title" : "2024-08-20 FFF Control",
"chip.description" : "Samples: WHC2180-2; FFF-C1,C2,C3",
"chip.pipeline_type" : "tf",
"chip.aligner" : "bowtie2",
"chip.align_only" : false,
"chip.true_rep_only" : false,
"chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v4/hg38.tsv",
"chip.genome_name" : "hg38",
"chip.paired_end" : true,
"chip.ctl_paired_end" : true,
"chip.always_use_pooled_ctl" : true,
"chip.fastqs_rep1_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2180_1_val_1.fq.gz"],
"chip.fastqs_rep2_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2181_1_val_1.fq.gz"],
"chip.fastqs_rep3_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2182_1_val_1.fq.gz"],
"chip.fastqs_rep1_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2180_2_val_2.fq.gz"],
"chip.fastqs_rep2_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2181_2_val_2.fq.gz"],
"chip.fastqs_rep3_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2182_2_val_2.fq.gz"],
"chip.ctl_fastqs_rep1_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2192_1_val_1.fq.gz"],
"chip.ctl_fastqs_rep1_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2192_2_val_2.fq.gz"]
}
Troubleshooting result
If you ran caper run without a Caper server, then Caper automatically runs a troubleshooter for failed workflows. Find the troubleshooting result at the bottom of Caper's screen log.
If you ran caper submit with a running Caper server then first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].
Paste troubleshooting result.
==== NAME=chip.idr_ppr, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=2331552
START=2024-09-12T11:51:25.550Z, END=2024-09-12T11:51:39.781Z
STDOUT=/home/cbi/projects/20240725_PheckKhee_CHIPseq_data/croo_output/outputs/trimmed/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/execution/stdout
STDERR=/home/cbi/projects/20240725_PheckKhee_CHIPseq_data/croo_output/outputs/trimmed/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 213, in <module>
main()
File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 175, in main
args.idr_thresh, args.idr_rank, args.mem_gb, args.out_dir,
File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 118, in idr
idr_stdout=idr_stdout,
File "/software/chip-seq-pipeline/src/encode_lib_common.py", line 359, in run_shell_cmd
raise Exception(err_str)
Exception: PID=2331702, PGID=2331702, RC=1, DURATION_SEC=5.2
STDERR=Traceback (most recent call last):
File "/usr/local/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.3', 'idr')
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
exec(code, namespace, namespace)
File "/usr/local/lib/python3.6/dist-packages/idr-2.0.3-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
idr.idr.main()
File "/usr/local/lib/python3.6/dist-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 857, in main
raise ValueError(error_msg)
ValueError: Peak files must contain at least 20 peaks post-merge
Hint: Merged peaks were written to the output file
STDOUT=/usr/local/bin/idr --samples /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/313145862/rep-pr1.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/305386503/rep-pr2.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz --peak-list /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/1977487234/rep.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz --input-file-type narrowPeak --output-file pooled-pr1_vs_pooled-pr2.idr0.05.unthresholded-peaks.txt --rank signal.value --soft-idr-threshold 0.05 --plot --use-best-multisummit-IDR --log-output-file pooled-pr1_vs_pooled-pr2.idr0.05.log