Insufficient peaks in pseudoreplicates resulting in chip.idr_ppr failure #307

@nursyahr

Describe the bug

The pipeline fails at the peak-merging step in chip.idr_ppr because there are too few peaks. The two pseudoreplicate peak files contain only 20 and 6 peaks, respectively. As I understand it, pseudoreplicates are subsamples of the true replicates, so it is unclear to me why there are so few peaks even before merging. I tried setting a seed for pseudoreplicate generation, but the failure persists. Other samples run through the pipeline without this error.
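
The peak counts quoted above can be reproduced with a short script. This is only a minimal sketch: it assumes each gzipped regionPeak file holds one peak per line, and the names `count_peaks`, `report`, and `MIN_PEAKS` are hypothetical helpers, not part of the pipeline. The threshold of 20 is taken from the idr error message, which applies to the merged peak set, so per-file counts are only a rough indicator.

```python
import gzip
from pathlib import Path

# Threshold quoted in the idr error message ("at least 20 peaks post-merge").
# idr checks the merged set, so this per-file check is only an approximation.
MIN_PEAKS = 20


def count_peaks(path):
    """Count non-empty lines (assumed one peak per line) in a gzipped peak file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.strip())


def report(paths, min_peaks=MIN_PEAKS):
    """Map each file name to (peak count, whether the count reaches min_peaks)."""
    result = {}
    for path in paths:
        n_peaks = count_peaks(path)
        result[Path(path).name] = (n_peaks, n_peaks >= min_peaks)
    return result
```

Calling `report([...])` with the paths of the two pseudoreplicate regionPeak.gz files found in the `call-idr_ppr` inputs directory shows which file falls short of the threshold.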

OS/Platform

  • OS/Platform: Linux 4.18.0-372.9.1.el8.x86_64
  • Pipeline version: v2.2.2
  • Caper version: v2.3.2

Caper configuration file

Paste contents of ~/.caper/default.conf.

backend=local

# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=/home/cbi/grn_inference/database/raw/as_tf/ctcf/chipseq/chip-seq_encode_pipeline/pipeline_data

cromwell=/home/nursyahi001/.caper/cromwell_jar/cromwell-82.jar
womtool=/home/nursyahi001/.caper/womtool_jar/womtool-82.jar

Input JSON file

Paste contents of your input JSON file.

{
    "chip.title" : "2024-08-20 FFF Control",
    "chip.description" : "Samples: WHC2180-2; FFF-C1,C2,C3",

    "chip.pipeline_type" : "tf",
    "chip.aligner" : "bowtie2",
    "chip.align_only" : false,
    "chip.true_rep_only" : false,

    "chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v4/hg38.tsv",
    "chip.genome_name" : "hg38",

    "chip.paired_end" : true,
    "chip.ctl_paired_end" : true,

    "chip.always_use_pooled_ctl" : true,

    "chip.fastqs_rep1_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2180_1_val_1.fq.gz"],
    "chip.fastqs_rep2_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2181_1_val_1.fq.gz"],
    "chip.fastqs_rep3_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2182_1_val_1.fq.gz"],

    "chip.fastqs_rep1_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2180_2_val_2.fq.gz"],
    "chip.fastqs_rep2_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2181_2_val_2.fq.gz"],
    "chip.fastqs_rep3_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2182_2_val_2.fq.gz"],

    "chip.ctl_fastqs_rep1_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2192_1_val_1.fq.gz"],
    "chip.ctl_fastqs_rep1_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2192_2_val_2.fq.gz"]
    
}

Troubleshooting result

If you ran caper run without Caper server then Caper automatically runs a troubleshooter for failed workflows. Find troubleshooting result in the bottom of Caper's screen log.

If you ran caper submit with a running Caper server then first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].

Paste troubleshooting result.



==== NAME=chip.idr_ppr, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=2331552
START=2024-09-12T11:51:25.550Z, END=2024-09-12T11:51:39.781Z
STDOUT=/home/cbi/projects/20240725_PheckKhee_CHIPseq_data/croo_output/outputs/trimmed/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/execution/stdout
STDERR=/home/cbi/projects/20240725_PheckKhee_CHIPseq_data/croo_output/outputs/trimmed/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 213, in <module>
    main()
  File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 175, in main
    args.idr_thresh, args.idr_rank, args.mem_gb, args.out_dir,
  File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 118, in idr
    idr_stdout=idr_stdout,
  File "/software/chip-seq-pipeline/src/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=2331702, PGID=2331702, RC=1, DURATION_SEC=5.2
STDERR=Traceback (most recent call last):
  File "/usr/local/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/idr-2.0.3-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/usr/local/lib/python3.6/dist-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 857, in main
    raise ValueError(error_msg)
ValueError: Peak files must contain at least 20 peaks post-merge
Hint: Merged peaks were written to the output file
STDOUT=/usr/local/bin/idr --samples /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/313145862/rep-pr1.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/305386503/rep-pr2.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz --peak-list /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/1977487234/rep.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz --input-file-type narrowPeak --output-file pooled-pr1_vs_pooled-pr2.idr0.05.unthresholded-peaks.txt --rank signal.value --soft-idr-threshold 0.05 --plot --use-best-multisummit-IDR --log-output-file pooled-pr1_vs_pooled-pr2.idr0.05.log
