Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File latency issue? #74

Open
molecules opened this issue Oct 4, 2023 · 0 comments
Open

File latency issue? #74

molecules opened this issue Oct 4, 2023 · 0 comments

Comments

@molecules
Copy link

Describe the bug
The pipeline creates a file that it doesn't think exists. In the example below, one case of this would be /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/inputs/667055062/indexes.tar.gz, which does exist when I check after the fact.

This is running on a compute cluster. I've repeated this multiple times, so it's not just a one-off error.

My first guess it that this is due to file system latency. If this were Snakemake, I'd know that I just needed to increase the value passed to --latency-wait, but I'm not sure how to set something like that for this pipeline.

OS/Platform

  • OS/Platform: Ubuntu 20.04.6 LTS
  • Pipeline version: 1.1.7
  • Caper version: 2.3.2

Caper configuration file

backend=slurm

# SLURM partition. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford Sherlock.
slurm-partition=serial

# SLURM account. DEFINE ONLY IF REQUIRED BY YOUR CLUSTER'S POLICY.
# You must define it for Stanford SCG.
slurm-account=

# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=

cromwell=/redacted/caper/2.3.2/jars/cromwell-82.jar
womtool=/redacted/caper/2.3.2/jars/womtool-82.jar

Input JSON file

{
  "wgbs.benchmark_mode": true,
  "wgbs.extra_reference": "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/conversion_control.fa.gz",
  "wgbs.fastqs": [
    [
      [
        "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sample5_data_1_200000.fastq.gz",
        "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sample5_data_2_200000.fastq.gz"
      ]
    ]
  ],
  "wgbs.reference": "/redacted2/encode-wgbs/wgbs-pipeline/tests/data/sacCer3.fa.gz",
  "wgbs.sample_names": [
    "sample5"
  ],
  "wgbs.underconversion_sequence_name": "NC_001416.1"
}

Error log

2023-10-04 21:10:58,310|caper.cli|INFO| Cromwell stdout: /redacted/wgbs_test/cromwell.out.1
2023-10-04 21:10:58,315|caper.caper_base|INFO| Creating a timestamped temporary directory. /redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834
2023-10-04 21:10:58,315|caper.caper_runner|INFO| Localizing files on work_dir. /redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834
2023-10-04 21:10:58,686|caper.caper_workflow_opts|INFO| Singularity image found in WDL metadata. wdl=/redacted2/encode-wgbs/1.1.8/wgbs-pipeline.wdl, s=docker:https://encodedcc/wgbs-pipeline:1.1.7
2023-10-04 21:10:58,706|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2023-10-04 21:11:04,509|caper.nb_subproc_thread|INFO| Subprocess finished successfully.
2023-10-04 21:11:04,510|caper.cromwell|INFO| Passed Womtool validation.            
2023-10-04 21:11:04,510|caper.caper_runner|INFO| launching run: wdl=/redacted2/encode-wgbs/1.1.8/wgbs-pipeline.wdl, inputs=/redacted/wgbs_test/test_wgbs.json, backend_conf=/redacted/wgbs_test/.caper_tmp/wgbs-pipeline/20231004_211058_313834/backend.co
2023-10-04 21:11:15,543|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Submitted
2023-10-04 21:11:15,605|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Running
2023-10-04 21:11:23,864|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.make_conf:-1, retry=0, status=CallCached
2023-10-04 21:11:26,814|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.make_metadata_csv:-1, retry=0, status=CallCached
2023-10-04 21:11:29,834|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.index_reference:-1, retry=0, status=CallCached
2023-10-04 21:11:35,809|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.prepare:-1, retry=0, status=CallCached
2023-10-04 21:11:44,809|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Started, job_id=2081286
2023-10-04 21:11:44,837|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Running
2023-10-04 21:11:51,943|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=0, status=Done
2023-10-04 21:11:59,788|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Started, job_id=2081287
2023-10-04 21:11:59,796|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Running
2023-10-04 21:12:04,068|caper.cromwell_workflow_monitor|INFO| Task: id=91e49ae4-9226-4824-af45-301fc1a815e8, task=wgbs.map:0, retry=1, status=Done
2023-10-04 21:12:05,042|caper.cromwell_workflow_monitor|INFO| Workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Failed
2023-10-04 21:12:15,586|caper.cromwell_metadata|INFO| Wrote metadata file. /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/metadata.json
2023-10-04 21:12:15,587|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
2023-10-04 21:12:15,589|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=1
2023-10-04 21:12:15,589|caper.cli|ERROR| Check stdout in /redacted/wgbs_test/cromwell.out.1
* Started troubleshooting workflow: id=91e49ae4-9226-4824-af45-301fc1a815e8, status=Failed
* Found failures JSON object.                                                      
[                                                                                  
    {                                                                              
        "causedBy": [                                                              
            {                                                                      
                "message": "Job wgbs.map:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []                                                     
            }                                                                      
        ],                                                                         
        "message": "Workflow failed"                                               
    }                                                                              
]                                                                                  
* Recursively finding failures in calls (tasks)...                                 
                                                                                   
==== NAME=wgbs.map, STATUS=RetryableFailure, PARENT=                               
SHARD_IDX=0, RC=1, JOB_ID=2081286                                                  
START=2023-10-04T21:11:41.051Z, END=2023-10-04T21:11:54.791Z                       
STDOUT=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/stdout
STDERR=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/stderr
STDERR_CONTENTS=                                                                   
tar: /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/inputs/667055062/indexes.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now                                         
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/execution/mapping/**/*.json': No such file or directory
                                                                                   
                                                                                   
==== NAME=wgbs.map, STATUS=Failed, PARENT=                                         
SHARD_IDX=0, RC=1, JOB_ID=2081287                                                  
START=2023-10-04T21:11:55.035Z, END=2023-10-04T21:12:04.072Z                       
STDOUT=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/stdout
STDERR=/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/stderr
STDERR_CONTENTS=                                                                   
tar: /redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/inputs/667055062/indexes.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now                                         
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/redacted/wgbs_test/wgbs/91e49ae4-9226-4824-af45-301fc1a815e8/call-map/shard-0/attempt-2/execution/mapping/**/*.json': No such file or directory
                                                                                   
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant