Some patterns commonly encountered when writing CWL workflows
- Manifest file via Javascript
- Embedding scripts
- Embedding a bash script (style 2)
- Manipulating a list of files using expressions
- Link input files to working directory
- How to handle port type mismatches
Which of the Workflow Patterns Initiative patterns does CWL support?
My tool takes in a list of filenames as input. I used to pass each file name on the command line, but when I scale up I run into Unix "Argument list too long" errors. What can I do?
If the program accepts, or can be altered to accept, a manifest file containing a list of input file paths, the CWL description can be written to generate this manifest file on the fly, via a JavaScript expression, before invoking the tool (example).
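For illustration, here is a minimal sketch of that pattern; the tool name `my_tool`, the `--file-list` flag, the manifest name `manifest.txt`, and the `files` input are all made up:

```yaml
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: manifest.txt
        # Build the manifest: one input file path per line
        entry: |
          ${
            var lines = [];
            for (var i = 0; i < inputs.files.length; i++) {
              lines.push(inputs.files[i].path);
            }
            return lines.join("\n");
          }
inputs:
  files:
    type: File[]
baseCommand: my_tool
arguments: ["--file-list", "manifest.txt"]
outputs: []
```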
I have a Python script that I want to embed in my tool wrapper (not make part of the docker image). How would I do this?
You can embed the script using an InitialWorkDirRequirement (example).
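A minimal sketch of this approach; the script body, the `run.py` entry name, and the `message` input are only illustrative:

```yaml
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: run.py
        # The script text is written into the working directory before the run
        entry: |
          import sys
          print("hello from the embedded script:", sys.argv[1])
inputs:
  message:
    type: string
    inputBinding:
      position: 2
baseCommand: [python, run.py]
outputs:
  out:
    type: stdout
```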
I use good software practices. My Python/Ruby/Haskell/... script is in a separate file from my CWL wrapper.
You can use the $include directive to pull the file into your CWL. You can use it in an InitialWorkDirRequirement (example) or in a File's contents field (example).
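A sketch of the $include form, assuming the script lives in a file named `clean_data.py` next to the CWL description:

```yaml
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: clean_data.py
        # $include splices in the text of the sibling file at load time
        entry:
          $include: clean_data.py
inputs:
  data:
    type: File
    inputBinding:
      position: 2
baseCommand: [python, clean_data.py]
outputs: []
```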
My embedded (Bash/Python/R) script has "$" signs in it and this is conflicting with CWL parameter references. How do I get them to play nicely together?
You can escape the $ in your script with \$.
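For example, in an embedded bash script a CWL parameter reference is left unescaped while bash's own $(...) and ${...} constructs are escaped; the `report.sh` script and the `data` input below are only illustrative:

```yaml
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: report.sh
        entry: |
          # Expanded by CWL before the run (parameter reference):
          echo "Input file: $(inputs.data.path)"
          # Escaped, so bash receives $(date) and ${USER} untouched:
          echo "Run started at \$(date) by \${USER}"
inputs:
  data:
    type: File
baseCommand: [bash, report.sh]
outputs:
  report:
    type: stdout
```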
I'd rather just paste the script into the CWL and not have to go through it, escaping the "$" signs. Do I have another option?
You could embed your script in the "contents" field of the default value of a File input. Note that with this solution a user can supply a different file to the script input and override the default script. This can be considered a bug or a feature, depending on your use case (example).
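A sketch of that pattern, with a hypothetical `script` input whose default is a File literal carrying the code in its contents field:

```yaml
cwlVersion: v1.2
class: CommandLineTool
inputs:
  script:
    type: File
    # The default is a File literal; its "contents" hold the script text.
    # Supplying a different File on this input overrides the embedded script.
    default:
      class: File
      basename: count_lines.py
      contents: |
        import sys
        with open(sys.argv[1]) as fh:
            print(sum(1 for _ in fh))
    inputBinding:
      position: 1
  data:
    type: File
    inputBinding:
      position: 2
baseCommand: python
outputs:
  count:
    type: stdout
```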
A bash script can also be passed as a string argument and invoked directly on the command line (example).
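A sketch of that style, assuming a hypothetical `data` input: the script text travels as one argument to `bash -c`, with the CWL parameter references expanded before bash ever runs:

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [bash, -c]
arguments:
  # The whole script is a single string argument to "bash -c";
  # the parameter references are filled in by CWL first.
  - |
    set -e
    echo "Counting lines in $(inputs.data.path)" >&2
    wc -l < "$(inputs.data.path)"
inputs:
  data:
    type: File
outputs:
  count:
    type: stdout
```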
I have an input that is a list of files. I wish to do processing based on the file paths.
This depends a bit on what the processing is intended to do. The easiest case is when the whole thing can be done in JavaScript, in which case the pattern looks like this:
    ${
        var cmd = "";
        for (var i = 0; i < inputs.files.length; i++) {
            cmd += "\n echo " + inputs.files[i].path;
        }
        return cmd;
    }
(example)
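One way such an expression might be wired into a tool is as a `shellQuote: false` argument under ShellCommandRequirement, so the returned string is executed as shell commands; the sketch below is illustrative and is not the linked example:

```yaml
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
  ShellCommandRequirement: {}
inputs:
  files:
    type: File[]
arguments:
  - shellQuote: false
    # The expression returns newline-separated shell commands,
    # one "echo <path>" line per input file
    valueFrom: |
      ${
        var cmd = "";
        for (var i = 0; i < inputs.files.length; i++) {
          cmd += "\n echo " + inputs.files[i].path;
        }
        return cmd;
      }
outputs:
  echoed:
    type: stdout
```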
This works well when a JSON or simple string return value from the JS code is all we need. If what we really want is to generate an embedded script (say a bash or Python script), for example via an InitialWorkDirRequirement, it becomes cumbersome to write in this fashion. One way around this is to write a succinct JS expression that converts the passed JSON object into a list of paths in the syntax accepted by the script.
Here is an example for Python and one for bash.
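As a sketch of the Python case, a one-line JS expression can render the File array as a JSON list of paths, which is also valid Python syntax; the `process.py` script and the `files` input are made up:

```yaml
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: process.py
        entry: |
          # The JS expression below becomes a JSON (and Python) list of paths
          paths = ${ return JSON.stringify(inputs.files.map(function (f) { return f.path; })); }
          for p in paths:
              print(p)
inputs:
  files:
    type: File[]
baseCommand: [python, process.py]
outputs:
  out:
    type: stdout
```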
I have a tool that does not do well with arbitrary file paths. I'd like to link the files into the working directory so I don't have to deal with arbitrary mount paths and so on.
You can use InitialWorkDirRequirement to link the files (example).
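A minimal sketch of the linking pattern; the `files` input and the `ls` command are placeholders:

```yaml
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    # Stage every input file into the tool's working directory
    listing: $(inputs.files)
inputs:
  files:
    type: File[]
baseCommand: ls
outputs:
  listed:
    type: stdout
```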
You can mix this with embedding scripts (example).
- Tool A produces a list of Files (or strings, ints ...)
- Tool B accepts only a single File (or string, int ...)
- How do I connect A to B?
If you are sure this is not going to be a problem, e.g. in this context A will only ever produce one file, or you are only interested in one file, you can use a step valueFrom expression to convert the types.
Here is a workflow that will raise validation warnings and will fail on execution because of port type mismatches.
Here is the same workflow with valueFrom added to make the port types match.
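A sketch of that fix: the step input takes the whole array as its source and a valueFrom expression keeps only the first element. To stay short, the array here comes from a workflow input rather than an upstream step, and `tool_b.cwl` is a hypothetical tool expecting a single File:

```yaml
cwlVersion: v1.2
class: Workflow
requirements:
  StepInputExpressionRequirement: {}
inputs:
  file_array: File[]
steps:
  consume_one:
    run: tool_b.cwl          # hypothetical tool with a single-File input
    in:
      single_file:
        source: file_array
        # self holds the File[] delivered by source; keep only the first element
        valueFrom: $(self[0])
    out: [result]
outputs:
  final:
    type: File
    outputSource: consume_one/result
```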