--- layout: default navsection: userguide title: Best Practices for writing CWL ... * To run on Arvados, a workflow should provide a @DockerRequirement@ in the @hints@ section. * Build a reusable library of components. Share tool wrappers and subworkflows between projects. Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-language/workflows and tool registries such as "Dockstore":http://dockstore.org . * When combining a parameter value with a string, such as adding a filename extension, write @$(inputs.file.basename).ext@ instead of @$(inputs.file.basename + 'ext')@. The first form is evaluated as a simple text substitution, the second form (using the @+@ operator) is evaluated as an arbitrary Javascript expression and requires that you declare @InlineJavascriptRequirement@. * Avoid declaring @InlineJavascriptRequirement@ or @ShellCommandRequirement@ unless you specifically need them. Don't include them "just in case" because they change the default behavior and may imply extra overhead. * Don't write CWL scripts that access the Arvados SDK. This is non-portable; a script that access Arvados directly won't work with @cwltool@ or crunch v2. * CommandLineTools wrapping custom scripts should represent the script as an input parameter with the script file as a default value. Use @secondaryFiles@ for scripts that consist of multiple files. For example:
cwlVersion: v1.0 class: CommandLineTool baseCommand: python inputs: script: type: File inputBinding: {position: 1} default: class: File location: bclfastq.py secondaryFiles: - class: File location: helper1.py - class: File location: helper2.py inputfile: type: File inputBinding: {position: 2} outputs: out: type: File outputBinding: glob: "*.fastq"* You can get the designated temporary directory using @$(runtime.tmpdir)@ in your CWL file, or from the @$TMPDIR@ environment variable in your script. * Similarly, you can get the designated output directory using $(runtime.outdir), or from the @HOME@ environment variable in your script. * Use @ExpressionTool@ to efficiently rearrange input files between steps of a Workflow. For example, the following expression accepts a directory containing files paired by @_R1_@ and @_R2_@ and produces an array of Directories containing each pair.
class: ExpressionTool cwlVersion: v1.0 inputs: inputdir: Directory outputs: out: Directory[] requirements: InlineJavascriptRequirement: {} expression: | ${ var samples = {}; for (var i = 0; i < inputs.inputdir.listing.length; i++) { var file = inputs.inputdir.listing[i]; var groups = file.basename.match(/^(.+)(_R[12]_)(.+)$/); if (groups) { if (!samples[groups[1]]) { samples[groups[1]] = []; } samples[groups[1]].push(file); } } var dirs = []; for (var key in samples) { dirs.push({"class": "Directory", "basename": key, "listing": [samples[key]]}); } return {"out": dirs}; }* Avoid specifying resource requirements in CommandLineTool. Prefer to specify them in the workflow. You can provide a default resource requirement in the top level @hints@ section, and individual steps can override it with their own resource requirement.
cwlVersion: v1.0 class: Workflow inputs: inp: File hints: ResourceRequirement: ramMin: 1000 coresMin: 1 tmpdirMin: 45000 steps: step1: in: {inp: inp} out: [out] run: tool1.cwl step2: in: {inp: step1/inp} out: [out] run: tool2.cwl hints: ResourceRequirement: ramMin: 2000 coresMin: 2 tmpdirMin: 90000* Instead of scattering separate steps, prefer to scatter over a subworkflow. With the following pattern, @step1@ has to wait for all samples to complete before @step2@ can start computing on any samples. This means a single long-running sample can prevent the rest of the workflow from moving on:
cwlVersion: v1.0 class: Workflow inputs: inp: File steps: step1: in: {inp: inp} scatter: inp out: [out] run: tool1.cwl step2: in: {inp: step1/inp} scatter: inp out: [out] run: tool2.cwl step3: in: {inp: step2/inp} scatter: inp out: [out] run: tool3.cwlInstead, scatter over a subworkflow. In this pattern, a sample can proceed to @step2@ as soon as @step1@ is done, independently of any other samples. Example: (note, the subworkflow can also be put in a separate file)
cwlVersion: v1.0 class: Workflow steps: step1: in: {inp: inp} scatter: inp out: [out] run: class: Workflow inputs: inp: File outputs: out: type: File outputSource: step3/out steps: step1: in: {inp: inp} out: [out] run: tool1.cwl step2: in: {inp: step1/inp} out: [out] run: tool2.cwl step3: in: {inp: step2/inp} out: [out] run: tool3.cwl