doc/user/cwl/cwl-style.html.textile.liquid

---
layout: default
navsection: userguide
title: Best Practices for writing CWL
...

* To run on Arvados, a workflow should provide a @DockerRequirement@ in the @hints@ section.

* Build a reusable library of components.  Share tool wrappers and subworkflows between projects.  Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-language/workflows and tool registries such as "Dockstore":http://dockstore.org .

* When combining a parameter value with a string, such as adding a filename extension, write @$(inputs.file.basename).ext@ instead of @$(inputs.file.basename + '.ext')@.  The first form is evaluated as a simple text substitution.  The second form (using the @+@ operator) is evaluated as an arbitrary JavaScript expression and requires that you declare @InlineJavascriptRequirement@.

* Avoid declaring @InlineJavascriptRequirement@ or @ShellCommandRequirement@ unless you specifically need them.  Don't include them "just in case" because they change the default behavior and may imply extra overhead.

* Don't write CWL scripts that access the Arvados SDK.  This is non-portable; a script that accesses Arvados directly won't work with @cwltool@ or crunch v2.
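
As a minimal sketch of the points above, the following tool declares its container as a hint and builds its output filename by plain parameter substitution, so it needs neither @InlineJavascriptRequirement@ nor @ShellCommandRequirement@.  The image name, the @gzip@ command and the input name @file@ are illustrative assumptions, not part of any Arvados convention.

<pre>
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: debian:stable-slim   # assumed image; any image that provides gzip would do
baseCommand: [gzip, -c]
inputs:
  file:
    type: File
    inputBinding: {position: 1}
# Parameter reference followed by literal text: simple substitution, no JavaScript.
stdout: $(inputs.file.basename).gz
outputs:
  compressed:
    type: stdout
</pre>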

* CommandLineTools wrapping custom scripts should represent the script as an input parameter with the script file as a default value.  Use @secondaryFiles@ for scripts that consist of multiple files.  For example:

<pre>
cwlVersion: v1.0
class: CommandLineTool
baseCommand: python
inputs:
  script:
    type: File
    inputBinding: {position: 1}
    default:
      class: File
      location: bclfastq.py
      secondaryFiles:
        - class: File
          location: helper1.py
        - class: File
          location: helper2.py
  inputfile:
    type: File
    inputBinding: {position: 2}
outputs:
  out:
    type: File
    outputBinding:
      glob: "*.fastq"
</pre>

* You can get the designated temporary directory using @$(runtime.tmpdir)@ in your CWL file, or from the @$TMPDIR@ environment variable in your script.

* Similarly, you can get the designated output directory using @$(runtime.outdir)@, or from the @$HOME@ environment variable in your script.
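
One way to use the temporary directory from the CWL side is to pass it to the command as an argument.  The sketch below assumes a @sort@ binary that accepts @-T@ to choose its scratch directory; the image name, the input name @unsorted@ and the output filename are illustrative assumptions.  The captured standard output is written to the designated output directory.

<pre>
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: debian:stable-slim   # assumed image providing sort
baseCommand: sort
arguments:
  # Keep sort's scratch files in the designated temporary directory.
  - prefix: -T
    valueFrom: $(runtime.tmpdir)
inputs:
  unsorted:
    type: File
    inputBinding: {position: 1}
# stdout is captured to a file in the designated output directory.
stdout: sorted.txt
outputs:
  sorted:
    type: stdout
</pre>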

* Use @ExpressionTool@ to efficiently rearrange input files between steps of a Workflow.  For example, the following expression accepts a directory containing files paired by @_R1_@ and @_R2_@ and produces an array of Directories, each containing one pair.

<pre>
class: ExpressionTool
cwlVersion: v1.0
inputs:
  inputdir: Directory
outputs:
  out: Directory[]
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    // Group files by the part of the basename that precedes _R1_ or _R2_.
    var samples = {};
    for (var i = 0; i < inputs.inputdir.listing.length; i++) {
      var file = inputs.inputdir.listing[i];
      var groups = file.basename.match(/^(.+)(_R[12]_)(.+)$/);
      if (groups) {
        if (!samples[groups[1]]) {
          samples[groups[1]] = [];
        }
        samples[groups[1]].push(file);
      }
    }
    // Emit one Directory per sample; "listing" is the array of grouped files.
    var dirs = [];
    for (var key in samples) {
      dirs.push({"class": "Directory",
                 "basename": key,
                 "listing": samples[key]});
    }
    return {"out": dirs};
  }
</pre>
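
A sketch of how such an ExpressionTool might be wired into a workflow, so that a later step runs once per pair.  The filenames @group-pairs.cwl@ and @process-pair.cwl@, the step names, and the downstream input name @pair@ are hypothetical; the downstream tool is assumed to take one Directory and produce a File output named @out@.

<pre>
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  inputdir: Directory
outputs:
  results:
    type: File[]
    outputSource: process/out
steps:
  group:
    in: {inputdir: inputdir}
    out: [out]
    run: group-pairs.cwl      # hypothetical file holding the ExpressionTool above
  process:
    in: {pair: group/out}
    scatter: pair             # one run per Directory produced by the grouping step
    out: [out]
    run: process-pair.cwl     # hypothetical tool taking one Directory named "pair"
</pre>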

* Avoid specifying resource requirements in a CommandLineTool.  Prefer to specify them in the workflow.  You can provide a default resource requirement in the top level @hints@ section, and individual steps can override it with their own resource requirement.

<pre>
cwlVersion: v1.0
class: Workflow
inputs:
  inp: File
outputs:
  out:
    type: File
    outputSource: step2/out
# Workflow-wide defaults (ramMin and tmpdirMin are in mebibytes).
hints:
  ResourceRequirement:
    ramMin: 1000
    coresMin: 1
    tmpdirMin: 45000
steps:
  step1:
    in: {inp: inp}
    out: [out]
    run: tool1.cwl
  step2:
    in: {inp: step1/out}
    out: [out]
    run: tool2.cwl
    # Step-level hint overrides the workflow default for this step only.
    hints:
      ResourceRequirement:
        ramMin: 2000
        coresMin: 2
        tmpdirMin: 90000
</pre>

* Instead of scattering separate steps, prefer to scatter over a subworkflow.

With the following pattern, @step1@ must finish processing every sample before @step2@ can start computing on any of them.  This means a single long-running sample can prevent the rest of the workflow from moving on:

<pre>
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  inp: File[]
outputs:
  out:
    type: File[]
    outputSource: step3/out
steps:
  step1:
    in: {inp: inp}
    scatter: inp
    out: [out]
    run: tool1.cwl
  step2:
    in: {inp: step1/out}
    scatter: inp
    out: [out]
    run: tool2.cwl
  step3:
    in: {inp: step2/out}
    scatter: inp
    out: [out]
    run: tool3.cwl
</pre>

Instead, scatter over a subworkflow.  In this pattern, a sample can proceed to @step2@ as soon as @step1@ is done, independently of any other samples.  For example (note that the subworkflow can also be put in a separate file):

<pre>
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
  SubworkflowFeatureRequirement: {}
inputs:
  inp: File[]
outputs:
  out:
    type: File[]
    outputSource: step1/out
steps:
  step1:
    in: {inp: inp}
    scatter: inp
    out: [out]
    run:
      class: Workflow
      inputs:
        inp: File
      outputs:
        out:
          type: File
          outputSource: step3/out
      steps:
        step1:
          in: {inp: inp}
          out: [out]
          run: tool1.cwl
        step2:
          in: {inp: step1/out}
          out: [out]
          run: tool2.cwl
        step3:
          in: {inp: step2/out}
          out: [out]
          run: tool3.cwl
</pre>
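
If the subworkflow is kept in its own file, the outer workflow reduces to a single scattered step.  A sketch, assuming the inner three-step workflow above has been saved (with its own @cwlVersion@ line) as @per-sample.cwl@, which is a hypothetical filename:

<pre>
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
  SubworkflowFeatureRequirement: {}
inputs:
  inp: File[]
outputs:
  out:
    type: File[]
    outputSource: step1/out
steps:
  step1:
    in: {inp: inp}
    scatter: inp
    out: [out]
    run: per-sample.cwl   # hypothetical file holding the three-step subworkflow
</pre>

Keeping the subworkflow in a separate file also makes it easier to share between projects, in line with the advice on reusable components.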