doc/user/cwl/cwl-extensions.html.textile.liquid

   1 ---
   2 layout: default
   3 navsection: userguide
   4 title: Arvados CWL Extensions
   5 ...
   6 {% comment %}
   7 Copyright (C) The Arvados Authors. All rights reserved.
   8
   9 SPDX-License-Identifier: CC-BY-SA-3.0
  10 {% endcomment %}
  11
  12 Arvados provides several extensions to CWL for workflow optimization, site-specific configuration, and access to the Arvados API.
  13
  14 To use Arvados CWL extensions, add the following @$namespaces@ section at the top of your CWL file:
  15
  16 {% codeblock as yaml %}
  17 $namespaces:
  18   arv: "http://arvados.org/cwl#"
  19   cwltool: "http://commonwl.org/cwltool#"
  20 {% endcodeblock %}
  21
  22 For portability, most Arvados extensions should go into the @hints@ section of your CWL file.  This makes it possible for your workflows to run on other CWL runners that do not recognize Arvados hints.  The difference between @hints@ and @requirements@ is that @hints@ are optional features that can be ignored by other runners and still produce the same output, whereas @requirements@ will fail the workflow if they cannot be fulfilled.  For example, @arv:IntermediateOutput@ should go in @hints@ because it has no effect on non-Arvados platforms.  However, if your workflow explicitly accesses the Arvados API and will fail without it, you should put @arv:APIRequirement@ in @requirements@.
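
As a minimal sketch of that split (the fragment is illustrative and not taken from a particular workflow), a tool that cannot work without the Arvados API, but whose intermediate outputs may expire after an hour, could declare:

{% codeblock as yaml %}
# Fails on runners that cannot provide the Arvados API.
requirements:
  arv:APIRequirement: {}

# Optional: ignored by runners that do not recognize it.
hints:
  arv:IntermediateOutput:
    outputTTL: 3600
{% endcodeblock %}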
  23
  24 * "RunInSingleContainer":#RunInSingleContainer
  25 * "SeparateRunner":#SeparateRunner
  26 * "RuntimeConstraints":#RuntimeConstraints
  27 * "PartitionRequirement":#PartitionRequirement
  28 * "APIRequirement":#APIRequirement
  29 * "IntermediateOutput":#IntermediateOutput
  30 * "Secrets":#Secrets
  31 * "WorkflowRunnerResources":#WorkflowRunnerResources
  32 * "ClusterTarget":#ClusterTarget
  33 * "OutputStorageClass":#OutputStorageClass
  34 * "ProcessProperties":#ProcessProperties
  35 * "OutputCollectionProperties":#OutputCollectionProperties
  36 * "CUDARequirement":#CUDARequirement
  37 * "ROCmRequirement":#ROCmRequirement
  38 * "UsePreemptible":#UsePreemptible
  39 * "PreemptionBehavior":#PreemptionBehavior
  40 * "OutOfMemoryRetry":#OutOfMemoryRetry
  41
  42 {% codeblock as yaml %}
  43 hints:
  44   arv:RunInSingleContainer: {}
  45
  46   arv:SeparateRunner:
  47     runnerProcessName: $(inputs.sample_id)
  48
  49   arv:RuntimeConstraints:
  50     keep_cache: 123456
  51     outputDirType: keep_output_dir
  52
  53   arv:PartitionRequirement:
  54     partition: dev_partition
  55
  56   arv:APIRequirement: {}
  57
  58   arv:IntermediateOutput:
  59     outputTTL: 3600
  60
  61   cwltool:Secrets:
  62     secrets: [input1, input2]
  63
  64   arv:WorkflowRunnerResources:
  65     ramMin: 2048
  66     coresMin: 2
  67     keep_cache: 512
  68
  69   arv:ClusterTarget:
  70     cluster_id: clsr1
  71     project_uuid: clsr1-j7d0g-qxc4jcji7n4lafx
  72
  73   arv:OutputStorageClass:
  74     intermediateStorageClass: fast_storage
  75     finalStorageClass: robust_storage
  76
  77   arv:ProcessProperties:
  78     processProperties:
  79       property1: value1
  80       property2: $(inputs.value2)
  81
  82   arv:OutputCollectionProperties:
  83     outputProperties:
  84       property1: value1
  85       property2: $(inputs.value2)
  86
  87   cwltool:CUDARequirement:
  88     cudaVersionMin: "11.0"
  89     cudaComputeCapability: "9.0"
  90     cudaDeviceCountMin: 1
  91     cudaDeviceCountMax: 1
  92     cudaVram: 8000
  93
  94   arv:ROCmRequirement:
  95     rocmDriverVersion: "6.2"
  96     rocmTarget: ["gfx1100", "gfx1103"]
  97     rocmDeviceCountMin: 1
  98     rocmDeviceCountMax: 1
  99     rocmVram: 8000
 100
 101   arv:UsePreemptible:
 102     usePreemptible: true
 103
 104   arv:PreemptionBehavior:
 105     resubmitNonPreemptible: true
 106
 107   arv:OutOfMemoryRetry:
 108     memoryRetryMultiplier: 2
 109     memoryErrorRegex: "custom memory error"
 110 {% endcodeblock %}
 111
 112 h2(#RunInSingleContainer). arv:RunInSingleContainer
 113
 114 Apply this to a workflow step that runs a subworkflow.  Indicates that all the steps of the subworkflow should run together in a single container and not be scheduled separately.  If you have a sequence of short-running steps (less than 1-2 minutes each) this enables you to avoid scheduling and data transfer overhead by running all the steps together at once.  To use this feature, @cwltool@ must be installed in the container image.
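
A minimal sketch of how the hint might be attached to a subworkflow step.  The file name @quick-steps.cwl@, the step name, and the input and output names are hypothetical placeholders.

{% codeblock as yaml %}
cwlVersion: v1.2
class: Workflow
$namespaces:
  arv: "http://arvados.org/cwl#"
requirements:
  SubworkflowFeatureRequirement: {}
inputs:
  sample: File
outputs:
  result:
    type: File
    outputSource: quick_steps/result
steps:
  quick_steps:
    # Hypothetical subworkflow made up of several short-running steps.
    # The container image it runs in must have cwltool installed.
    run: quick-steps.cwl
    hints:
      arv:RunInSingleContainer: {}
    in:
      sample: sample
    out: [result]
{% endcodeblock %}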
 115
 116 h2(#SeparateRunner). arv:SeparateRunner
 117
 118 Apply this to a workflow step that runs a subworkflow.  Indicates that Arvados should launch a new workflow runner to manage that specific subworkflow instance.  If used on a scatter step, each scatter item is launched separately.  Using this option has three benefits:
 119
 120 * Better organization in the "Subprocesses" table of the main workflow, including the ability to provide a custom name for the step
 121 * When re-running a batch that has run before, an entire subworkflow may be reused as a unit, which is faster than determining reuse for each step.
 122 * Significantly faster submit rate compared to invoking @arvados-cwl-runner@ to launch individual workflow instances separately.
 123
 124 The disadvantage of this option is that each additional workflow runner consumes compute resources of its own, so the workflow uses more resources than it would with all the steps managed by a single runner.
 125
 126 table(table table-bordered table-condensed).
 127 |_. Field |_. Type |_. Description |
 128 |runnerProcessName|optional string|Name to assign to the subworkflow process.  May be an expression with an input context of the post-scatter workflow step invocation.|
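
The following sketch shows the hint on a scatter step, so that each scatter item gets its own runner named after the item.  The subworkflow file @per-sample.cwl@ and the input names are hypothetical.

{% codeblock as yaml %}
cwlVersion: v1.2
class: Workflow
$namespaces:
  arv: "http://arvados.org/cwl#"
requirements:
  SubworkflowFeatureRequirement: {}
  ScatterFeatureRequirement: {}
inputs:
  sample_ids: string[]
outputs: []
steps:
  per_sample:
    run: per-sample.cwl          # hypothetical subworkflow
    scatter: sample_id
    hints:
      arv:SeparateRunner:
        # Evaluated after scattering, so inputs.sample_id is a single string.
        runnerProcessName: $(inputs.sample_id)
    in:
      sample_id: sample_ids
    out: []
{% endcodeblock %}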
 129
 130 h2(#RuntimeConstraints). arv:RuntimeConstraints
 131
 132 Set Arvados-specific runtime hints.
 133
 134 table(table table-bordered table-condensed).
 135 |_. Field |_. Type |_. Description |
 136 |keep_cache|int|Size of file data buffer for Keep mount in MiB. Default is 256 MiB. Increase this to reduce cache thrashing in situations such as accessing multiple large (64+ MiB) files at the same time, or performing random access on a large file.|
 137 |outputDirType|enum|Preferred backing store for output staging.  If not specified, the system may choose which one to use.  One of *local_output_dir* or *keep_output_dir*|
 138
 139 *local_output_dir*: Use regular file system local to the compute node. There must be sufficient local scratch space to store entire output; specify this with @outdirMin@ of @ResourceRequirement@.  Files are batch uploaded to Keep when the process completes.  Most compatible, but upload step can be time consuming for very large files.
 140
 141 *keep_output_dir*: Use writable Keep mount.  Files are streamed to Keep as they are written.  Does not consume local scratch space, but does consume RAM for output buffers (up to 192 MiB per file simultaneously open for writing).  Best suited to processes which produce sequential output of large files (non-sequential writes may produce fragmented file manifests).  Supports regular files and directories; does not support special files such as symlinks, hard links, named pipes, named sockets, or device nodes.
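
As an illustration of the @local_output_dir@ case, the sketch below pairs the hint with an @outdirMin@ request so the node has enough scratch space for the whole output.  The sizes are arbitrary example values, not recommendations.

{% codeblock as yaml %}
hints:
  arv:RuntimeConstraints:
    keep_cache: 1024                 # MiB of Keep file data buffer (example value)
    outputDirType: local_output_dir
  ResourceRequirement:
    outdirMin: 20000                 # MiB of local scratch for the output directory (example value)
{% endcodeblock %}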
 142
 143 h2(#PartitionRequirement). arv:PartitionRequirement
 144
 145 Select preferred compute partitions on which to run jobs.
 146
 147 table(table table-bordered table-condensed).
 148 |_. Field |_. Type |_. Description |
 149 |partition|string or array of strings|Name of the preferred compute partition, or list of partitions, on which to run jobs.|
 150
 151 h2(#APIRequirement). arv:APIRequirement
 152
 153 For CWL v1.1 scripts, if a step requires network access but not specifically access to the Arvados API server, prefer the standard feature "NetworkAccess":https://www.commonwl.org/v1.1/CommandLineTool.html#NetworkAccess .  In the future, these may be differentiated by whether @ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ are injected into the container or not.
 154
 155 Indicates that the process needs access to the Arvados API.  The process will be granted network access and have @ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ set in the environment.  Tools which rely on the Arvados API being present should put @arv:APIRequirement@ in the @requirements@ section of the tool (rather than @hints@) to indicate that the tool is not portable to non-Arvados CWL runners.
 156
 157 Use @arv:APIRequirement@ in @hints@ to enable general (non-Arvados-specific) network access for a tool.
 158
 159 h2(#IntermediateOutput). arv:IntermediateOutput
 160
 161 Specify desired handling of intermediate output collections.
 162
 163 table(table table-bordered table-condensed).
 164 |_. Field |_. Type |_. Description |
 165 |outputTTL|int|If the value is greater than zero, intermediate output collections are considered temporary and are automatically trashed @outputTTL@ seconds after creation.  A value of zero means intermediate output should be retained indefinitely (this is the default behavior).
 166 Note: @arvados-cwl-runner@ currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, it is possible for a collection to be trashed before downstream steps that consume it are started.  The recommended minimum value for TTL is the expected duration of the entire workflow.|
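
For example, assuming a workflow that is expected to finish within 24 hours (an assumed figure for illustration), the TTL in seconds would be at least 24 * 60 * 60 = 86400:

{% codeblock as yaml %}
hints:
  arv:IntermediateOutput:
    # 24 hours * 60 minutes * 60 seconds; at least the expected
    # duration of the entire workflow.
    outputTTL: 86400
{% endcodeblock %}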
 167
 168 h2(#Secrets). cwltool:Secrets
 169
 170 Indicate that one or more input parameters are "secret".  Must be applied at the top-level Workflow.  Secret parameters are not stored in Keep, are hidden from logs and API responses, and are wiped from the database after the workflow completes.
 171
 172 *Note: currently, workflows with secrets must be submitted on the command line using @arvados-cwl-runner@.  Workflows with secrets submitted through Workbench will not properly obscure the secret inputs.*
 173
 174 table(table table-bordered table-condensed).
 175 |_. Field |_. Type |_. Description |
 176 |secrets|array<string>|Input parameters which are considered "secret".  Must be strings.|
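
A fragment of a top-level workflow, sketched with a hypothetical string input named @api_password@ (outputs and steps are omitted):

{% codeblock as yaml %}
cwlVersion: v1.2
class: Workflow
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
hints:
  cwltool:Secrets:
    secrets: [api_password]
inputs:
  api_password: string   # hypothetical secret input; secret inputs must be strings
  reads: File            # ordinary (non-secret) input
{% endcodeblock %}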
 177
 178 h2(#WorkflowRunnerResources). arv:WorkflowRunnerResources
 179
 180 Specify resource requirements for the workflow runner process (arvados-cwl-runner) that manages a workflow run.  Must be applied to the top level workflow.  Will also be set implicitly when using @--submit-runner-ram@ on the command line along with @--create-workflow@ or @--update-workflow@.  Use this to adjust the runner's allocation if the workflow runner is getting "out of memory" exceptions or being killed by the out-of-memory (OOM) killer.
 181
 182 table(table table-bordered table-condensed).
 183 |_. Field |_. Type |_. Description |
 184 |ramMin|int|RAM, in mebibytes, to reserve for the arvados-cwl-runner process. Default 1 GiB.|
 185 |coresMin|int|Number of cores to reserve for the arvados-cwl-runner process. Default 1 core.|
 186 |keep_cache|int|Size of collection metadata cache for the workflow runner, in MiB.  Default 256 MiB.  Will be added on to the RAM request when determining node size to request.|
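
A sketch with example values.  Going by the table above, @keep_cache@ is added to the RAM request, so this fragment would size the runner's node for roughly 4096 + 1024 = 5120 MiB.

{% codeblock as yaml %}
hints:
  arv:WorkflowRunnerResources:
    ramMin: 4096        # MiB reserved for arvados-cwl-runner (example value)
    coresMin: 2
    keep_cache: 1024    # MiB of collection metadata cache, added on to ramMin
{% endcodeblock %}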
 187
 188 h2(#ClusterTarget). arv:ClusterTarget
 189
 190 Specify which Arvados cluster should execute a container or subworkflow, and the parent project for the container request.
 191
 192 table(table table-bordered table-condensed).
 193 |_. Field |_. Type |_. Description |
 194 |cluster_id|string|The five-character alphanumeric cluster id (uuid prefix) where a container or subworkflow will execute.  May be an expression.|
 195 |project_uuid|string|The uuid of the project which will own the container request and the output of the container.  May be an expression.|
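
Because both fields may be expressions, the target can be chosen per invocation.  In this sketch, @target_cluster@ and @target_project@ are hypothetical string inputs of the step or tool carrying the hint.

{% codeblock as yaml %}
hints:
  arv:ClusterTarget:
    cluster_id: $(inputs.target_cluster)      # hypothetical input, e.g. "clsr1"
    project_uuid: $(inputs.target_project)    # hypothetical input holding a project uuid
{% endcodeblock %}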
 196
 197 h2(#OutputStorageClass). arv:OutputStorageClass
 198
 199 Specify the "storage class":{{site.baseurl}}/user/topics/storage-classes.html to use for intermediate and final outputs.
 200
 201 table(table table-bordered table-condensed).
 202 |_. Field |_. Type |_. Description |
 203 |intermediateStorageClass|string or array of strings|The storage class for output of intermediate steps.  For example, faster "hot" storage.|
 204 |finalStorageClass|string or array of strings|The storage class for the final output.|
 205
 206 h2(#ProcessProperties). arv:ProcessProperties
 207
 208 Specify extra "properties":{{site.baseurl}}/api/methods.html#subpropertyfilters that will be set on container requests created by the workflow.  May be set on a Workflow or a CommandLineTool.  Setting custom properties on a container request simplifies queries to find the workflow run later on.
 209
 210 table(table table-bordered table-condensed).
 211 |_. Field |_. Type |_. Description |
 212 |processProperties|key-value map, or list of objects with the fields {propertyName, propertyValue}|The properties that will be set on the container request.  May include expressions that reference @$(inputs)@ of the current workflow or tool.|
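
The key-value map form appears in the summary example near the top of this page.  The same properties written in the list-of-objects form would look roughly like this:

{% codeblock as yaml %}
hints:
  arv:ProcessProperties:
    processProperties:
      - propertyName: property1
        propertyValue: value1
      - propertyName: property2
        propertyValue: $(inputs.value2)
{% endcodeblock %}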
 213
 214 h2(#OutputCollectionProperties). arv:OutputCollectionProperties
 215
 216 Specify custom "properties":{{site.baseurl}}/api/methods.html#subpropertyfilters that will be set on the output collection of the workflow step.
 217
 218 table(table table-bordered table-condensed).
 219 |_. Field |_. Type |_. Description |
 220 |outputProperties|key-value map, or list of objects with the fields {propertyName, propertyValue}|The properties that will be set on the output collection.  May include expressions that reference @$(inputs)@ of the current workflow or tool.|
 221
 222 h2(#CUDARequirement). cwltool:CUDARequirement
 223
 224 Request support for Nvidia CUDA GPU acceleration in the container.  Assumes that the CUDA runtime (SDK) is installed in the container, and the host will inject the CUDA driver libraries into the container (equal to or later than the version requested).
 225
 226 table(table table-bordered table-condensed).
 227 |_. Field |_. Type |_. Description |
 228 |cudaVersionMin|string|Required.  The CUDA SDK version corresponding to the minimum driver version supported by the container (generally, the SDK version 'X.Y' the application was compiled against).|
 229 |cudaComputeCapability|string|Required.  The minimum CUDA hardware capability (in 'X.Y' format) required by the application's PTX or C++ GPU code (will be JIT compiled for the available hardware).|
 230 |cudaDeviceCountMin|integer|Minimum number of GPU devices to allocate on a single node. Required.|
 231 |cudaDeviceCountMax|integer|Maximum number of GPU devices to allocate on a single node. Optional.  If not specified, same as @cudaDeviceCountMin@.|
 232 |cudaVram|integer|Requested amount of VRAM per device, in mebibytes (2**20)|
 233
 234 h2(#ROCmRequirement). arv:ROCmRequirement
 235
 236 Request support for AMD ROCm GPU acceleration in the container.  Assumes that the ROCm runtime (SDK) is installed in the container, and the host will inject the AMD devices (@/dev/kfd@ and @/dev/dri/renderD*@) into the container.
 237
 238 table(table table-bordered table-condensed).
 239 |_. Field |_. Type |_. Description |
 240 |rocmDriverVersion|string|Required.  The ROCm SDK version corresponding to the minimum driver version supported by the container (generally, the SDK version 'X.Y' the application was compiled against).|
 241 |rocmTarget|array of string|Required.  A list of one or more hardware targets (e.g. gfx1100) corresponding to the GPU architectures supported by the container.  Use @rocminfo@ to determine what hardware targets you have.  See also "Accelerator and GPU hardware specifications":https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html (use the column "LLVM target name") and "LLVM AMDGPU backend documentation":https://llvm.org/docs/AMDGPUUsage.html .|
 242 |rocmDeviceCountMin|integer|Minimum number of GPU devices to allocate on a single node. Required.|
 243 |rocmDeviceCountMax|integer|Maximum number of GPU devices to allocate on a single node. Optional.  If not specified, same as @rocmDeviceCountMin@.|
 244 |rocmVram|integer|Requested amount of VRAM per device, in mebibytes (2**20)|
 245
 246 h2(#UsePreemptible). arv:UsePreemptible
 247
 248 Specify whether a workflow step should request preemptible (e.g. AWS Spot market) instances.  Such instances are generally cheaper, but can be taken back by the cloud provider at any time (preempted) causing the step to fail.  When this happens, Arvados will automatically re-try the step, up to the configuration value of @Containers.MaxRetryAttempts@ (default 3) times.
 249
 250 table(table table-bordered table-condensed).
 251 |_. Field |_. Type |_. Description |
 252 |usePreemptible|boolean|Required, true to opt-in to using preemptible instances, false to opt-out.|
 253
 254 h2(#PreemptionBehavior). arv:PreemptionBehavior
 255
 256 This option determines the behavior when @arvados-cwl-runner@ detects that a workflow step was cancelled because the preemptible (spot market) instance it was running on was reclaimed by the cloud provider.  If 'true', instead of the retry behavior described above in 'UsePreemptible', on the first failure the workflow step will be re-submitted with preemption disabled, so it will be scheduled to run on non-preemptible (on-demand) instances.
 257
 258 When preemptible instances are reclaimed, this is a signal that the cloud provider has restricted capacity for low priority preemptible instances.  As a result, the default behavior of turning around and rescheduling or launching on another preemptible instance has a higher risk of being preempted a second or third time, spending more time and money but making no progress.  This option provides an alternate fallback behavior, by attempting to run the step on a preemptible instance the first time (saving money), but re-running the step as non-preemptible if the first attempt was preempted (ensuring continued progress).
 259
 260 This behavior is applied to each step individually.  If a step is preempted, then successfully re-run as non-preemptible, it does not affect the behavior of the next step, which will first be launched as preemptible, and so forth.
 261
 262 table(table table-bordered table-condensed).
 263 |_. Field |_. Type |_. Description |
 264 |resubmitNonPreemptible|boolean|Required.  If true, then when a workflow step is cancelled because the instance was preempted, re-submit the step with preemption disabled.|
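
The two preemption hints are typically used together.  A sketch of a step that first tries a preemptible instance and falls back to an on-demand instance after the first preemption:

{% codeblock as yaml %}
hints:
  arv:UsePreemptible:
    usePreemptible: true            # first attempt runs on a preemptible instance
  arv:PreemptionBehavior:
    resubmitNonPreemptible: true    # if preempted, re-submit as non-preemptible
{% endcodeblock %}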
 265
 266 h2(#OutOfMemoryRetry). arv:OutOfMemoryRetry
 267
 268 Specify that when a workflow step appears to have failed because it did not request enough RAM, it should be re-submitted with more RAM.  Out of memory conditions are detected either by the container being unexpectedly killed (exit code 137) or by matching a pattern in the container's output (see @memoryErrorRegex@).  Retrying will increase the base RAM request by the value of @memoryRetryMultiplier@.  For example, if the original RAM request was 10 GiB and the multiplier is 1.5, then it will re-submit with 15 GiB.
 269
 270 Containers are only re-submitted once.  If a container fails a second time after increasing RAM, then the workflow step will still fail.
 271
 272 Also note that expressions that use @$(runtime.ram)@ (such as dynamic command line parameters) are not reevaluated when the container is resubmitted.
 273
 274 table(table table-bordered table-condensed).
 275 |_. Field |_. Type |_. Description |
 276 |memoryRetryMultiplier|float|Optional, default value is 2.  The retry will multiply the base memory request by this factor to get the retry memory request.|
 277 |memoryErrorRegex|string|Optional, a custom regex that, if found in the stdout, stderr or crunch-run logging of a program, will trigger a retry with greater RAM.  If not provided, the default pattern matches "out of memory" (with or without spaces), "memory error" (with or without spaces), "bad_alloc" and "container using over 90% of memory".|
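
The 10 GiB example above can be sketched as follows.  The base request is 10240 MiB, so a multiplier of 1.5 gives a retry request of 10240 * 1.5 = 15360 MiB (15 GiB).  The regex shown is a hypothetical pattern for a tool that prints its own allocation error.

{% codeblock as yaml %}
hints:
  ResourceRequirement:
    ramMin: 10240                   # base request: 10 GiB, in MiB
  arv:OutOfMemoryRetry:
    memoryRetryMultiplier: 1.5      # retry with 10240 * 1.5 = 15360 MiB
    memoryErrorRegex: "Cannot allocate memory"   # hypothetical custom pattern
{% endcodeblock %}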
 278
 279 h2. arv:dockerCollectionPDH
 280
 281 This is an optional extension field appearing on the standard @DockerRequirement@.  It specifies the portable data hash of the Arvados collection containing the Docker image.  If present, it takes precedence over @dockerPull@ or @dockerImageId@.
 282
 283 <pre>
 284 requirements:
 285   DockerRequirement:
 286     dockerPull: "debian:10"
 287     arv:dockerCollectionPDH: "feaf1fc916103d7cdab6489e1f8c3a2b+174"
 288 </pre>
 289
 290 h1. Deprecated extensions
 291
 292 The following extensions are deprecated because equivalent features are part of the CWL v1.1 standard.
 293
 294 {% codeblock as yaml %}
 295 hints:
 296   cwltool:LoadListingRequirement:
 297     loadListing: shallow_listing
 298   arv:ReuseRequirement:
 299     enableReuse: false
 300   cwltool:TimeLimit:
 301     timelimit: 14400
 302 {% endcodeblock %}
 303
 304 h2. cwltool:LoadListingRequirement
 305
 306 For CWL v1.1 scripts, this is deprecated in favor of "loadListing":https://www.commonwl.org/v1.1/CommandLineTool.html#CommandInputParameter or "LoadListingRequirement":https://www.commonwl.org/v1.1/CommandLineTool.html#LoadListingRequirement .
 307
 308 In CWL v1.0 documents, the default behavior for Directory objects is to recursively expand the @listing@ for access by parameter references and expressions.  For directory trees containing many files, this can be expensive in both time and memory usage.  Use @cwltool:LoadListingRequirement@ to change the behavior for expansion of directory listings in the workflow runner.
 309
 310 table(table table-bordered table-condensed).
 311 |_. Field |_. Type |_. Description |
 312 |loadListing|string|One of @no_listing@, @shallow_listing@, or @deep_listing@|
 313
 314 *no_listing*: Do not expand directory listing at all.  The @listing@ field on the Directory object will be undefined.
 315
 316 *shallow_listing*: Only expand the first level of directory listing.  The @listing@ field on the toplevel Directory object will contain the directory contents, however @listing@ will not be defined on subdirectories.
 317
 318 *deep_listing*: Recursively expand all levels of directory listing.  The @listing@ field will be provided on the toplevel object and all subdirectories.
 319
 320 h2. arv:ReuseRequirement
 321
 322 For CWL v1.1 scripts, this is deprecated in favor of "WorkReuse":https://www.commonwl.org/v1.1/CommandLineTool.html#WorkReuse .
 323
 324 Enable/disable work reuse for current process.  Default true (work reuse enabled).
 325
 326 table(table table-bordered table-condensed).
 327 |_. Field |_. Type |_. Description |
 328 |enableReuse|boolean|Enable/disable work reuse for current process.  Default true (work reuse enabled).|
 329
 330 h2. cwltool:TimeLimit
 331
 332 For CWL v1.1 scripts, this is deprecated in favor of "ToolTimeLimit":https://www.commonwl.org/v1.1/CommandLineTool.html#ToolTimeLimit .
 333
 334 Set an upper limit on the execution time of a CommandLineTool or ExpressionTool.  A tool execution which exceeds the time limit may be preemptively terminated and considered failed.  May also be used by batch systems to make scheduling decisions.
 335
 336 table(table table-bordered table-condensed).
 337 |_. Field |_. Type |_. Description |
 338 |timelimit|int|Execution time limit in seconds. If set to zero, no limit is enforced.|
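
For comparison, a CWL v1.1 document could express the same three settings with the standard features named in the sections above.  This is a sketch of the standard forms, which need no @arv@ or @cwltool@ namespace.

{% codeblock as yaml %}
cwlVersion: v1.1
requirements:
  LoadListingRequirement:
    loadListing: shallow_listing
  WorkReuse:
    enableReuse: false
  ToolTimeLimit:
    timelimit: 14400    # seconds
{% endcodeblock %}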