doc/user/cwl/cwl-extensions.html.textile.liquid

   1 ---
   2 layout: default
   3 navsection: userguide
   4 title: Arvados CWL Extensions
   5 ...
   6 {% comment %}
   7 Copyright (C) The Arvados Authors. All rights reserved.
   8
   9 SPDX-License-Identifier: CC-BY-SA-3.0
  10 {% endcomment %}
  11
  12 Arvados provides several extensions to CWL for workflow optimization, site-specific configuration, and access to the Arvados API.
  13
  14 To use Arvados CWL extensions, add the following @$namespaces@ section at the top of your CWL file:
  15
  16 <pre>
  17 $namespaces:
  18   arv: "http://arvados.org/cwl#"
  19   cwltool: "http://commonwl.org/cwltool#"
  20 </pre>
  21
  22 For portability, Arvados extensions should go into the @hints@ section of your CWL file.  For example:
  23
  24 <pre>
  25 hints:
  26   arv:RunInSingleContainer: {}
  27   arv:RuntimeConstraints:
  28     keep_cache: 123456
  29     outputDirType: keep_output_dir
  30   arv:PartitionRequirement:
  31     partition: dev_partition
  32   arv:APIRequirement: {}
  33   cwltool:LoadListingRequirement:
  34     loadListing: shallow_listing
  35   arv:IntermediateOutput:
  36     outputTTL: 3600
  37   arv:ReuseRequirement:
  38     enableReuse: false
  39   cwltool:Secrets:
  40     secrets: [input1, input2]
  41   cwltool:TimeLimit:
  42     timelimit: 14400
  43   arv:WorkflowRunnerResources:
  44     ramMin: 2048
  45     coresMin: 2
  46     keep_cache: 512
  47   arv:ClusterTarget:
  48     cluster_id: clsr1
  49     project_uuid: clsr1-j7d0g-qxc4jcji7n4lafx
  50 </pre>
  51
  52 The one exception to this is @arv:APIRequirement@; see the note below.
  53
  54 h2. arv:RunInSingleContainer
  55
  56 Indicates that a subworkflow should run in a single container and not be scheduled as separate steps.
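
As an illustration, the hint might be attached to a workflow step that runs a subworkflow.  This is a minimal sketch: the file name @subworkflow.cwl@ and the step, input, and output names are hypothetical, and placing the hint on the step is one plausible arrangement rather than the only one.

<pre>
cwlVersion: v1.0
class: Workflow
$namespaces:
  arv: "http://arvados.org/cwl#"
requirements:
  SubworkflowFeatureRequirement: {}
inputs:
  sample: File
outputs:
  result:
    type: File
    outputSource: process_sample/result
steps:
  process_sample:
    in:
      sample: sample
    out: [result]
    # Hypothetical subworkflow file; its steps would share one container.
    run: subworkflow.cwl
    hints:
      arv:RunInSingleContainer: {}
</pre>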
  57
  58 h2. arv:RuntimeConstraints
  59
  60 Set Arvados-specific runtime hints.
  61
  62 table(table table-bordered table-condensed).
  63 |_. Field |_. Type |_. Description |
  64 |keep_cache|int|Size of file data buffer for Keep mount in MiB. Default is 256 MiB. Increase this to reduce cache thrashing in situations such as accessing multiple large (64+ MiB) files at the same time, or performing random access on a large file.|
  65 |outputDirType|enum|Preferred backing store for output staging.  If not specified, the system may choose which one to use.  One of *local_output_dir* or *keep_output_dir*.|
  66
  67 *local_output_dir*: Use a regular file system local to the compute node.  There must be sufficient local scratch space to store the entire output; specify this with @outdirMin@ of @ResourceRequirement@.  Files are batch uploaded to Keep when the process completes.  This is the most compatible option, but the upload step can be time consuming for very large files.
  68
  69 *keep_output_dir*: Use a writable Keep mount.  Files are streamed to Keep as they are written.  This does not consume local scratch space, but does consume RAM for output buffers (up to 192 MiB per file simultaneously open for writing).  Best suited to processes which produce sequential output of large files (non-sequential writes may produce fragmented file manifests).  Supports regular files and directories, but does not support special files such as symlinks, hard links, named pipes, named sockets, or device nodes.
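
The sketch below shows how @local_output_dir@ could be paired with @outdirMin@ on a tool.  The tool itself (@sort@ writing @sorted.txt@) and the sizes are illustrative assumptions; @outdirMin@ is expressed in mebibytes, so 20480 requests roughly 20 GiB of scratch space.

<pre>
cwlVersion: v1.0
class: CommandLineTool
$namespaces:
  arv: "http://arvados.org/cwl#"
hints:
  arv:RuntimeConstraints:
    keep_cache: 1024                 # MiB of Keep read cache (illustrative value)
    outputDirType: local_output_dir  # stage output on local scratch, upload at the end
  ResourceRequirement:
    outdirMin: 20480                 # MiB of scratch space for the output (illustrative value)
baseCommand: [sort]
inputs:
  unsorted:
    type: File
    inputBinding: {position: 1}
stdout: sorted.txt
outputs:
  sorted:
    type: stdout
</pre>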
  70
  71 h2. arv:PartitionRequirement
  72
  73 Select preferred compute partitions on which to run jobs.
  74
  75 table(table table-bordered table-condensed).
  76 |_. Field |_. Type |_. Description |
  77 |partition|string or array of strings|Preferred compute partition, or list of partitions, on which to run jobs.|
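
Because the field accepts either a string or an array of strings, more than one partition can be listed.  In this sketch the partition names are placeholders; actual names depend on how the cluster is configured.

<pre>
hints:
  arv:PartitionRequirement:
    # Placeholder partition names; use the names defined on your cluster.
    partition: [dev_partition, highmem_partition]
</pre>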
  78
  79 h2. arv:APIRequirement
  80
  81 Indicates that the process wants access to the Arvados API.  The process will be granted network access and have @ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ set in the environment.  Tools which rely on the Arvados API being present should put @arv:APIRequirement@ in the @requirements@ section of the tool (rather than @hints@) to indicate that the tool is not portable to non-Arvados CWL runners.
  82
  83 Use @arv:APIRequirement@ in @hints@ to enable general (non-Arvados-specific) network access for a tool.
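
A tool that depends on the Arvados API might declare the requirement as sketched below.  The command (@printenv@) and the output file name are assumptions chosen only to show that the environment variable is available to the process.

<pre>
cwlVersion: v1.0
class: CommandLineTool
$namespaces:
  arv: "http://arvados.org/cwl#"
requirements:
  # In "requirements" rather than "hints": this tool needs the Arvados API.
  arv:APIRequirement: {}
baseCommand: [printenv, ARVADOS_API_HOST]
inputs: []
stdout: api_host.txt
outputs:
  api_host:
    type: stdout
</pre>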
  84
  85 h2. cwltool:LoadListingRequirement
  86
  87 In CWL v1.0 documents, the default behavior for Directory objects is to recursively expand the @listing@ for access by parameter references and expressions.  For directory trees containing many files, this can be expensive in both time and memory usage.  Use @cwltool:LoadListingRequirement@ to change the behavior for expansion of directory listings in the workflow runner.
  88
  89 table(table table-bordered table-condensed).
  90 |_. Field |_. Type |_. Description |
  91 |loadListing|string|One of @no_listing@, @shallow_listing@, or @deep_listing@|
  92
  93 *no_listing*: Do not expand directory listing at all.  The @listing@ field on the Directory object will be undefined.
  94
  95 *shallow_listing*: Only expand the first level of directory listing.  The @listing@ field on the toplevel Directory object will contain the directory contents; however, @listing@ will not be defined on subdirectories.
  96
  97 *deep_listing*: Recursively expand all levels of directory listing.  The @listing@ field will be provided on the toplevel object and all subdirectories.
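
For a tool that only passes a directory path to a command and never inspects its contents in expressions, expansion can be turned off entirely.  The tool below (@du -s@ on a hypothetical @dataset@ input) is an illustrative sketch.

<pre>
cwlVersion: v1.0
class: CommandLineTool
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
hints:
  cwltool:LoadListingRequirement:
    loadListing: no_listing   # inputs.dataset.listing will be undefined
baseCommand: [du, -s]
inputs:
  dataset:
    type: Directory
    inputBinding: {position: 1}
outputs: []
</pre>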
  98
  99 h2. arv:IntermediateOutput
 100
 101 Specify desired handling of intermediate output collections.
 102
 103 table(table table-bordered table-condensed).
 104 |_. Field |_. Type |_. Description |
 105 |outputTTL|int|If the value is greater than zero, intermediate output collections are considered temporary and will be automatically trashed @outputTTL@ seconds after creation.  A value of zero means intermediate output should be retained indefinitely (this is the default behavior).
 106 Note: arvados-cwl-runner currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, it is possible for a collection to be trashed before downstream steps that consume it are started.  The recommended minimum value for TTL is the expected duration of the entire workflow.|
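
As a worked example of choosing a value, suppose a workflow is expected to finish within 24 hours (an assumed figure).  The TTL is given in seconds, so 24 hours corresponds to 24 x 3600 = 86400.

<pre>
hints:
  arv:IntermediateOutput:
    # 24 hours x 3600 seconds per hour; assumes the whole workflow finishes within a day.
    outputTTL: 86400
</pre>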
 107
 108 h2. arv:ReuseRequirement
 109
 110 Enable/disable work reuse for current process.  Default true (work reuse enabled).
 111
 112 table(table table-bordered table-condensed).
 113 |_. Field |_. Type |_. Description |
 114 |enableReuse|boolean|Enable/disable work reuse for current process.  Default true (work reuse enabled).|
 115
 116 h2. cwltool:Secrets
 117
 118 Indicate that one or more input parameters are "secret".  Must be applied at the top-level Workflow.  Secret parameters are not stored in Keep, are hidden from logs and API responses, and are wiped from the database after the workflow completes.
 119
 120 table(table table-bordered table-condensed).
 121 |_. Field |_. Type |_. Description |
 122 |secrets|array<string>|Input parameters which are considered "secret".  Must be strings.|
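
A top-level workflow might mark one of its string inputs as secret as sketched below.  The input names, the step, and the tool file @run-query.cwl@ are hypothetical.

<pre>
cwlVersion: v1.0
class: Workflow
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
hints:
  cwltool:Secrets:
    secrets: [db_password]   # must name string inputs of this workflow
inputs:
  db_password: string
  query: File
outputs: []
steps:
  run_query:
    in:
      password: db_password
      query: query
    out: []
    run: run-query.cwl       # hypothetical tool that consumes the secret
</pre>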
 123
 124
 125 h2. cwltool:TimeLimit
 126
 127 Set an upper limit on the execution time of a CommandLineTool or ExpressionTool.  A tool execution which exceeds the time limit may be preemptively terminated and considered failed.  May also be used by batch systems to make scheduling decisions.
 128
 129 table(table table-bordered table-condensed).
 130 |_. Field |_. Type |_. Description |
 131 |timelimit|int|Execution time limit in seconds. If set to zero, no limit is enforced.|
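
For instance, to cap a tool at two hours (an arbitrary figure for illustration), convert the limit to seconds: 2 x 3600 = 7200.

<pre>
hints:
  cwltool:TimeLimit:
    timelimit: 7200   # 2 hours x 3600 seconds; the tool may be terminated after this
</pre>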
 132
 133 h2. arv:WorkflowRunnerResources
 134
 135 Specify resource requirements for the workflow runner process (arvados-cwl-runner) that manages a workflow run.  Must be applied to the top level workflow.  Will also be set implicitly when using @--submit-runner-ram@ on the command line along with @--create-workflow@ or @--update-workflow@.  Use this to adjust the runner's allocation if the workflow runner is getting "out of memory" exceptions or being killed by the out-of-memory (OOM) killer.
 136
 137 table(table table-bordered table-condensed).
 138 |_. Field |_. Type |_. Description |
 139 |ramMin|int|RAM, in mebibytes, to reserve for the arvados-cwl-runner process. Default 1 GiB.|
 140 |coresMin|int|Number of cores to reserve for the arvados-cwl-runner process. Default 1 core.|
 141 |keep_cache|int|Size of collection metadata cache for the workflow runner, in MiB.  Default 256 MiB.  Will be added on to the RAM request when determining node size to request.|
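
Since @keep_cache@ is added to the RAM request, the two values combine when sizing the runner's node.  In the sketch below (values are illustrative), the runner would need a node with at least 4096 + 1024 = 5120 MiB available.

<pre>
hints:
  arv:WorkflowRunnerResources:
    ramMin: 4096      # MiB reserved for the arvados-cwl-runner process
    coresMin: 2
    keep_cache: 1024  # MiB, added on top of ramMin for the node size request
</pre>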
 142
 143 h2(#clustertarget). arv:ClusterTarget
 144
 145 Specify which Arvados cluster should execute a container or subworkflow, and the parent project for the container request.
 146
 147 table(table table-bordered table-condensed).
 148 |_. Field |_. Type |_. Description |
 149 |cluster_id|string|The five-character alphanumeric cluster id (uuid prefix) where a container or subworkflow will execute.  May be an expression.|
 150 |project_uuid|string|The uuid of the project which will own the container request and the output of the container.  May be an expression.|
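
Because both fields may be expressions, the target cluster could be chosen from a step input.  The fragment below is a sketch under assumptions: the step name, the @run_on_cluster@ input, and the tool file @md5sum.cwl@ are hypothetical, and it assumes the expression is evaluated against the step's inputs.

<pre>
steps:
  checksum:
    in:
      inp: shard
      run_on_cluster: target_cluster   # hypothetical workflow input holding a cluster id
    out: [out]
    run: md5sum.cwl                    # hypothetical tool file
    hints:
      arv:ClusterTarget:
        cluster_id: $(inputs.run_on_cluster)
        project_uuid: clsr1-j7d0g-qxc4jcji7n4lafx
</pre>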
 151
 152 h2. arv:dockerCollectionPDH
 153
 154 This is an optional extension field appearing on the standard @DockerRequirement@.  It specifies the portable data hash of the Arvados collection containing the Docker image.  If present, it takes precedence over @dockerPull@ or @dockerImageId@.
 155
 156 <pre>
 157 requirements:
 158   DockerRequirement:
 159     dockerPull: "debian:9"
 160     arv:dockerCollectionPDH: "feaf1fc916103d7cdab6489e1f8c3a2b+174"
 161 </pre>