doc/user/topics/run-command.html.textile.liquid

---
layout: default
navsection: userguide
title: "run-command reference"
...

The @run-command@ crunch script enables you to run command line programs.

h1. Using run-command

The basic run-command process evaluates its inputs and builds a command line, executes the command, and saves the contents of the output directory back to Keep.  For large datasets, run-command can schedule concurrent tasks to execute the wrapped program over a range of inputs (see @task.foreach@ below).

Run-command is controlled through the @script_parameters@ section of a pipeline component.  @script_parameters@ is a JSON object consisting of key-value pairs.  There are three categories of keys that are meaningful to run-command:
* The @command@ section, which defines the template used to build the command line of each task
* Special processing directives such as @task.foreach@, @task.cwd@, @task.vwd@, @task.stdin@ and @task.stdout@
* User-defined parameters (everything else)

h2. Command template

The value of the "command" key is a list.  The first item of the list is the actual program to invoke, followed by the command arguments.  The simplest run-command invocation runs a program with static parameters.  In this example, run "echo" with the single argument "hello world":

<pre>
  "script_parameters": {
    "command": ["echo", "hello world"]
  }
</pre>

Running this job will print "hello world" to the job log.

By default, the command will start with the current working directory set to the output directory.  Anything written to the output directory will be saved to Keep when the command is finished.  You can change the default working directory using the @task.cwd@ directive and get the path to the output directory using @$(task.outdir)@, as explained below.

Items in the "command" list may include lists and objects in addition to strings.  Lists are flattened to produce the final command line.  JSON objects are evaluated as list item functions (see below).  For example, the following evaluates to @["echo", "hello", "world"]@:

<pre>
  "script_parameters": {
    "command": ["echo", ["hello", "world"]]
  }
</pre>
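
The flattening step can be illustrated with a short Python sketch.  This is not run-command's actual code: the helper name @flatten@ is hypothetical, and the sketch only handles nested lists of strings (list item functions are ignored).

```python
def flatten(items):
    """Recursively flatten nested lists into one flat list (illustrative helper)."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


command = flatten(["echo", ["hello", "world"]])
print(command)  # ['echo', 'hello', 'world']
```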

h2. Parameter substitution

The "command" list can include parameter substitutions.  Substitutions are enclosed in "$(...)" and may contain the name of a user-defined parameter.  In the following example, the value of "a" is "hello world"; so when "command" is evaluated, it will substitute "hello world" for "$(a)":

<pre>
"script_parameters": {
  "command": ["echo", "$(a)"],
  "a": "hello world"
}
</pre>
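
As a rough illustration of the substitution described above, the following Python sketch replaces each @$(name)@ with the value of the matching user parameter.  The names @PARAM@ and @substitute@ are hypothetical, and the pattern assumes parameter names contain no whitespace or parentheses; the real script may differ.

```python
import re

# Matches "$(name)" where name contains no whitespace or parentheses.
PARAM = re.compile(r"\$\(([^()\s]+)\)")


def substitute(text, params):
    """Replace every $(name) in text with the string value of params[name]."""
    return PARAM.sub(lambda match: str(params[match.group(1)]), text)


script_parameters = {"command": ["echo", "$(a)"], "a": "hello world"}
command = [substitute(item, script_parameters) for item in script_parameters["command"]]
print(command)  # ['echo', 'hello world']
```

In this sketch an unknown parameter name raises @KeyError@ rather than being passed through silently.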

h2. Special parameters

In addition to user-defined parameters, there are special parameters supplied by run-command that provide some information about the runtime environment.

table(table table-bordered table-condensed).
|_. Parameter   |_. Value |
|$(node.cores)     |Number of cores on the current node|
|$(task.tmpdir)    |Path to the temporary directory for this task|
|$(task.outdir)    |Path to the task's designated output directory|
|$(task.uuid)      |The current task's unique identifier|
|$(job.srcdir)     |The directory containing the source code for the run-command script|
|$(job.uuid)       |The current job's unique identifier|

h2. Substitution functions

Substitutions can also make use of functions.  Functions take a single parameter, and substitution is performed recursively from the inside out.  In the following example, the parameter $(a) is evaluated first, then the $(file ...) function is applied to get a local filesystem path, producing a command like @["echo", "/path/to/keep/mount/c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2"]@:

<pre>
"script_parameters": {
  "command": ["echo", "$(file $(a))"],
  "a": "c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2"
}
</pre>
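
The inside-out evaluation order can be sketched in Python as follows.  This is an illustration under stated assumptions, not the script's implementation: @evaluate@ and @INNERMOST@ are hypothetical names, the @file@ entry is a stand-in that simply prefixes the placeholder mount path used in the example above, and values containing parentheses are not handled.

```python
import re

# Matches a "$(...)" group that contains no nested parentheses, i.e. an innermost one.
INNERMOST = re.compile(r"\$\(([^()]*)\)")


def evaluate(text, params, functions):
    """Expand $(...) groups from the inside out until none remain."""
    def expand(match):
        name, _, argument = match.group(1).partition(" ")
        if argument:
            return functions[name](argument)  # "$(function argument)"
        return str(params[name])              # "$(parameter)"

    while INNERMOST.search(text):
        text = INNERMOST.sub(expand, text)
    return text


# Hypothetical stand-in for $(file ...): prefix the reference with a mount point.
functions = {"file": lambda ref: "/path/to/keep/mount/" + ref}
params = {"a": "c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2"}
result = evaluate("$(file $(a))", params, functions)
print(result)
```

The first pass replaces @$(a)@, and the second pass applies the function to the already expanded argument.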

table(table table-bordered table-condensed).
|_. Function|_. Action|
|$(file ...)       | Takes a reference to a file within an Arvados collection and evaluates to a file path on the local file system where that file can be accessed by your command.  Will raise an error if the file is not accessible.|
|$(dir ...)        | Takes a reference to an Arvados collection or directory within an Arvados collection and evaluates to a directory path on the local file system where that directory can be accessed by your command.  The path may include a file name, in which case it will evaluate to the parent directory of the file.  Uses Python's os.path.dirname(), so "/foo/bar" will evaluate to "/foo" but "/foo/bar/" will evaluate to "/foo/bar".  Will raise an error if the directory is not accessible. |
|$(basename&nbsp;...)   | Strips the leading directory and trailing file extension from the path provided.  For example, $(basename /foo/bar.baz.txt) will evaluate to "bar.baz".|
|$(glob ...)       | Takes a Unix shell path pattern (supports @*@ @?@ and @[]@) and searches the local filesystem, returning the first match found.  Use together with $(dir ...) to get a local filesystem path for Arvados collections.  For example: $(glob $(dir $(mycollection))/*.bam) will find the first .bam file in the collection specified by the user parameter "mycollection".  If there is more than one match, which one is returned is undefined.  Will raise an error if no matches are found.|
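
The path handling in the table can be tried out with Python's standard library.  The sketch below only covers the local path manipulation; it does not resolve Arvados collection references, and the helper names @basename_no_ext@ and @first_match@ are hypothetical.

```python
import glob
import os


def basename_no_ext(path):
    """Strip the leading directories and the final file extension."""
    return os.path.splitext(os.path.basename(path))[0]


def first_match(pattern):
    """Return one path matching a shell pattern, or raise if there is none."""
    matches = glob.glob(pattern)
    if not matches:
        raise ValueError("no match for pattern: " + pattern)
    return matches[0]


print(basename_no_ext("/foo/bar.baz.txt"))  # bar.baz
print(os.path.dirname("/foo/bar"))          # /foo
print(os.path.dirname("/foo/bar/"))         # /foo/bar
```

The two @os.path.dirname()@ calls show why a trailing slash changes what $(dir ...) evaluates to.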

h2. List context

When a parameter is evaluated in a list context, its value should evaluate to a list instead of a string.  Parameter values can be a static list (as demonstrated above), a path to a file, a path to a directory, or a JSON object describing a list context function.

If the value is a static list, the list items are evaluated for parameter substitution and list functions.

If the value is a string, it is interpreted as a path.  If the path specifies a regular file, that file will be opened as a text file and produce a list with one item for each line in the file (end-of-line characters will be stripped).  If the path specifies a directory, it produces a list containing all of the entries in the directory.  Note that parameter expansion is not performed on lists produced this way.

If the value is a JSON object, it is evaluated as a list function, as described below.
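
The rules above can be summarized in a small Python sketch.  It is a simplification under stated assumptions: @list_context@ is a hypothetical name, static lists are returned without evaluating their items, and JSON objects (list functions) are rejected rather than evaluated.

```python
import os


def list_context(value):
    """Turn a parameter value into a list, following the rules described above."""
    if isinstance(value, list):
        return value  # static list (item substitution is omitted in this sketch)
    if isinstance(value, str):
        if os.path.isdir(value):
            return os.listdir(value)  # one item per directory entry
        with open(value) as handle:
            # one item per line, end-of-line characters stripped
            return [line.rstrip("\r\n") for line in handle]
    raise TypeError("list functions (JSON objects) are not handled in this sketch")


print(list_context(["alice", "bob"]))  # ['alice', 'bob']
```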

h2. List functions

When run-command is evaluating a list (such as "command"), in addition to string parameter substitution, you can use list item functions.

h3. foreach

The @foreach@ list item function (not to be confused with the @task.foreach@ directive) expands a command template for each item in the specified user parameter (the value of the user parameter is evaluated in a list context, as described above).  The following example will evaluate "command" to @["echo", "--something", "alice", "--something", "bob"]@:

<pre>
"script_parameters": {
  "command": ["echo", {"foreach": "a", "command": ["--something", "$(a)"]}],
  "a": ["alice", "bob"]
}
</pre>
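
A minimal Python sketch of this expansion, assuming the template is a flat list of strings and the parameter is already a list; @expand_foreach@ is a hypothetical helper, not part of run-command:

```python
def expand_foreach(spec, params):
    """Repeat spec["command"] once per item of the list named by spec["foreach"]."""
    name = spec["foreach"]
    placeholder = "$(" + name + ")"
    expanded = []
    for item in params[name]:
        for part in spec["command"]:
            expanded.append(part.replace(placeholder, item))
    return expanded


params = {"a": ["alice", "bob"]}
spec = {"foreach": "a", "command": ["--something", "$(a)"]}
command = ["echo"] + expand_foreach(spec, params)
print(command)  # ['echo', '--something', 'alice', '--something', 'bob']
```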

h3. index

The "index" list item function extracts a single item from a list.  The "index" is zero-based (i.e. the first item is at index 0, the second item at index 1, etc).  The following example will evaluate "command" to @["echo", "--something", "bob"]@:

<pre>
"script_parameters": {
  "command": ["echo", {"list": "a", "index": 1, "command": ["--something", "$(a)"]}],
  "a": ["alice", "bob"]
}
</pre>
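
The same idea in Python, as a hedged sketch: @expand_index@ is a hypothetical helper that picks one item by its zero-based position and substitutes it into a flat template.

```python
def expand_index(spec, params):
    """Substitute the item at spec["index"] of the list named by spec["list"]."""
    name = spec["list"]
    item = params[name][spec["index"]]  # zero-based lookup
    placeholder = "$(" + name + ")"
    return [part.replace(placeholder, item) for part in spec["command"]]


params = {"a": ["alice", "bob"]}
spec = {"list": "a", "index": 1, "command": ["--something", "$(a)"]}
command = ["echo"] + expand_index(spec, params)
print(command)  # ['echo', '--something', 'bob']
```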

h3. filter

The @filter@ list item function filters the list so that it only includes items that match a regular expression.  The following example will evaluate to @["echo", "bob"]@:

<pre>
"script_parameters": {
  "command": ["echo", {"filter": "a", "regex": "b.*"}],
  "a": ["alice", "bob"]
}
</pre>
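
In Python terms, filtering could look like the sketch below.  It assumes the pattern is matched from the start of each item (@re.match@); whether run-command anchors the match this way is an assumption, and @apply_filter@ is a hypothetical name.

```python
import re


def apply_filter(spec, params):
    """Keep only the items of the named list that match spec["regex"]."""
    pattern = re.compile(spec["regex"])
    return [item for item in params[spec["filter"]] if pattern.match(item)]


params = {"a": ["alice", "bob"]}
command = ["echo"] + apply_filter({"filter": "a", "regex": "b.*"}, params)
print(command)  # ['echo', 'bob']
```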

h3. group

Generate a list of lists, where items are grouped on common subexpression match.  Items which don't match the regular expression are excluded.  In the following example, the subexpression @(a?)@ yields two groups: items that contain the letter "a" and items that do not.  The example evaluates to @["echo", "--group", "alice", "carol", "dave", "--group", "bob"]@:

<pre>
"script_parameters": {
  "command": ["echo", {"foreach": {"group": "a", "regex": "[^a]*(a?).*"}, "command": ["--group", {"foreach": "a", "command": "$(a)"}]}],
  "a": ["alice", "bob", "carol", "dave"]
}
</pre>
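
Grouping on a captured subexpression can be sketched in Python as follows.  The helper @apply_group@ is hypothetical, groups are returned in the order their key is first seen, and the sketch uses the pattern @[^a]*(a?).*@, which captures "a" for items containing that letter and an empty string otherwise.

```python
import re


def apply_group(spec, params):
    """Group items of the named list by the first captured subexpression."""
    pattern = re.compile(spec["regex"])
    groups = {}  # capture value -> items, in first-seen order
    for item in params[spec["group"]]:
        match = pattern.match(item)
        if match:  # non-matching items are excluded
            groups.setdefault(match.group(1), []).append(item)
    return list(groups.values())


params = {"a": ["alice", "bob", "carol", "dave"]}
grouped = apply_group({"group": "a", "regex": "[^a]*(a?).*"}, params)
print(grouped)  # [['alice', 'carol', 'dave'], ['bob']]
```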

h3. extract

Generate a list of lists, where items are split by subexpression match.  Items which don't match the regular expression are excluded.  The following example evaluates to @["echo", "c", "a", "rol", "d", "a", "ve"]@:

<pre>
"script_parameters": {
  "command": ["echo", {"foreach": {"extract": "a", "regex": "(.+)(a)(.*)"}, "command": [{"foreach": "a", "command": "$(a)"}]}],
  "a": ["alice", "bob", "carol", "dave"]
}
</pre>
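
A Python sketch of the splitting step, assuming each matching item is replaced by the list of its captured subexpressions; @apply_extract@ is a hypothetical name.  Note that "alice" is excluded because @(.+)@ requires at least one character before the "a".

```python
import re


def apply_extract(spec, params):
    """Split each matching item of the named list into its captured groups."""
    pattern = re.compile(spec["regex"])
    extracted = []
    for item in params[spec["extract"]]:
        match = pattern.match(item)
        if match:  # non-matching items are excluded
            extracted.append(list(match.groups()))
    return extracted


params = {"a": ["alice", "bob", "carol", "dave"]}
pieces = apply_extract({"extract": "a", "regex": "(.+)(a)(.*)"}, params)
print(pieces)  # [['c', 'a', 'rol'], ['d', 'a', 've']]
```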

h2. Directives

Directives alter the behavior of run-command.  All directives are optional.

h3. task.cwd

This directive sets the initial current working directory that your command will run in.  If @task.cwd@ is not specified, the default current working directory is @task.outdir@.

h3. task.stdin and task.stdout

Provide standard input and standard output redirection.

@task.stdin@ must evaluate to a path to a file to be bound to the command's standard input stream.

@task.stdout@ specifies the desired file name in the output directory to save the content of standard output.
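
The effect of these two directives is similar to ordinary process redirection.  The following Python sketch shows the general idea with @subprocess@; it is an illustration only, and the function name @run_redirected@ and its arguments are hypothetical.

```python
import os
import subprocess


def run_redirected(command, outdir, stdin_path=None, stdout_name=None):
    """Run command in outdir, optionally reading stdin from a file and
    saving stdout to a file inside outdir.  Returns the exit code."""
    stdin = open(stdin_path, "rb") if stdin_path else None
    stdout = open(os.path.join(outdir, stdout_name), "wb") if stdout_name else None
    try:
        return subprocess.call(command, cwd=outdir, stdin=stdin, stdout=stdout)
    finally:
        for handle in (stdin, stdout):
            if handle is not None:
                handle.close()
```

Here @stdin_path@ plays the role of @task.stdin@ and @stdout_name@ the role of @task.stdout@.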

h3. task.vwd

Background: because Keep collections are read-only, they do not play well with tools that expect to be able to write their outputs alongside their inputs (such as tools that generate indexes that are closely associated with the original file).  Run-command's solution to this is the "virtual working directory".

@task.vwd@ specifies a Keep collection with the starting contents of the directory.  Run-command will then populate @task.outdir@ with directories and symlinks to mirror the contents of the @task.vwd@ collection.  Your command will then be able to both access its input files and write its output files in @task.outdir@.  When the command completes, the output collection will merge the output of your command with the contents of the starting collection.  Note that files in the starting collection remain read-only and cannot be altered or deleted.
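
The mirroring step can be pictured with the following Python sketch, which recreates a directory tree and symlinks each file into it.  This is only an approximation of the idea on a local filesystem: @mirror_with_symlinks@ is a hypothetical helper, and how run-command reads the collection and merges the final output is not shown.

```python
import os


def mirror_with_symlinks(source, outdir):
    """Recreate source's directory tree under outdir, symlinking each file."""
    source = os.path.abspath(source)  # absolute targets keep the links valid
    for root, _dirs, files in os.walk(source):
        relative = os.path.relpath(root, source)
        target = os.path.normpath(os.path.join(outdir, relative))
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.symlink(os.path.join(root, name), os.path.join(target, name))
```

After mirroring, a command running in @outdir@ can read the linked inputs and create new files next to them, while the originals stay untouched.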

h3. task.foreach

Using @task.foreach@, you can run your command concurrently over large datasets.

@task.foreach@ takes the names of one or more user-defined parameters.  The values of these parameters are evaluated in a list context.  Run-command then generates tasks based on the Cartesian product (i.e. all combinations) of the input lists.  The outputs of all tasks are merged to create the final output collection.  Note that if two tasks output a file in the same directory with the same name, those files will be concatenated in the final output.  In the following example, three tasks will be created for the "echo" command, based on the contents of user parameter "a":

<pre>
"script_parameters": {
  "command": ["echo", "$(a)"],
  "task.foreach": "a",
  "a": ["alice", "bob", "carol"]
}
</pre>

This evaluates to the commands:
<notextile>
<pre>
["echo", "alice"]
["echo", "bob"]
["echo", "carol"]
</pre>
</notextile>

You can also specify multiple parameters:

<pre>
"script_parameters": {
  "command": ["echo", "$(a)", "$(b)"],
  "task.foreach": ["a", "b"],
  "a": ["alice", "bob"],
  "b": ["carol", "dave"]
}
</pre>

This evaluates to the commands:

<pre>
["echo", "alice", "carol"]
["echo", "alice", "dave"]
["echo", "bob", "carol"]
["echo", "bob", "dave"]
</pre>
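
The Cartesian product of the input lists corresponds to @itertools.product@ in Python.  The sketch below generates one command per combination; @task_commands@ is a hypothetical helper that assumes static lists and a flat command template, and it says nothing about how tasks are actually scheduled.

```python
import itertools


def task_commands(script_parameters):
    """Build one command per combination of the task.foreach parameter lists."""
    names = script_parameters["task.foreach"]
    if isinstance(names, str):
        names = [names]  # a single parameter name is also accepted
    lists = [script_parameters[name] for name in names]
    commands = []
    for combination in itertools.product(*lists):
        command = []
        for part in script_parameters["command"]:
            for name, value in zip(names, combination):
                part = part.replace("$(" + name + ")", value)
            command.append(part)
        commands.append(command)
    return commands


commands = task_commands({
    "command": ["echo", "$(a)", "$(b)"],
    "task.foreach": ["a", "b"],
    "a": ["alice", "bob"],
    "b": ["carol", "dave"],
})
for command in commands:
    print(command)
```

With two lists of two items each, the sketch yields four commands, matching the list above.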

h1. Examples

<notextile>{% code 'run_command_simple_example' as javascript %}</notextile>

<notextile>{% code 'run_command_foreach_example' as javascript %}</notextile>