-$ git commit -m"my first script"
+~/you/crunch_scripts$ git commit -m"my first script"
[master (root-commit) 27fd88b] my first script
1 file changed, 33 insertions(+)
create mode 100755 crunch_scripts/hash.py
@@ -82,7 +84,7 @@ Next, commit your changes to git. All staged changes are recorded into the loca
Finally, upload your changes to the Arvados server:
-$ git push origin master
+~/you/crunch_scripts$ git push origin master
Counting objects: 4, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 682 bytes, done.
@@ -91,34 +93,51 @@ To git@git.qr1hi.arvadosapi.com:you.git
* [new branch] master -> master
-You should now be able to run your script using Crunch, similar to how we did it in the "first tutorial.":tutorial-job1.html The field @"script_version"@ should be @you:master@ to tell Crunch to run the script at the head of the "master" git branch, which you just uploaded.
+h2. Create a pipeline template
+
+Next, create a file that contains the pipeline definition:
-$ cat >the_job <<EOF
-{
- "script": "hash.py",
- "script_version": "you:master",
- "script_parameters":
- {
- "input": "c1bad4b39ca5a924e481008009d94e32+210"
- }
-}
-EOF
-$ arv -h job create --job "$(cat the_job)"
-{
- ...
- "uuid":"qr1hi-xxxxx-xxxxxxxxxxxxxxx"
- ...
-}
-$ arv -h job get --uuid qr1hi-xxxxx-xxxxxxxxxxxxxxx
+~/you/crunch_scripts$ cd ~
+~$ cat >the_pipeline <<EOF
{
- ...
- "output":"880b55fb4470b148a447ff38cacdd952+54",
- ...
+ "name":"My first pipeline",
+ "components":{
+ "do_hash":{
+ "script":"hash.py",
+ "script_parameters":{
+ "input":{
+ "required": true,
+ "dataclass": "Collection"
+ }
+ },
+ "repository":"you",
+ "script_version":"master",
+ "output_is_persistent":true
+ }
+ }
}
-$ arv keep get 880b55fb4470b148a447ff38cacdd952+54/md5sum.txt
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
+EOF
+
+
+
+* @cat@ is a standard Unix utility that simply copies standard input to standard output
+* @<the_pipeline@ redirects standard output to a file called @the_pipeline@
+* @"name"@ is a human-readable name for the pipeline
+* @"components"@ is a set of scripts that make up the pipeline
+* The component is listed with a human-readable name (@"do_hash"@ in this example)
+* @"script"@ specifies the name of the script to run. The script is searched for in the "crunch_scripts/" subdirectory of the @git@ checkout specified by @"script_version"@.
+* @"repository"@ is the git repository to search for the script version. You can access a list of available @git@ repositories on the Arvados workbench under "Compute %(rarr)→% Code repositories":https://{{site.arvados_workbench_host}}//repositories .
+* @"script_version"@ specifies the version of the script that you wish to run. This can be in the form of an explicit @git@ revision hash, a tag, or a branch (in which case it will take the HEAD of the specified branch). Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.
+* @"script_parameters"@ describes the parameters for the script. In this example, there is one parameter called @input@ which is @required@ and is a @Collection@.
+* @"output_is_persistent"@ indicates whether the output of the job is considered valuable. If this value is false (or not given), the output will be treated as intermediate data and eventually deleted to reclaim disk space.
+
+Now, use @arv pipeline_template create@ tell Arvados about your pipeline template:
+
+
+~$ arv pipeline_template create --pipeline-template "$(cat the_pipeline)"
-Next, "debugging a crunch script.":tutorial-job-debug.html
+Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using workbench":tutorial-pipeline-workbench.html