X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/987c225f9f6845868ee674902090c27a5a064f42..78889e115e6fffd5eb82e54a541bd4858f804f91:/doc/user/tutorials/running-external-program.html.textile.liquid

diff --git a/doc/user/tutorials/running-external-program.html.textile.liquid b/doc/user/tutorials/running-external-program.html.textile.liquid
index 18f5f7d35f..a7682594c9 100644
--- a/doc/user/tutorials/running-external-program.html.textile.liquid
+++ b/doc/user/tutorials/running-external-program.html.textile.liquid
@@ -4,6 +4,8 @@ navsection: userguide
 title: "Writing a pipeline template"
 ...
 
+{% include 'pipeline_deprecation_notice' %}
+
 This tutorial demonstrates how to construct a two stage pipeline template that uses the "bwa mem":http://bio-bwa.sourceforge.net/ tool to produce a "Sequence Alignment/Map (SAM)":https://samtools.github.io/ file, then uses the "Picard SortSam tool":http://picard.sourceforge.net/command-line-overview.shtml#SortSam to produce a BAM (Binary Alignment/Map) file.
 
 {% include 'tutorial_expectations' %}
@@ -20,13 +22,14 @@ This will open the template record in an interactive text editor (as specified b
 
 * @"name"@ is a human-readable name for the pipeline.
 * @"components"@ is a set of scripts or commands that make up the pipeline.  Each component is given an identifier (@"bwa-mem"@ and @"SortSam"@) in this example).
-** Each entry in components @"components"@ is an Arvados job submission.  For more information about individual jobs, see the "job object reference":{{site.baseurl}}/api/schema/Job.html and "job create method.":{{site.baseurl}}/api/methods/jobs.html#create
-* @"repository"@, @"script_version"@, and @"script"@ indicate that we intend to use the external @"run-command"@ tool wrapper that is part of the Arvados.  These parameters are described in more detail in "Writing a script":tutorial-firstscript.html
+** Each entry in components @"components"@ is an Arvados job submission.  For more information about individual jobs, see the "job resource reference.":{{site.baseurl}}/api/methods/jobs.html
+* @"repository"@, @"script_version"@, and @"script"@ indicate that we intend to use the external @"run-command"@ tool wrapper that is part of the Arvados.  These parameters are described in more detail in "Writing a script":tutorial-firstscript.html.
 * @"runtime_constraints"@ describes runtime resource requirements for the component.
-** @"docker_image"@ specifies the "Docker":https://www.docker.com/ runtime environment in which to run the job.  The Docker image @"arvados/jobs-java-bwa-samtools"@ supplied here has the Arvados SDK, Java runtime environment, bwa, and samtools installed.
+** @"docker_image"@ specifies the "Docker":https://www.docker.com/ runtime environment in which to run the job.  The Docker image @"bcosc/arv-base-java"@ supplied here has the Java runtime environment, bwa, and samtools installed.
+** @"arvados_sdk_version"@ specifies a version of the Arvados SDK to load alongside the job's script. The example uses 'master'. If you would like to use a specific version of the sdk, you can find it in the "Arvados Python sdk repository":https://dev.arvados.org/projects/arvados/repository/revisions/master/show/sdk/python under *Latest revisions*.
 * @"script_parameters"@ describes the component parameters.
 ** @"command"@ is the actual command line to invoke the @bwa@ and then @SortSam@.  The notation @$()@ denotes macro substitution commands evaluated by the run-command tool wrapper.
-** @"stdout"@ indicates that the output of this command should be captured to a file.
+** @"task.stdout"@ indicates that the output of this command should be captured to a file.
 ** @$(node.cores)@ evaluates to the number of cores available on the compute node at time the command is run.
 ** @$(tmpdir)@ evaluates to the local path for temporary directory the command should use for scratch data.
 ** @$(reference_collection)@ evaluates to the script_parameter @"reference_collection"@
@@ -34,19 +37,44 @@ This will open the template record in an interactive text editor (as specified b
 ** @$(file $(...))@ constructs a local path to a given file within the supplied Arvados collection.
 ** @$(glob $(...))@ searches the specified path based on a file glob pattern and evalutes to the first result.
 ** @$(basename $(...))@ evaluates to the supplied path with leading path portion and trailing filename extensions stripped
-** @"output_of"@ indicates that the @output@ of the @bwa-mem@ component should be used as the @"input"@ of @SortSam@.  Arvados uses these dependencies between components to automatically determine the correct order to run them.
+* @"output_of"@ indicates that the @output@ of the @bwa-mem@ component should be used as the @"input"@ script parameter of @SortSam@.  Arvados uses these dependencies between components to automatically determine the correct order to run them.
 
 When using @run-command@, the tool should write its output to the current working directory.  The output will be automatically uploaded to Keep when the job completes.
 
 See the "run-command reference":{{site.baseurl}}/user/topics/run-command.html for more information about using @run-command@.
 
+*Note:* When trying to get job reproducibility without re-computation, you need to set these parameters to their specific hashes. Using a version such as master in @"arvados_sdk_version"@ will grab the latest version hash, which will allow Arvados to re-compute your job if the sdk gets updated.
+* @"arvados_sdk_version"@ : The latest version can be found on the "Arvados Python sdk repository":https://dev.arvados.org/projects/arvados/repository/revisions/master/show/sdk/python under *Latest revisions*.
+* @"script_version"@ : The current version of your script in your git repository can be found by using the following command:
+
+<notextile>
+<pre><code>~$ <span class="userinput">git rev-parse HEAD</span></code></pre>
+</notextile>
+
+* @"docker_image"@ : The docker image hash used is found on the "Collection page":https://cloud.curoverse.com/collections/qr1hi-4zz18-dov6im679g3jr1n as the *Content address*.
+
 h2. Running your pipeline
 
-Your new pipeline template should appear at the top of the Workbench "pipeline&nbsp;templates":https://{{ site.arvados_workbench_host }}/pipeline_templates page.  You can run your pipeline "using Workbench":tutorial-pipeline-workbench.html or the "command line.":{{site.baseurl}}/user/topics/running-pipeline-command-line.html
+Your new pipeline template should appear at the top of the Workbench "pipeline&nbsp;templates":{{site.arvados_workbench_host}}/pipeline_templates page.  You can run your pipeline "using Workbench":tutorial-pipeline-workbench.html or the "command line.":{{site.baseurl}}/user/topics/running-pipeline-command-line.html
+
+Test data is available in the "Arvados Tutorial":{{site.arvados_workbench_host}}/projects/qr1hi-j7d0g-u7zg1qdaowykd8d project:
 
-Test data is available in the "Arvados Tutorial":https://{{ site.arvados_workbench_host }}/projects/qr1hi-j7d0g-u7zg1qdaowykd8d project:
+* Choose <i class="fa fa-fw fa-archive"></i> "Tutorial chromosome 19 reference (2463fa9efeb75e099685528b3b9071e0+438)":{{site.arvados_workbench_host}}/collections/2463fa9efeb75e099685528b3b9071e0+438 for the "reference_collection" parameter
+* Choose <i class="fa fa-fw fa-archive"></i> "Tutorial sample exome (3229739b505d2b878b62aed09895a55a+142)":{{site.arvados_workbench_host}}/collections/3229739b505d2b878b62aed09895a55a+142 for the "sample" parameter
 
-* Choose <i class="fa fa-fw fa-archive"></i> "Tutorial chromosome 19 reference (2463fa9efeb75e099685528b3b9071e0+438)":https://{{ site.arvados_workbench_host }}/collections/2463fa9efeb75e099685528b3b9071e0+438 for the "reference_collection" parameter
-* Choose <i class="fa fa-fw fa-archive"></i> "Tutorial sample exome (3229739b505d2b878b62aed09895a55a+142)":https://{{ site.arvados_workbench_host }}/collections/3229739b505d2b878b62aed09895a55a+142 for the "sample" parameter
+For more information and examples for writing pipelines, see the "pipeline template reference":{{site.baseurl}}/api/methods/pipeline_templates.html
+
+h2. Re-using your pipeline run
+
+Arvados allows users to re-use jobs that have the same inputs in order to save computing time and resources. Users are able to change a job downstream without re-computing earlier jobs. This section shows which version control parameters should be tuned to make sure Arvados will not re-compute your jobs.
+
+Note: Job reuse can only happen if all input collections do not change.
+
+* @"arvados_sdk_version"@ : The arvados_sdk_version parameter is used to download the specific version of the Arvados sdk into the docker image. The latest version can be found in the "Arvados Python sdk repository":https://dev.arvados.org/projects/arvados/repository/revisions/master/show/sdk/python under *Latest revisions*. Make sure you set this to the same version as the previous run that you are trying to reuse.
+* @"script_version"@ : The script_version is the commit hash of the git branch that the crunch script resides in. This information can be found in your git repository by using the following command:
+
+<notextile>
+<pre><code>~$ <span class="userinput">git rev-parse HEAD</span></code></pre>
+</notextile>
 
-For more information and examples for writing pipelines, see the "pipeline template reference":{{site.baseurl}}/api/schema/PipelineTemplate.html
+* @"docker_image"@ : This specifies the "Docker":https://www.docker.com/ runtime environment where jobs run their scripts. Docker version control is similar to git, and you can commit and push changes to your images. You must re-use the docker image hash from the previous run to use the same image. It can be found on the "Collection page":https://cloud.curoverse.com/collections/qr1hi-4zz18-dov6im679g3jr1n as the *Content address* or the *docker_image_locator* in a job's metadata.