---
layout: default
navsection: userguide
-title: "List of examples supplied with Arvados"
+title: "List of examples included with Arvados"
...
Several crunch scripts are included with Arvados in the "/crunch_scripts directory":https://arvados.org/projects/arvados/repository/revisions/master/show/crunch_scripts. They are intended to provide examples and starting points for writing your own scripts.
title: Accessing Arvados Workbench
...
-If you are using the default Arvados instance in this guide, you can Access Arvados Workbench using this link:
+If you are using the default Arvados instance for this guide, you can access Arvados Workbench using this link:
<a href="https://{{ site.arvados_workbench_host }}/" target="_blank">https://{{ site.arvados_workbench_host }}/</a>
-(If you are using a different Arvados instance than the default in this guide, replace *{{ site.arvados_workbench_host }}* with your private instance in all of the examples in this guide.)
+(If you are using a different Arvados instance than the default for this guide, replace *{{ site.arvados_workbench_host }}* with your private instance in all of the examples in this guide.)
You may be asked to log in using a Google account. Arvados uses only your name and email address from Google services for identification, and will never access any personal information. If you are accessing Arvados for the first time, the Workbench may indicate your account status is *New / inactive*. If this is the case, contact the administrator of the Arvados instance to request activation of your account.
{% include 'tutorial_expectations' %}
-In this tutorial, we will run a pipeline to take a small data set of paired-end reads from an sample "exome":https://en.wikipedia.org/wiki/Exome in "FASTQ":https://en.wikipedia.org/wiki/FASTQ_format format and align them to "Chromosome 19":https://en.wikipedia.org/wiki/Chromosome_19_%28human%29 using the "bwa mem":http://bio-bwa.sourceforge.net/ tool, producing a "Sequence Alignment/Map (SAM)":https://samtools.github.io/ file.
+This tutorial demonstrates how to use the command line to run the same pipeline as described in "running a pipeline using Workbench.":{{site.baseurl}}/user/tutorials/tutorial-pipeline-workbench.html
When you use the command line, you must use Arvados unique identifiers to refer to objects. The identifiers in this example correspond to the following Arvados objects:
Arvados adds each pipeline component to the job queue as its dependencies are satisfied (or immediately if it has no dependencies) and finishes when all components are completed or failed and there is no more work left to do.
-The Keep locators of the output of of the @bwa-mem@ components are available from the output log shown above:
+The Keep locators of the output of the @bwa-mem@ components are available from the status feed shown above:
<notextile>
<pre><code>~$ <span class="userinput">arv keep ls -s 49bae1066f4ebce72e2587a3efa61c7d+88</span>
h2. Prerequisites
+The Arvados "Crunch" framework is designed to support processing very large data batches (gigabytes to terabytes) efficiently, and provides the following benefits:
+* Increase concurrency by running tasks asynchronously, using many CPUs and network interfaces at once (especially beneficial for CPU-bound and I/O-bound tasks respectively).
+* Track inputs, outputs, and settings so you can verify that the inputs, settings, and sequence of programs you used to arrive at an output are really what you think they were.
+* Ensure that your programs and workflows are repeatable with different versions of your code, OS updates, etc.
+* Interrupt and resume long-running jobs consisting of many short tasks.
+* Maintain timing statistics automatically, so they're there when you want them.
+
To get the most value out of this guide, you should be comfortable with the following:
# Using a secure shell client such as SSH or PuTTY to log on to a remote server
# Revision control using Git
We also recommend you read the "Arvados Platform Overview":https://arvados.org/projects/arvados/wiki#Platform-Overview for an introduction and background information about Arvados.
-
-The Arvados "Crunch" framework is designed to support processing very large data batches (gigabytes to terabytes) efficiently, and provides the following benefits:
-* Increase concurrency by running tasks asynchronously, using many CPUs and network interfaces at once (especially beneficial for CPU-bound and I/O-bound tasks respectively).
-* Track inputs, outputs, and settings so you can verify that the inputs, settings, and sequence of programs you used to arrive at an output is really what you think it was.
-* Ensure that your programs and workflows are repeatable with different versions of your code, OS updates, etc.
-* Interrupt and resume long-running jobs consisting of many short tasks.
-* Maintain timing statistics automatically, so they're there when you want them.
{% include 'tutorial_expectations' %}
-First, use @arv pipeline_template create@ to create a new empty template. The @--format=uuid@ option will print out the unique identifier for the new template:
+Use the following command to create a new empty template using @arv pipeline_template create@ and then open the template record in an interactive text editor (as specified by $EDITOR or $VISUAL, otherwise defaults to @nano@) using @arv edit@.
<notextile>
-<pre><code>~$ <span class="userinput">arv --format=uuid pipeline_template create --pipeline-template '{}'</span>
-qr1hi-p5p6p-wt1vdhkezgx7g2k
-</span></code></pre>
+<pre><code>~$ <span class="userinput">arv edit $(arv --format=uuid pipeline_template create --pipeline-template '{}') name components </span></code></pre>
</notextile>
-Next, use @arv edit@ to edit the template. This will open the template record in an interactive text editor (as specified by $EDITOR or $VISUAL, otherwise defaults to @nano@). Replace the empty fields with the following content:
+* The @--format=uuid@ option prints just the unique identifier for the new template, instead of the entire template record (the default).
+
+Next, in the text editor opened by @arv edit@, replace the empty fields with the following content:
<notextile>{% code 'tutorial_bwa_sortsam_pipeline' as javascript %}</notextile>
* @"name"@ is a human-readable name for the pipeline.
* @"components"@ is a set of scripts or commands that make up the pipeline. Each component is given an identifier (@"bwa-mem"@ and @"SortSam"@ in this example).
+** Each entry in @"components"@ is an Arvados job submission. For more information about individual jobs, see the "job object reference":{{site.baseurl}}/api/schema/Job.html and "job create method.":{{site.baseurl}}/api/methods/jobs.html#create
* @"repository"@, @"script_version"@, and @"script"@ indicate that we intend to use the external @"run-command"@ tool wrapper that is part of Arvados. These parameters are described in more detail in "Writing a script":tutorial-firstscript.html
* @"output_is_persistent"@ indicates whether the output of the component is considered valuable. If this value is false (or not given), the output will be treated as intermediate data which may be eventually deleted to reclaim disk space.
* @"runtime_constraints"@ describes runtime resource requirements for the component.
When using @run-command@, the tool should write its output to the current working directory. The output will be automatically uploaded to Keep when the job completes.
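
As a rough illustration of the fields listed above, the following Python sketch assembles a template with the same overall shape and prints it as JSON. Only the field names and the two component identifiers come from this page. The helper @make_component@, the template name, and every value (repository, script version, runtime constraint) are hypothetical placeholders, not the contents of the tutorial template.

```python
import json


def make_component(script, repository, script_version,
                   output_is_persistent=False, runtime_constraints=None):
    """Build one entry of "components" using the field names described above.

    All argument values passed below are illustrative placeholders.
    """
    return {
        "script": script,
        "repository": repository,
        "script_version": script_version,
        "output_is_persistent": output_is_persistent,
        "runtime_constraints": runtime_constraints or {},
    }


# Hypothetical template: two components, identified as in this tutorial.
template = {
    "name": "Example pipeline (placeholder name)",
    "components": {
        "bwa-mem": make_component(
            script="run-command",
            repository="example-repository",        # placeholder
            script_version="example-version",       # placeholder
            output_is_persistent=False,              # intermediate data
            runtime_constraints={"example_constraint": "example-value"},
        ),
        "SortSam": make_component(
            script="run-command",
            repository="example-repository",        # placeholder
            script_version="example-version",       # placeholder
            output_is_persistent=True,               # output worth keeping
        ),
    },
}

# Serialize to JSON text, the form in which the template record is edited.
template_json = json.dumps(template, indent=2, sort_keys=True)
print(template_json)
```

Building the structure in code and serializing it is one way to catch unbalanced braces or missing commas before pasting the result into the editor.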
-Your new pipeline template will appear on the Workbench "Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates page. You can run the "pipeline using Workbench":tutorial-pipeline-workbench.html.
+h2. Running your pipeline
+
+Your new pipeline template should appear at the top of the Workbench "pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates page. You can run your pipeline "using Workbench":tutorial-pipeline-workbench.html or the "command line.":{{site.baseurl}}/user/topics/running-pipeline-command-line.html
For more information and examples for writing pipelines, see the "pipeline template reference":{{site.baseurl}}/api/schema/PipelineTemplate.html
title: "Writing a script"
...
-This tutorial demonstrates how to write crunch script using the Arvados Python SDK. The Arvados SDK supports access to advanced features not available using the @run-command@ wrapper, such as scheduling parallel tasks across nodes.
+This tutorial demonstrates how to create a new Arvados pipeline using the Arvados Python SDK. The Arvados SDK supports access to advanced features not available using the @run-command@ wrapper, such as scheduling parallel tasks across nodes.
{% include 'tutorial_expectations' %}
notextile. <pre><code>~/$USER/crunch_scripts$ <span class="userinput">chmod +x hash.py</span></code></pre>
{% include 'notebox_begin' %}
-The steps below describe how to execute the script after committing changes to Git. To run a script locally for testing, please see "debugging a crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html.
+The steps below describe how to execute the script after committing changes to Git. To run a single script locally for testing (bypassing the job queue) please see "debugging a crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html.
{% include 'notebox_end' %}
<pre><code>~/$USER/crunch_scripts$ <span class="userinput">cd ~</span>
~$ <span class="userinput">cat >the_pipeline <<EOF
{
- "name":"My first pipeline",
+ "name":"My md5 pipeline",
"components":{
"do_hash":{
"script":"hash.py",
</code></pre>
</notextile>
-Your new pipeline template will appear on the Workbench "Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates page. You can run the "pipeline using Workbench":tutorial-pipeline-workbench.html.
+h2. Running your pipeline
+
+Your new pipeline template should appear at the top of the Workbench "pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates page. You can run your pipeline "using Workbench":tutorial-pipeline-workbench.html or the "command line.":{{site.baseurl}}/user/topics/running-pipeline-command-line.html
For more information and examples for writing pipelines, see the "pipeline template reference":{{site.baseurl}}/api/schema/PipelineTemplate.html
title: "Getting data from Keep"
...
-This tutorial covers using @arv-ls@ and @arv-get@ to access Keep from the command line. It is also possible to download a file from a collection from the Workbench page for the collection.
+This tutorial covers using @arv-ls@ and @arv-get@ to access Keep from the command line. It is also possible to download a file from a collection from the Workbench page for the collection, covered in "running a pipeline using Workbench":{{site.baseurl}}/user/tutorials/tutorial-pipeline-workbench.html
{% include 'tutorial_expectations' %}
</code></pre>
</notextile>
-The output value @c1bad4b39ca5a924e481008009d94e32+210@ is the Arvados collection locator that uniquely describes this file. In order to place your newly uploaded file into a Project, visit the workbench page for your new collection: <a href="https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210" target="_blank">https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210</a>. On the workbench page for the collection, click on <span class="btn btn-xs btn-primary" ><i class="fa fa-fw fa-folder"></i> Choose a project...</span> to open a modal dialog allowing you to select a destination project for your collection.
+The output value @c1bad4b39ca5a924e481008009d94e32+210@ is the Arvados collection locator that uniquely describes this file. In order to place your newly uploaded file into a Project, visit the workbench page for your new collection: <a href="https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210" target="_blank">https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210</a>. On that page, click on <span class="btn btn-xs btn-primary" ><i class="fa fa-fw fa-folder"></i> Choose a project...</span> to open a modal dialog allowing you to select a destination project for your collection.
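
To make the shape of the locator concrete, here is a minimal Python sketch that splits the value shown above into its parts. It assumes that the part before the first @+@ is a 32-character hexadecimal digest and that the number after it is a size in bytes. The function @split_locator@ is a hypothetical helper written for this illustration and is not part of the Arvados SDK.

```python
HEX_DIGITS = set("0123456789abcdef")


def split_locator(locator):
    """Split a locator such as '<digest>+<size>' into (digest, size, extras).

    Assumption: the digest is 32 lowercase hex characters and the second
    field is a decimal size. Any further '+' fields are returned untouched.
    """
    parts = locator.split("+")
    if len(parts) < 2:
        raise ValueError("expected at least '<digest>+<size>'")
    digest, size = parts[0], parts[1]
    if len(digest) != 32 or not set(digest) <= HEX_DIGITS:
        raise ValueError("digest is not 32 lowercase hex characters")
    if not size.isdigit():
        raise ValueError("size is not a decimal number")
    return digest, int(size), parts[2:]


# The locator printed in the example above.
digest, size, extras = split_locator("c1bad4b39ca5a924e481008009d94e32+210")
print(digest, size, extras)
```

A check like this can be useful in scripts that accept a locator as an argument, so that a mistyped value is rejected before any request is made.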
notextile. </div>
title: "Running a pipeline using Workbench"
...
-
-In this tutorial, we will run a pipeline to take a small data set of paired-end reads from an sample "exome":https://en.wikipedia.org/wiki/Exome in "FASTQ":https://en.wikipedia.org/wiki/FASTQ_format format and align them to "Chromosome 19":https://en.wikipedia.org/wiki/Chromosome_19_%28human%29 using the "bwa mem":http://bio-bwa.sourceforge.net/ tool, producing a "Sequence Alignment/Map (SAM)":https://samtools.github.io/ file. This will introduce the following Arvados features:
+A "pipeline" (sometimes called a "workflow" in other systems) is a sequence of steps that apply various programs or tools to transform input data to output data. Pipelines are the principal means of performing computation with Arvados. This tutorial demonstrates how to run a single-stage pipeline to take a small data set of paired-end reads from a sample "exome":https://en.wikipedia.org/wiki/Exome in "FASTQ":https://en.wikipedia.org/wiki/FASTQ_format format and align them to "Chromosome 19":https://en.wikipedia.org/wiki/Chromosome_19_%28human%29 using the "bwa mem":http://bio-bwa.sourceforge.net/ tool, producing a "Sequence Alignment/Map (SAM)":https://samtools.github.io/ file. This will introduce the following Arvados features:
<div class="inside-list">
* How to create a project.
-* How to submit a pipeline to run on the Arvados cluster.
+* How to browse available pipeline templates and create a new pipeline from an existing template.
+* How to browse and select input data for the pipeline and submit the pipeline to run on the Arvados cluster.
* How to access your pipeline results.
</div>