X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/b91db14a4dced9d6ea124e86be3c796e6f2c8e8c..fd3278f525ea0afa54d3eef8c9705f9b44d629af:/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
diff --git a/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid b/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
index 277b9664e9..8dad6ab25e 100644
--- a/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
@@ -4,21 +4,26 @@ navsection: userguide
 title: "Running a pipeline using Workbench"
 ...
+A "pipeline" (sometimes called a "workflow" in other systems) is a sequence of steps that apply various programs or tools to transform input data to output data. Pipelines are the principal means of performing computation with Arvados. This tutorial demonstrates how to run a single-stage pipeline to take a small data set of paired-end reads from a sample "exome":https://en.wikipedia.org/wiki/Exome in "FASTQ":https://en.wikipedia.org/wiki/FASTQ_format format and align them to "Chromosome 19":https://en.wikipedia.org/wiki/Chromosome_19_%28human%29 using the "bwa mem":http://bio-bwa.sourceforge.net/ tool, producing a "Sequence Alignment/Map (SAM)":https://samtools.github.io/ file. This tutorial will introduce the following Arvados features:
+
+
+* How to create a new pipeline from an existing template.
+* How to browse and select input data for the pipeline and submit the pipeline to run on the Arvados cluster.
+* How to access your pipeline results.
+
+ notextile.
-# Go to "Collections":https://{{ site.arvados_workbench_host }}/collections (*Data* %(rarr)→% *Collections (data files)*).
-# On the Collections page, go to the search box and search for "tutorial".
-# The results should include a collection with the contents *var-GS000016015-ASM.tsv.bz2*.
-# Click on the check box to the left of *var-GS000016015-ASM.tsv.bz2*. This puts the collection in your persistent selection list. You can click on the paperclip in the upper right to review your current selections.
-# Go to "Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates (*Compute* %(rarr)→% *Pipeline templates*).
-# Look for a pipeline named *Tutorial pipeline*.
-# Click on the play button to the left of *Tutorial pipeline*. This will take you to a new page to configure the pipeline.
-# Under the *parameter* column, look for *input*. Set the value of *input* by clicking on *none* to get a selection popup. The collection that you selected in step 4 will be at the top of that pulldown menu. Select that collection in the pulldown menu.
-# You can now click on the *Run pipeline* button in the upper right to start the pipeline. A new page shows the pipeline status, queued to run.
-# The page refreshes automatically every 15 seconds. You should see the pipeline running, and then finish successfully.
-# Once the pipeline is finished, click on the link under the *output* column. This will take you to the collection page for the output of this pipeline.
-# Click on *md5sum.txt* to see the actual file that is the output of this pipeline.
-# Go back to the collection page for the result. Click on the *Provenance graph* tab to see a graph illustrating the collections and scripts that were used to generate this file.
+# Start from the *Workbench Dashboard*. You can access the Dashboard by clicking on *Dashboard* in the upper left corner of any Workbench page.
+# Click on the *Run a pipeline...* button. This will open a dialog box titled *Choose a pipeline to run*.
+# Click to open the *All projects* menu. Under the *Projects shared with me* header, select *Arvados Tutorial*.
+# Select *Tutorial align using bwa mem* and click the *Next: choose inputs* button. This will create a new pipeline in your *Home* project and will open it. You can now supply the inputs for the pipeline.
+# The first input parameter to the pipeline is *Reference genome (fasta)*. Click the *Choose* button beneath that header. This will open a dialog box titled *Choose a dataset for Reference genome (fasta)*.
+# Once again, open the *All projects* menu and select *Arvados Tutorial*. Select *Tutorial chromosome 19 reference* and click the *OK* button.
+# Repeat the previous two steps to set the *Input genome (fastq)* parameter to *Tutorial sample exome*.
+# Click on the *Run* button. The page updates to show you that the pipeline has been submitted to run on the Arvados cluster.
+# After the pipeline starts running, you can track the progress by watching log messages from jobs. This page refreshes automatically. You will see a *complete* label under the *job* column when the pipeline completes successfully.
+# Click on the *Output* link to see the results of the job. This will load a new page listing the output files from this pipeline. You'll see the output SAM file from the alignment tool under the *Files* tab.
+# Click on the download button to the right of the SAM file to download your results.
 notextile.
-