h3. From the Workbench dashboard
-If you have no SSH keys registered, there should be a notification asking you to provide your SSH public key. In the Workbench top navigation menu, look for a dropdown menu with your email address in upper right corner. It will have an icon such as <span class="badge badge-alert">1</span> (the number indicates there are new notifications). Click on this icon and a dropdown menu should appear with a message asking you to add your public key. Paste your public key into the text area provided and click on the check button to submit the key. You are now ready to "log into an Arvados VM":#login.
+If you have no SSH keys registered, there should be a notification asking you to provide your SSH public key. In the Workbench top navigation menu, look for a dropdown menu with an icon such as <span class="badge badge-alert">1</span> <span class="fa fa-lg fa-user"></span> <span class="caret"></span> (the number indicates there are new notifications). Click on this icon and a dropdown menu should appear with a message asking you to add your public key. Click on the link *Click here to set up an SSH public key for use with Arvados*. This will take you to the *Manage account* page. Click on the <span class="btn btn-primary">*+* Add new SSH key</span> button on this page. This will open a popup as shown in this screenshot:
+
+!{{ site.baseurl }}/images/ssh-adding-public-key.png!
+Paste your public key into the text area labeled *Public Key*, and click on the <span class="btn btn-primary">Submit</span> button. You are now ready to "log into an Arvados VM":#login.
h3. Alternate way to add SSH keys
-Click on the link with your _email address_ in the upper right corner to access the user settings menu, and click on the menu item *Manage account* to go to the account management page.
+Click on the dropdown menu icon <span class="fa fa-lg fa-user"></span> <span class="caret"></span> in the upper right corner of the top navigation menu to access the user settings menu, and click on the menu item *Manage account* to go to the account management page.
On the *Manage account* page, click on the <span class="btn btn-primary">*+* Add new SSH key</span> button in the upper right corner of the page in the SSH Keys panel.
h1(#login). Using SSH to log into an Arvados VM
-To see a list of virtual machines that you have access to and determine the name and login information, click on the link with your _email address_ in the upper right corner and click on the menu item *Manage account* to go to the account management page. On this page, you will see a *Virtual Machines* panel, which lists the virtual machines you can access. The *hostname* column lists the name of each available VM. The *logins* column will have a list of comma separated values of the form @you@. In this guide the hostname will be *_shell_* and the login will be *_you_*. Replace these with your hostname and login name as appropriate.
+To see a list of virtual machines that you have access to and determine the name and login information, click on the dropdown menu icon <span class="fa fa-lg fa-user"></span> <span class="caret"></span> in the upper right corner of the top navigation menu to access the user settings menu and click on the menu item *Manage account* to go to the account management page. On this page, you will see a *Virtual Machines* panel, which lists the virtual machines you can access. The *hostname* column lists the name of each available VM. The *logins* column will have a list of comma separated values of the form @you@. In this guide the hostname will be *_shell_* and the login will be *_you_*. Replace these with your hostname and login name as appropriate.
--- /dev/null
+{% include 'notebox_begin' %}
+This tutorial assumes that you have installed the Arvados "Command line SDK":{{site.baseurl}}/sdk/cli/install.html and "Python SDK":{{site.baseurl}}/sdk/python/sdk-python.html on your workstation and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html
+{% include 'notebox_end' %}
<p><strong>What is Arvados</strong>
<p><a href="https://arvados.org/">Arvados</a> enables you to quickly begin using cloud computing resources in your data science work. It allows you to track your methods and datasets, share them securely, and easily re-run analyses.
</p>
- <p><strong>News</strong>
+ <p><strong>Communications</strong>
<p>Read our <a href="https://arvados.org/projects/arvados/blogs">blog updates</a> or look through our <a href="https://arvados.org/projects/arvados/activity">recent developer activity</a>.
</p>
- <p><strong>Questions?</strong></p>
- <p>Email <a href="http://lists.arvados.org/mailman/listinfo/arvados">the mailing list</a>, or chat with us on IRC: <a href="irc://irc.oftc.net:6667/#arvados">#arvados</a> @ OFTC (you can <a href="https://webchat.oftc.net/?channels=arvados">join in your browser</a>).
+ <p>Questions? Email <a href="http://lists.arvados.org/mailman/listinfo/arvados">the mailing list</a>, or chat with us on IRC: <a href="irc://irc.oftc.net:6667/#arvados">#arvados</a> @ OFTC (you can <a href="https://webchat.oftc.net/?channels=arvados">join in your browser</a>).
</p>
<p><strong>Want to contribute?</strong></p>
<p>Check out our <a href="https://arvados.org/projects/arvados">developer site</a>. We're open source; check out our code on <a href="https://github.com/curoverse/arvados">GitHub</a>.
h2. On the web
-The Arvados Free Sofware project page is located at "http://arvados.org":http://arvados.org . The "Arvados Wiki":https://arvados.org/projects/arvados/wiki is a collaborative site for documenting Arvados has an overview of the Arvados Platform and Components. The "Arvados blog":https://arvados.org/projects/arvados/blogs posts articles of interest about Arvados.
+The Arvados Free Software project page is located at "http://arvados.org":http://arvados.org . The "Arvados Wiki":https://arvados.org/projects/arvados/wiki is a collaborative site for documenting Arvados and provides an overview of the Arvados Platform and Components. The "Arvados blog":https://arvados.org/projects/arvados/blogs posts articles of interest about Arvados.
h2. Mailing lists
h2. IRC
-The "#arvados":irc://irc.oftc.net:6667/#arvados IRC (Internet Relay Chat) channel at on the "Open and Free Technology Community (irc.oftc.net)":http://www.oftc.net/oftc/ is available for live discussion and support. You can use a traditional IRC client or "join OFTC over the web.":https://webchat.oftc.net/?channels=arvados
+The "#arvados":irc://irc.oftc.net:6667/#arvados IRC (Internet Relay Chat) channel at the "Open and Free Technology Community (irc.oftc.net)":http://www.oftc.net/oftc/ is available for live discussion and support. You can use a traditional IRC client or "join OFTC over the web.":https://webchat.oftc.net/?channels=arvados
h2. Bug tracking
Open a shell on the system where you want to use the Arvados client. This may be your local workstation, or an Arvados virtual machine accessed with SSH (instructions for "Unix":{{site.baseurl}}/user/getting_started/ssh-access-unix.html#login or "Windows":{{site.baseurl}}/user/getting_started/ssh-access-windows.html#login).
-Click on the link with your _email address_ in the upper right corner to access your account menu, then click on the menu item *Manage account* to go to the account management page. On the *Manage account* page, you will see the *Current Token* panel, which lists your current token and instructions to set up your environment.
+Click on the dropdown menu icon <span class="fa fa-lg fa-user"></span> <span class="caret"></span> in the upper right corner of the top navigation menu to access your account menu, then click on the menu item *Manage account* to go to the account management page. On the *Manage account* page, you will see the *Current Token* panel, which lists your current token and instructions to set up your environment.
h2. Setting environment variables
# Upload that image to Arvados for use by Crunch jobs
# Share your image with others
-{% include 'tutorial_expectations' %}
+{% include 'tutorial_expectations_workstation' %}
You also need to ensure that "Docker is installed,":https://docs.docker.com/installation/ the Docker daemon is running, and you have permission to access Docker. You can test this by running @docker version@. If you receive a permission denied error, your user account may need to be added to the @docker@ group. If you have root access, you can add yourself to the @docker@ group using @$ sudo addgroup $USER docker@, then log out and log back in again; otherwise consult your local sysadmin.
Docker images are subject to normal Arvados permissions. If you wish to share your Docker image with others (or wish to share a pipeline template that uses your Docker image), you will need to use @arv keep docker@ with the @--project-uuid@ option to upload the image to a shared project.
<notextile>
-<pre><code>$ <span class="userinput">arv keep docker --project-uuid zzzzz-j7d0g-u7zg1qdaowykd8d arvados/jobs-with-r</span>
+<pre><code>$ <span class="userinput">arv keep docker --project-uuid qr1hi-j7d0g-xxxxxxxxxxxxxxx arvados/jobs-with-r</span>
</code></pre>
</notextile>
@arv-web@ enables you to run a custom web service from the contents of an Arvados collection.
+{% include 'tutorial_expectations_workstation' %}
+
h2. Usage
@arv-web@ enables you to set up a web service based on the most recent collection in a project. An arv-web application is a reproducible, immutable application bundle where the web app is packaged with both the code to run and the data to serve. Because Arvados Collections can be updated with minimum duplication, it is efficient to produce a new application bundle when the code or data needs to be updated; retaining old application bundles makes it easy to go back and run older versions of your web app.
<pre>
+$ cd $HOME/arvados/services/arv-web
usage: arv-web.py [-h] --project-uuid PROJECT_UUID [--port PORT]
[--image IMAGE]
The @run-command@ crunch script enables you to run command line programs.
+{% include 'tutorial_expectations_workstation' %}
+
h1. Using run-command
The basic @run-command@ process evaluates its inputs and builds a command line, executes the command, and saves the contents of the output directory back to Keep. For large datasets, @run-command@ can schedule concurrent tasks to execute the wrapped program over a range of inputs (see @task.foreach@ below.)
title: "Concurrent Crunch tasks"
...
-In the previous tutorials, we used @arvados.job_setup.one_task_per_input_file()@ to automatically create concurrent jobs by creating a separate task per file. For some types of jobs, you may need to split the work up differently, for example creating tasks to process different segments of a single large file. In this this tutorial will demonstrate how to create Crunch tasks directly.
+In the previous tutorials, we used @arvados.job_setup.one_task_per_input_file()@ to automatically create concurrent jobs by creating a separate task per file. For some types of jobs, you may need to split the work up differently, for example creating tasks to process different segments of a single large file. This tutorial will demonstrate how to create Crunch tasks directly.
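As a rough illustration of the kind of split described above, the sketch below divides a single large file into contiguous byte ranges, one per task. It is plain Python and makes no Arvados calls; the function name @segment_ranges@ and the segment counts are hypothetical and are not part of the Arvados SDK.

```python
def segment_ranges(file_size, num_segments):
    """Split file_size bytes into up to num_segments contiguous ranges.

    Returns a list of (start, end) byte offsets with end exclusive.
    Hypothetical helper for illustration only; not an Arvados SDK function.
    """
    if file_size < 0 or num_segments < 1:
        raise ValueError("file_size must be >= 0 and num_segments >= 1")

    # Spread any remainder over the first segments so sizes differ by at most 1.
    base, extra = divmod(file_size, num_segments)
    ranges = []
    start = 0
    for i in range(num_segments):
        length = base + (1 if i < extra else 0)
        if length == 0:
            # File is smaller than the number of segments; stop early.
            break
        ranges.append((start, start + length))
        start += length
    return ranges
```

Each returned range could then be handed to its own task as a pair of offsets, which is one possible way to process different segments of the same file concurrently.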
Start by entering the @crunch_scripts@ directory of your Git repository:
2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 release job allocation
2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 Freeze not implemented
2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 collate
-2014-08-06_15:16:36 qr1hi-8i9sb-qyrat80ef927lam 14473 output uuid qr1hi-4zz18-n91qrqfp3zivexo
+2014-08-06_15:16:36 qr1hi-8i9sb-qyrat80ef927lam 14473 collated output manifest text to send to API server is 105 bytes with access tokens
2014-08-06_15:16:36 qr1hi-8i9sb-qyrat80ef927lam 14473 output hash c1b44b6dc41ef334cf1136033ca950e6+54
2014-08-06_15:16:37 qr1hi-8i9sb-qyrat80ef927lam 14473 finish
2014-08-06_15:16:38 qr1hi-8i9sb-qyrat80ef927lam 14473 log manifest is 7fe8cf1d45d438a3ca3ac4a184b7aff4+83
</code></pre>
</notextile>
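If you want to pick the output identifier out of a log like the one above programmatically, a minimal sketch follows. It assumes the log has been captured as text and that the line format matches the excerpt (the words @output hash@ followed by a 32-character hex digest, @+@, and a size); @find_output_hash@ is a hypothetical helper, not part of the Arvados SDK.

```python
import re

# Matches a line ending in "output hash <32 hex chars>+<size>", as in the
# log excerpt above. The pattern is an assumption based on that excerpt.
OUTPUT_HASH_RE = re.compile(r"\boutput hash ([0-9a-f]{32}\+\d+)\s*$")


def find_output_hash(log_text):
    """Return the first output hash found in log_text, or None if absent."""
    for line in log_text.splitlines():
        match = OUTPUT_HASH_RE.search(line)
        if match:
            return match.group(1)
    return None
```

The returned string can be passed to @arv-ls@ or @arv-get@ in place of a value copied by hand.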
-Although the job runs locally, the output of the job has been saved to Keep, the Arvados file store. The "output uuid" line (fourth from the bottom) provides the UUID of the Arvados collection where the script's output has been saved. Copy the output identifier and use @arv-ls@ to list the contents of your output collection, and @arv-get@ to download it to the current directory:
+Although the job runs locally, the output of the job has been saved to Keep, the Arvados file store. The "output hash" line (third from the bottom) provides the portable data hash of the Arvados collection where the script's output has been saved. Copy the output hash and use @arv-ls@ to list the contents of your output collection, and @arv-get@ to download it to the current directory:
<notextile>
-<pre><code>~/tutorial/crunch_scripts$ <span class="userinput">arv-ls qr1hi-4zz18-n91qrqfp3zivexo</span>
+<pre><code>~/tutorial/crunch_scripts$ <span class="userinput">arv-ls c1b44b6dc41ef334cf1136033ca950e6+54</span>
./md5sum.txt
-~/tutorial/crunch_scripts$ <span class="userinput">arv-get qr1hi-4zz18-n91qrqfp3zivexo/ .</span>
+~/tutorial/crunch_scripts$ <span class="userinput">arv-get c1b44b6dc41ef334cf1136033ca950e6+54/ .</span>
0 MiB / 0 MiB 100.0%
~/tutorial/crunch_scripts$ <span class="userinput">cat md5sum.txt</span>
44b8ae3fde7a8a88d2f7ebd237625b4f c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2
!{display: block;margin-left: 25px;margin-right: auto;border:1px solid lightgray;}{{ site.baseurl }}/images/shared-collection.png!
-A user with this url can download this collection by simply accessing this url. It will present a downloadable version of the collection as shown below.
+A user with this URL can download this collection by simply opening the URL in a browser. It will present a downloadable version of the collection as shown below.
!{display: block;margin-left: 25px;margin-right: auto;border:1px solid lightgray;}{{ site.baseurl }}/images/download-shared-collection.png!
h3. Locate your collection in Workbench
-Visit the Workbench *Dashboard*. Click on *Projects*<span class="caret"></span> dropdown menu in the top navigation menu, select your *Home* project. Your newly uploaded collection should appear near the top of the *Data collections* tab. The collection locator printed by @arv keep put@ will appear under the *name* column.
+Visit the Workbench *Dashboard*. Click on the *Projects*<span class="caret"></span> dropdown menu in the top navigation menu and select your *Home* project. Your newly uploaded collection should appear near the top of the *Data collections* tab. The collection name printed by @arv keep put@ will appear under the *name* column.
-To move the collection to a different project, check the box at the left of the collection row. Pull down the *Selection...*<span class="caret"></span> menu near the top of the page tab, and select *Move selected*. This will open a dialog box where you can select a destination project for the collection. Click a project, then finally the <span class="btn btn-sm btn-primary">Move</span> button.
+To move the collection to a different project, check the box at the left of the collection row. Pull down the *Selection...*<span class="caret"></span> menu near the top of the page tab, and select *Move selected...*. This will open a dialog box where you can select a destination project for the collection. Click a project, then finally the <span class="btn btn-sm btn-primary">Move</span> button.
!{display: block;margin-left: 25px;margin-right: auto;}{{ site.baseurl }}/images/workbench-move-selected.png!
*Note:* If you leave the collection page during the upload, the upload process will be aborted and you will need to upload the files again.
-*Note:* You can also use the Upload tab to add files to an existing collection.
+*Note:* You can also use the Upload tab to add additional files to an existing collection.
A "pipeline" (sometimes called a "workflow" in other systems) is a sequence of steps that apply various programs or tools to transform input data to output data. Pipelines are the principal means of performing computation with Arvados. This tutorial demonstrates how to run a single-stage pipeline to take a small data set of paired-end reads from a sample "exome":https://en.wikipedia.org/wiki/Exome in "FASTQ":https://en.wikipedia.org/wiki/FASTQ_format format and align them to "Chromosome 19":https://en.wikipedia.org/wiki/Chromosome_19_%28human%29 using the "bwa mem":http://bio-bwa.sourceforge.net/ tool, producing a "Sequence Alignment/Map (SAM)":https://samtools.github.io/ file. This tutorial will introduce the following Arvados features:
-<div class="inside-list">
+<div>
* How to create a new pipeline from an existing template.
* How to browse and select input data for the pipeline and submit the pipeline to run on the Arvados cluster.
* How to access your pipeline results.
notextile. <div class="spaced-out">
+h3. Steps
+
# Start from the *Workbench Dashboard*. You can access the Dashboard by clicking on *<i class="fa fa-lg fa-fw fa-dashboard"></i> Dashboard* in the upper left corner of any Workbench page.
# Click on the <span class="btn btn-sm btn-primary"><i class="fa fa-fw fa-gear"></i> Run a pipeline...</span> button. This will open a dialog box titled *Choose a pipeline to run*.
# Click to open the *All projects <span class="caret"></span>* menu. Under the *Projects shared with me* header, select *<i class="fa fa-fw fa-share-alt"></i> Arvados Tutorial*.
# Select *<i class="fa fa-fw fa-gear"></i> Tutorial align using bwa mem* and click the <span class="btn btn-sm btn-primary" >Next: choose inputs <i class="fa fa-fw fa-arrow-circle-right"></i></span> button. This will create a new pipeline in your *Home* project and will open it. You can now supply the inputs for the pipeline.
-# The first input parameter to the pipeline is *Reference genoma (fasta)*. Click the <span class="btn btn-sm btn-primary">Choose</span> button beneath that header. This will open a dialog box titled *Choose a dataset for Reference genome (fasta)*.
+# The first input parameter to the pipeline is *"reference_collection" parameter for run-command script in bwa-mem component*. Click the <span class="btn btn-sm btn-primary">Choose</span> button beneath that header. This will open a dialog box titled *Choose a dataset for "reference_collection" parameter for run-command script in bwa-mem component*.
# Once again, open the *All projects <span class="caret"></span>* menu and select *<i class="fa fa-fw fa-share-alt"></i> Arvados Tutorial*. Select *<i class="fa fa-fw fa-archive"></i> Tutorial chromosome 19 reference* and click the <span class="btn btn-sm btn-primary" >OK</span> button.
-# Repeat the previous two steps to set the *Input genome (fastq)* parameter to *<i class="fa fa-fw fa-archive"></i> Tutorial sample exome*.
+# Repeat the previous two steps to set the *"sample" parameter for run-command script in bwa-mem component* parameter to *<i class="fa fa-fw fa-archive"></i> Tutorial sample exome*.
# Click on the <span class="btn btn-sm btn-primary" >Run <i class="fa fa-fw fa-play"></i></span> button. The page updates to show you that the pipeline has been submitted to run on the Arvados cluster.
# After the pipeline starts running, you can track the progress by watching log messages from jobs. This page refreshes automatically. You will see a <span class="label label-success">complete</span> label under the *job* column when the pipeline completes successfully.
# Click on the *Output* link to see the results of the job. This will load a new page listing the output files from this pipeline. You'll see the output SAM file from the alignment tool under the *Files* tab.
~$ <span class="userinput">git config --global user.email $USER@example.com</span></code></pre>
</notextile>
-On the Arvados Workbench, navigate to "Code repositories":{{site.arvados_workbench_host}}/repositories. You should see a repository with your user name listed in the *name* column. Next to *name* is the column *push_url*. Copy the *push_url* value associated with your repository. This should look like <notextile><code>git@git.{{ site.arvados_api_host }}:$USER/$USER.git</code></notextile>.
+On the Arvados Workbench, click on the dropdown menu icon <span class="fa fa-lg fa-user"></span> <span class="caret"></span> in the upper right corner of the top navigation menu to access the user settings menu, and click on the menu item *Manage account* to go to the account management page.
+
+On the *Manage account* page, you will see the *Repositories* panel. In this panel, you should see a repository with your user name listed in the *name* column. Next to *name* is the column *URL*. Copy the *URL* value associated with your repository. This should look like <notextile><code>git@git.{{ site.arvados_api_host }}:$USER/$USER.git</code></notextile>.
Next, on the Arvados virtual machine, clone your Git repository:
<notextile> {% code 'tutorial_submit_job' as javascript %} </notextile>
-* @"repository"@ is the name of a git repository to search for the script version. You can access a list of available git repositories on the Arvados Workbench under "Code repositories":{{site.arvados_workbench_host}}/repositories.
+* @"repository"@ is the name of a git repository to search for the script version. You can access a list of available git repositories on the *Manage account* page of the Arvados Workbench, reached through the <span class="fa fa-lg fa-user"></span> <span class="caret"></span> icon in the top navigation menu.
* @"script_version"@ specifies the version of the script that you wish to run. This can be in the form of an explicit Git revision hash, a tag, or a branch (in which case it will use the HEAD of the specified branch). Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.
* @"script"@ specifies the filename of the script to run. Crunch expects to find this in the @crunch_scripts/@ subdirectory of the Git repository.
* @"runtime_constraints"@ describes the runtime environment required to run the job. These are described in the "job record schema":{{site.baseurl}}/api/schema/Job.html
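To make the four fields above concrete, here is a small sketch that assembles them into a job description and checks that none is missing before the description is serialized to JSON. All values are illustrative placeholders (the repository name @you@, the branch @master@, the script name @hash.py@, and the empty @runtime_constraints@ are assumptions), and @missing_fields@ is a hypothetical helper rather than an Arvados SDK call.

```python
import json

# The four fields described in the list above.
DESCRIBED_FIELDS = ("repository", "script_version", "script", "runtime_constraints")

# Illustrative placeholder values; substitute your own repository and script.
job = {
    "repository": "you",          # name of the git repository (assumed)
    "script_version": "master",   # a branch, tag, or explicit revision hash
    "script": "hash.py",          # file expected under crunch_scripts/ (assumed name)
    "runtime_constraints": {},    # left empty here; see the job record schema
}


def missing_fields(job_description):
    """Return the described fields that are absent from job_description."""
    return [name for name in DESCRIBED_FIELDS if name not in job_description]


# Serialize only when every described field is present.
if not missing_fields(job):
    job_json = json.dumps(job, indent=2)
```

Checking the description locally like this is only a convenience; the API server remains the authority on which fields a job actually requires.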