@@ -9,6 +9,9 @@ h1. Tutorial: Construct a new pipeline
Here you will write two new crunch scripts, incorporate them into a new pipeline template, run the new pipeline a couple of times using different parameters, and compare the results. One of the new scripts will use the Arvados API to look up trait→human→data relations and use this information to compile a collection of data to analyze.
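The trait→human→data lookup described above can be sketched in plain Python. The dictionaries and the @collections_for_trait@ helper below are hypothetical stand-ins for the Arvados API responses and are not real SDK calls; they only illustrate the relation-chasing logic the tutorial script performs.

```python
# Hypothetical stand-ins for Arvados API responses: which humans exhibit
# a trait, and which data collections belong to each human. These are
# NOT real SDK calls, just illustration data.
trait_to_humans = {
    "Non-melanoma skin cancer": ["human-1", "human-2"],
}
human_to_collections = {
    "human-1": ["collection-uuid-a"],
    "human-2": ["collection-uuid-b"],
}

def collections_for_trait(trait_name):
    """Compile the list of data collections reachable via trait -> human -> data."""
    collections = []
    for human in trait_to_humans.get(trait_name, []):
        collections.extend(human_to_collections.get(human, []))
    return collections

print(collections_for_trait("Non-melanoma skin cancer"))
```

In the real script, each lookup would be an API query (trait links to humans, humans to their collections) rather than a dictionary access, but the compilation step is the same.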
+_Like the previous tutorial, this needs more of a basis in some actual
ssh-add -l # (run this in your VM account to confirm forwarding works)
</pre>
+_This discussion about ssh should probably go under the "ssh" section_
+
With PuTTY under Windows, run "pageant", add your key to the agent, and turn on agent forwarding in your PuTTY settings.
*Option 2:* Edit code on your workstation and push code to your Arvados repository from there instead of your VM account. Depending on your @.ssh/config@ file, you will use names like @my_vm_name.arvados@ instead of @my_vm_name.{{ site.arvados_api_host }}@ in git and ssh commands.
Whichever setup you choose, if everything is working correctly, this command should give you a list of repositories you can access:
<pre>
-ssh git@git.{{ site.arvados_api_host }}
+ssh -T git@git.{{ site.arvados_api_host }}
</pre>
@@ -69,6 +74,10 @@ the gitolite config gives you the following access:
R W your_repo_name
</pre>
+_You need to have a git repository set up already, which is not
+necessarily the case for new users, so this should link to the git
+section about setting up a new repo_
+
h3. Set some variables
Adjust these to match your login account name and the URL of your Arvados repository. The Access→VMs and Access→Repositories pages on Workbench will show the specifics.
The new pipeline template will also appear on the Workbench→Compute→Pipeline templates page.
+_Storing the pipeline in arvados as well as in git seems redundant_
+
h3. Invoke the pipeline using "arv pipeline run"
Replace the UUID here with the UUID of your own new pipeline template:
@@ -289,6 +318,8 @@ The output of the "find_variant" component is shown in your terminal with the la
It is also displayed on the pipeline instance detail page: go to Workbench→Compute→Pipeline instances and click the UUID of your pipeline instance.
+_There needs to be an easier way to get the output from the workbench_
+
h3. Compute a summary statistic from the output collection
For this step we will use python to read the output manifest and count how many of the inputs produced hits.
@@ -312,12 +343,14 @@ print "%d had the variant, %d did not." % (hits, misses)
4 had the variant, 3 did not.
</pre>
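A minimal sketch of the counting step is shown below. The manifest text here is made up for illustration (a real Arvados manifest encodes block locators and file spans, and you would read it with the Python SDK rather than from a string); the assumption is simply that each input produced one output file and that a non-empty file counts as a hit.

```python
# Made-up stand-in for the output manifest: one "filename size" pair per
# input. A real Arvados manifest looks quite different; this only
# illustrates the hit/miss counting logic.
manifest = """\
input1.hits 120
input2.hits 0
input3.hits 88
input4.hits 0
input5.hits 0
input6.hits 54
input7.hits 31
"""

hits = misses = 0
for line in manifest.splitlines():
    name, size = line.split()
    if int(size) > 0:    # a non-empty output file counts as a hit
        hits += 1
    else:
        misses += 1

print("%d had the variant, %d did not." % (hits, misses))
```

With seven inputs of which four produced non-empty output, this prints the same summary line shown above.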
-h3. Run the pipeline again using different parameters
+_Explain each step_
+
+_h3. Run the pipeline again using different parameters
We can use the same pipeline template to run the jobs again, this time overriding the "trait_name" parameter with a different value:
<pre>