add GATK VariantFiltration tutorial
author Tom Clegg <tom@clinicalfuture.com>
Mon, 15 Jul 2013 21:01:44 +0000 (17:01 -0400)
committer Tom Clegg <tom@clinicalfuture.com>
Mon, 15 Jul 2013 21:15:57 +0000 (17:15 -0400)
doc/user/tutorial-gatk-variantfiltration.textile [new file with mode: 0644]
doc/user/tutorial-trait-search.textile

diff --git a/doc/user/tutorial-gatk-variantfiltration.textile b/doc/user/tutorial-gatk-variantfiltration.textile
new file mode 100644
index 0000000..3ec7ff7
--- /dev/null
@@ -0,0 +1,106 @@
+---
+layout: default
+navsection: userguide
+title: "Tutorial: GATK VariantFiltration"
+navorder: 22
+---
+
+h1. Tutorial: GATK VariantFiltration
+
+Here you will use the GATK VariantFiltration program to assign pass/fail scores to variants in a VCF file.
+
+h3. Prerequisites
+
+* Log in to a VM "using SSH":ssh-access.html
+* Put an "API token":api-tokens.html in your @ARVADOS_API_TOKEN@ environment variable
+* Put the API host name in your @ARVADOS_API_HOST@ environment variable
+
+If everything is set up correctly, the command @arv -h user current@ will display your account information.
+
+h3. Get the GATK binary distribution.
+
+Download the GATK binary tarball[1] -- e.g., @GenomeAnalysisTK-2.6-4.tar.bz2@ -- and copy it to your Arvados VM.
+
+Store it in Keep.
+
+<pre>
+arv keep put --in-manifest GenomeAnalysisTK-2.6-4.tar.bz2
+</pre>
+
+&darr;
+
+<pre>
+c905c8d8443a9c44274d98b7c6cfaa32+94+K@qr1hi
+</pre>
+
+h3. Get the GATK resource bundle.
+
+The resource bundle takes a while to download, but it should already be available in Arvados. For now, just list the files and their sizes to confirm that you have the correct collection ID.
+
+<pre>
+arv keep ls -s d237a90bae3870b3b033aea1e99de4a9+10820+K@qr1hi
+</pre>
+
+&darr;
+
+<pre>
+  50342 1000G_omni2.5.b37.vcf.gz
+      1 1000G_omni2.5.b37.vcf.gz.md5
+    464 1000G_omni2.5.b37.vcf.idx.gz
+      1 1000G_omni2.5.b37.vcf.idx.gz.md5
+  43981 1000G_phase1.indels.b37.vcf.gz
+...
+</pre>
+
+h3. Submit a job.
+
+The Arvados distribution includes an example crunch script ("crunch_scripts/GATK2-VariantFiltration":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/GATK2-VariantFiltration) that runs the GATK VariantFiltration tool with some default settings.
+
+We will pass it the following parameters:
+
+* input -- a collection containing the source VCF data. Here we will use an exome report from PGP participant hu34D5B9.
+* gatk_binary_tarball -- a collection containing the GATK 2 tarball.
+* gatk_bundle -- a collection containing the GATK resource bundle[2].
+
+<pre>
+src_version=76588bfc57f33ea1b36b82ca7187f465b73b4ca4
+vcf_input=5ee633fe2569d2a42dd81b07490d5d13+82+K@qr1hi
+gatk_binary=c905c8d8443a9c44274d98b7c6cfaa32+94+K@qr1hi
+gatk_bundle=d237a90bae3870b3b033aea1e99de4a9+10820+K@qr1hi
+
+read -rd $'\000' the_job <<EOF
+{
+ "script":"GATK2-VariantFiltration",
+ "script_version":"$src_version",
+ "script_parameters":
+ {
+  "input":"$vcf_input",
+  "gatk_binary_tarball":"$gatk_binary",
+  "gatk_bundle":"$gatk_bundle"
+ }
+}
+EOF
+
+arv -h job create --job "$the_job"
+</pre>
+
+Note the job UUID in the API response.
+
+h3. Monitor job progress
+
+There are three ways to monitor job progress:
+
+# Go to Workbench, drop down the Compute menu, and click Jobs. The job you submitted should appear at the top of the list. Hit "Refresh" until it finishes.
+# Run @arv -h job get --uuid JOB_UUID_HERE@ to see the job particulars, notably the "tasks_summary" attribute, which indicates how many tasks are done, running, or still to do.
+# Watch the crunch log messages and stderr from the job tasks:
+
+<pre>
+curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
+  https://{{ site.arvados_api_host }}/arvados/v1/jobs/JOB_UUID_HERE/log_tail_follow
+</pre>
+
+h3. Notes
+
+fn1. Download the GATK tools &rarr; http://www.broadinstitute.org/gatk/download
+
+fn2. Information about the GATK resource bundle &rarr; http://gatkforums.broadinstitute.org/discussion/1213/whats-in-the-resource-bundle-and-how-can-i-get-it
index 5f283cf6fa568dd35bf5eb6ae5406f20bfd3efca..92646a73648e49ff56d4f79820c676fe566fb086 100644
@@ -2,7 +2,7 @@
 layout: default
 navsection: userguide
 title: "Tutorial: Search PGP data by trait"
-navorder: 22
+navorder: 23
 ---
 
 h1. Tutorial: Search PGP data by trait