--- /dev/null
+---
+layout: default
+navsection: userguide
+title: "Tutorial: GATK VariantFiltration"
+navorder: 22
+---
+
+h1. Tutorial: GATK VariantFiltration
+
+Here you will use the GATK VariantFiltration program to assign pass/fail scores to variants in a VCF file.
+
+h3. Prerequisites
+
+* Log in to a VM "using SSH":ssh-access.html
+* Put an "API token":api-tokens.html in your @ARVADOS_API_TOKEN@ environment variable
+* Put the API host name in your @ARVADOS_API_HOST@ environment variable
+
+If everything is set up correctly, the command @arv -h user current@ will display your account information.
+
+h3. Get the GATK binary distribution.
+
+Download the GATK binary tarball[1] -- e.g., @GenomeAnalysisTK-2.6-4.tar.bz2@ -- and copy it to your Arvados VM.
+
+Store it in Keep.
+
+<pre>
+arv keep put --in-manifest GenomeAnalysisTK-2.6-4.tar.bz2
+</pre>
+
+↓
+
+<pre>
+c905c8d8443a9c44274d98b7c6cfaa32+94+K@qr1hi
+</pre>
+
+h3. Get the GATK resource bundle.
+
+This can take a while to download, and should already be available in Arvados. For now let's just list the files and sizes, to make sure we have the correct collection ID.
+
+<pre>
+arv keep ls -s d237a90bae3870b3b033aea1e99de4a9+10820+K@qr1hi
+</pre>
+
+↓
+
+<pre>
+ 50342 1000G_omni2.5.b37.vcf.gz
+ 1 1000G_omni2.5.b37.vcf.gz.md5
+ 464 1000G_omni2.5.b37.vcf.idx.gz
+ 1 1000G_omni2.5.b37.vcf.idx.gz.md5
+ 43981 1000G_phase1.indels.b37.vcf.gz
+...
+</pre>
+
+h3. Submit a job.
+
+The Arvados distribution includes an example crunch script ("crunch_scripts/GATK2-VariantFiltration":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/GATK2-VariantFiltration) that runs the GATK VariantFiltration tool with some default settings.
+
+We will pass it the following parameters:
+
+* input -- a collection containing the source VCF data. Here we will use an exome report from PGP participant hu34D5B9.
+* gatk_binary_tarball -- a collection containing the GATK 2 tarball.
+* gatk_bundle -- a collection containing the GATK resource bundle[2].
+
+<pre>
+src_version=76588bfc57f33ea1b36b82ca7187f465b73b4ca4
+vcf_input=5ee633fe2569d2a42dd81b07490d5d13+82+K@qr1hi
+gatk_binary=c905c8d8443a9c44274d98b7c6cfaa32+94+K@qr1hi
+gatk_bundle=d237a90bae3870b3b033aea1e99de4a9+10820+K@qr1hi
+
+read -rd "\000" the_job <<EOF
+{
+ "script":"GATK2-VariantFiltration",
+ "script_version":"$src_version",
+ "script_parameters":
+ {
+ "input":"$vcf_input",
+ "gatk_binary_tarball":"$gatk_binary",
+ "gatk_bundle":"$gatk_bundle"
+ }
+}
+EOF
+
+arv -h job create --job "$the_job"
+</pre>
+
+Note the job UUID in the API response.
+
+h3. Monitor job progress
+
+There are three ways to monitor job progress:
+
+# Go to Workbench, drop down the Compute menu, and click Jobs. The job you submitted should appear at the top of the list. Hit "Refresh" until it finishes.
+# Run @arv -h job get --uuid JOB_UUID_HERE@ to see the job particulars, notably the "tasks_summary" attribute which indicates how many tasks are done/running/todo.
+# Watch the crunch log messages and stderr from the job tasks:
+
+<pre>
+curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
+ https://{{ site.arvados_api_host }}/arvados/v1/jobs/JOB_UUID_HERE/log_tail_follow
+</pre>
+
+h3. Notes
+
+fn1. Download the GATK tools → http://www.broadinstitute.org/gatk/download
+
+fn2. Information about the GATK resource bundle → http://gatkforums.broadinstitute.org/discussion/1213/whats-in-the-resource-bundle-and-how-can-i-get-it