---
layout: default
navsection: userguide
title: "Scripts provided by Arvados"
...
{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}

{% include 'pipeline_deprecation_notice' %}

Several crunch scripts are included with Arvados in the "/crunch_scripts directory":https://dev.arvados.org/projects/arvados/repository/revisions/master/show/crunch_scripts. They are intended to provide examples and starting points for writing your own scripts.

h4. bwa-aln

Run the bwa aligner on a set of paired-end fastq files, producing a BAM file for each pair. "View source.":https://dev.arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/bwa-aln

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|bwa_tbz|Collection with the bwa source distribution.|@8b6e2c4916133e1d859c9e812861ce13+70@|
|samtools_tgz|Collection with the samtools source distribution.|@c777e23cf13e5d5906abfdc08d84bfdb+74@|
|input|Collection with fastq reads (pairs of @*_1.fastq.gz@ and @*_2.fastq.gz@).|@d0136bc494c21f79fc1b6a390561e6cb+2778@|
</div>
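
The @input@ collection is expected to hold mate files named @*_1.fastq.gz@ and @*_2.fastq.gz@. As a rough illustration of that naming convention, the sketch below groups a plain list of filenames into pairs. The helper name @pair_fastq@ and the regular expression are assumptions made for this example; they are not taken from the bwa-aln script itself.

```python
import re
from collections import defaultdict

# Hypothetical pattern for illustration: <sample>_1.fastq.gz / <sample>_2.fastq.gz
PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def pair_fastq(filenames):
    """Group names like sample_1.fastq.gz and sample_2.fastq.gz into pairs."""
    reads = defaultdict(dict)
    for name in filenames:
        match = PAIR_RE.match(name)
        if match:  # skip files that do not follow the naming pattern
            reads[match.group("sample")][match.group("read")] = name
    # Keep only samples that have both mates, sorted by sample name.
    return [
        (mates["1"], mates["2"])
        for _sample, mates in sorted(reads.items())
        if "1" in mates and "2" in mates
    ]


if __name__ == "__main__":
    print(pair_fastq(["a_1.fastq.gz", "a_2.fastq.gz", "b_1.fastq.gz", "notes.txt"]))
```

In this sketch a sample with only one mate (such as @b_1.fastq.gz@) is silently dropped; a real script might prefer to report it as an error.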

h4. bwa-index

Generate an index of a fasta reference genome suitable for use by bwa-aln. "View source.":https://dev.arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/bwa-index

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|bwa_tbz|Collection with the bwa source distribution.|@8b6e2c4916133e1d859c9e812861ce13+70@|
|input|Collection with reference data (@*.fasta.gz@, @*.fasta.fai.gz@, @*.dict.gz@).|@c361dbf46ee3397b0958802b346e9b5a+925@|
</div>

h4. picard-gatk2-prep

Prepare a BAM file for use with the GATK2 tools using Picard's FixMateInformation, SortSam, ReorderSam, AddOrReplaceReadGroups, and BuildBamIndex modules. Additionally, run Picard's CollectAlignmentSummaryMetrics module to produce a @*.casm.tsv@ statistics file for each BAM file. "View source.":https://dev.arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/picard-gatk2-prep

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|input|Collection containing aligned BAM files.||
|picard_zip|Collection with the Picard binary distribution.|@687f74675c6a0e925dec619cc2bec25f+77@|
|reference|Collection with reference data (@*.fasta.gz@, @*.fasta.fai.gz@, @*.dict.gz@).|@c361dbf46ee3397b0958802b346e9b5a+925@|
</div>

h4. GATK2-realign

Run GATK's RealignerTargetCreator and IndelRealigner modules on a set of BAM files. "View source.":https://dev.arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/GATK2-realign

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|input|Collection containing aligned BAM files.||
|picard_zip|Collection with the Picard binary distribution.|@687f74675c6a0e925dec619cc2bec25f+77@|
|gatk_tbz|Collection with the GATK2 binary distribution.|@7e0a277d6d2353678a11f56bab3b13f2+87@|
|gatk_bundle|Collection with the GATK data bundle.|@d237a90bae3870b3b033aea1e99de4a9+10820@|
|known_sites|List of files in the data bundle to use as GATK @-known@ arguments. Optional.|@["dbsnp_137.b37.vcf","Mills_and_1000G_gold_standard.indels.b37.vcf"]@ (this is the default value)|
|regions|Collection with @.bed@ files indicating sequencing target regions. Optional.||
|region_padding|Corresponds to the GATK @--interval_padding@ argument. Required if a @regions@ parameter is given.|10|
</div>
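
Two rules in the table above are easy to get wrong: @known_sites@ falls back to a default list when omitted, and @region_padding@ becomes mandatory as soon as @regions@ is supplied. The sketch below shows one way such a check could look. The function name @realign_options@ and the shape of the returned list are assumptions for illustration only; the actual GATK2-realign script may handle its parameters differently, and a real invocation would need full paths to the bundle files rather than bare names.

```python
# Default copied from the known_sites row of the parameter table.
DEFAULT_KNOWN_SITES = [
    "dbsnp_137.b37.vcf",
    "Mills_and_1000G_gold_standard.indels.b37.vcf",
]


def realign_options(params):
    """Hypothetical helper: map script parameters to GATK argument fragments."""
    has_regions = bool(params.get("regions"))
    if has_regions and "region_padding" not in params:
        raise ValueError("region_padding is required when regions is given")

    args = []
    # known_sites is optional; fall back to the documented default list.
    for site in params.get("known_sites", DEFAULT_KNOWN_SITES):
        args += ["-known", site]
    # The padding is only meaningful together with target regions.
    if has_regions:
        args += ["--interval_padding", str(params["region_padding"])]
    return args


if __name__ == "__main__":
    print(realign_options({}))
    print(realign_options({"regions": "hypothetical-regions-collection", "region_padding": 10}))
```

The same @regions@ and @region_padding@ rule applies to GATK2-merge-call, which takes the same two parameters.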

h4. GATK2-bqsr

Run GATK's base quality score recalibration (BQSR) on a set of BAM files. "View source.":https://dev.arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/GATK2-bqsr

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|input|Collection containing BAM files.||
|gatk_tbz|Collection with the GATK2 binary distribution.|@7e0a277d6d2353678a11f56bab3b13f2+87@|
|gatk_bundle|Collection with the GATK data bundle.|@d237a90bae3870b3b033aea1e99de4a9+10820@|
</div>

h4. GATK2-merge-call

Merge a set of BAM files using Picard, and run GATK's UnifiedGenotyper module on the merged set to produce a VCF file. "View source.":https://dev.arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/GATK2-merge-call

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|input|Collection containing BAM files.||
|picard_zip|Collection with the Picard binary distribution.|@687f74675c6a0e925dec619cc2bec25f+77@|
|gatk_tbz|Collection with the GATK2 binary distribution.|@7e0a277d6d2353678a11f56bab3b13f2+87@|
|gatk_bundle|Collection with the GATK data bundle.|@d237a90bae3870b3b033aea1e99de4a9+10820@|
|regions|Collection with @.bed@ files indicating sequencing target regions. Optional.||
|region_padding|Corresponds to the GATK @--interval_padding@ argument. Required if a @regions@ parameter is given.|10|
</div>

h4. file-select

Copy the named files from the input collection to the output collection, and ignore the rest. "View source.":https://dev.arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/file-select

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|names|List of filenames to include in the output.|@["human_g1k_v37.fasta.gz","human_g1k_v37.fasta.fai.gz"]@|
</div>
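
The selection rule itself is a simple filter. The sketch below applies it to a plain Python list of filenames rather than to a real collection, so it does not use the Arvados SDK; the function name @select_files@ and the sample file list are assumptions made for illustration.

```python
def select_files(available, names):
    """Return the entries of `available` that are listed in `names`.

    The order of `available` is preserved; names that are not present
    are simply ignored in this sketch.
    """
    wanted = set(names)
    return [name for name in available if name in wanted]


if __name__ == "__main__":
    # Hypothetical contents of an input collection.
    available = [
        "human_g1k_v37.dict.gz",
        "human_g1k_v37.fasta.fai.gz",
        "human_g1k_v37.fasta.gz",
    ]
    names = ["human_g1k_v37.fasta.gz", "human_g1k_v37.fasta.fai.gz"]
    print(select_files(available, names))
```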