---
layout: default
navsection: userguide
navmenu: Examples
title: "Crunch examples"

...

h1. Crunch examples

Several crunch scripts are included with Arvados in the "/crunch_scripts directory":https://arvados.org/projects/arvados/repository/revisions/master/show/crunch_scripts. They are intended to provide examples and starting points for writing your own scripts.

h4. bwa-aln

Run the bwa aligner on a set of paired-end FASTQ files, producing a BAM file for each pair. "View source.":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/bwa-aln

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|bwa_tbz|Collection with the bwa source distribution.|@8b6e2c4916133e1d859c9e812861ce13+70@|
|input|Collection with FASTQ reads (pairs of @*_1.fastq.gz@ and @*_2.fastq.gz@).|@d0136bc494c21f79fc1b6a390561e6cb+2778@|
</div>
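
The @input@ naming convention above (@*_1.fastq.gz@ paired with @*_2.fastq.gz@) can be illustrated with a minimal Python sketch. The helper @pair_fastq@ and the sample file names are hypothetical and are not taken from the bwa-aln source; the sketch only shows one way such pairs could be matched by their shared stem.

```python
import re
from collections import defaultdict

# Hypothetical helper, not taken from the bwa-aln source.
# Matches names such as "sample_1.fastq.gz" or "sample_2.fastq.gz".
_PAIR_RE = re.compile(r"^(?P<stem>.+)_(?P<mate>[12])\.fastq\.gz$")


def pair_fastq(filenames):
    """Group file names into (mate1, mate2) tuples keyed by their shared stem."""
    mates = defaultdict(dict)
    for name in filenames:
        match = _PAIR_RE.match(name)
        if match:
            mates[match.group("stem")][match.group("mate")] = name
    # Keep only stems for which both mates are present.
    return {
        stem: (found["1"], found["2"])
        for stem, found in sorted(mates.items())
        if "1" in found and "2" in found
    }


# Sample names are made up for illustration.
pairs = pair_fastq(["s1_1.fastq.gz", "s1_2.fastq.gz", "s2_1.fastq.gz", "README"])
print(pairs)
```

In this sketch, files without a mate (here @s2_1.fastq.gz@) and files that do not follow the naming convention are simply skipped; the real script may treat them differently.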

h4. bwa-index

Generate an index of a FASTA reference genome suitable for use by bwa-aln. "View source.":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/bwa-index

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|bwa_tbz|Collection with the bwa source distribution.|@8b6e2c4916133e1d859c9e812861ce13+70@|
|input|Collection with reference data (@*.fasta.gz@, @*.fasta.fai.gz@, @*.dict.gz@).|@c361dbf46ee3397b0958802b346e9b5a+925@|
</div>

h4. picard-gatk2-prep

Using the FixMateInformation, SortSam, ReorderSam, AddOrReplaceReadGroups, and BuildBamIndex modules from Picard, prepare a BAM file for use with the GATK2 tools. Additionally, run Picard's CollectAlignmentSummaryMetrics module to produce a @*.casm.tsv@ statistics file for each BAM file. "View source.":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/picard-gatk2-prep

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|input|Collection containing aligned BAM files.||
|picard_zip|Collection with the Picard binary distribution.||
|reference|Collection with reference data (@*.fasta.gz@, @*.fasta.fai.gz@, @*.dict.gz@).|@c361dbf46ee3397b0958802b346e9b5a+925@|
</div>

h4. GATK2-realign

Run GATK's RealignerTargetCreator and IndelRealigner modules on a set of BAM files. "View source.":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/GATK2-realign

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|input|Collection containing aligned BAM files.||
|picard_zip|Collection with the Picard binary distribution.||
|gatk_tbz|Collection with the GATK2 binary distribution.||
|gatk_bundle|Collection with the GATK data bundle.|@d237a90bae3870b3b033aea1e99de4a9+10820@|
|known_sites|List of files in the data bundle to use as GATK @-known@ arguments. Optional.|@["dbsnp_137.b37.vcf","Mills_and_1000G_gold_standard.indels.b37.vcf"]@ (this is the default value)|
|regions|Collection with .bed files indicating sequencing target regions. Optional.||
|region_padding|Corresponds to GATK @--interval_padding@ argument. Required if a regions parameter is given.|10|
</div>
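
The rules in the table above (an optional @known_sites@ with a default value, and @region_padding@ being required whenever @regions@ is given) can be sketched as a small Python check. The helper @check_realign_params@ is hypothetical and is not part of the GATK2-realign script. The sketch also assumes that every parameter not marked "Optional" is required, and the angle-bracket values are placeholders rather than real collection locators.

```python
# Default taken from the known_sites row of the table above.
DEFAULT_KNOWN_SITES = [
    "dbsnp_137.b37.vcf",
    "Mills_and_1000G_gold_standard.indels.b37.vcf",
]

# Assumption: parameters not marked "Optional" in the table are required.
REQUIRED = ("input", "picard_zip", "gatk_tbz", "gatk_bundle")


def check_realign_params(params):
    """Hypothetical helper: validate parameters and fill in defaults."""
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # region_padding must accompany regions, as stated in the table.
    if "regions" in params and "region_padding" not in params:
        raise ValueError("region_padding is required when regions is given")
    checked = dict(params)
    checked.setdefault("known_sites", list(DEFAULT_KNOWN_SITES))
    return checked


params = check_realign_params({
    "input": "<input collection locator>",        # placeholder
    "picard_zip": "<picard collection locator>",  # placeholder
    "gatk_tbz": "<gatk collection locator>",      # placeholder
    "gatk_bundle": "d237a90bae3870b3b033aea1e99de4a9+10820",
})
print(params["known_sites"])
```

Checking parameters before submitting a job is a design choice of this sketch; it surfaces a missing @region_padding@ early instead of at run time.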

h4. GATK2-bqsr

Run GATK's base quality score recalibration (BQSR) on a set of BAM files. "View source.":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/GATK2-bqsr

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|input|Collection containing BAM files.||
|gatk_tbz|Collection with the GATK2 binary distribution.||
|gatk_bundle|Collection with the GATK data bundle.|@d237a90bae3870b3b033aea1e99de4a9+10820@|
</div>

h4. GATK2-merge-call

Merge a set of BAM files using Picard, and run GATK's UnifiedGenotyper module on the merged set to produce a VCF file. "View source.":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/GATK2-merge-call

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|input|Collection containing BAM files.||
|picard_zip|Collection with the Picard binary distribution.||
|gatk_tbz|Collection with the GATK2 binary distribution.||
|gatk_bundle|Collection with the GATK data bundle.|@d237a90bae3870b3b033aea1e99de4a9+10820@|
|regions|Collection with .bed files indicating sequencing target regions. Optional.||
|region_padding|Corresponds to GATK @--interval_padding@ argument. Required if a regions parameter is given.|10|
</div>

h4. file-select

Copy the named files from the input collection to the output collection, ignoring all other files. "View source.":https://arvados.org/projects/arvados/repository/revisions/master/entry/crunch_scripts/file-select

<div class="offset1">
table(table table-bordered table-condensed).
|_Parameter_|_Description_|_Example_|
|names|List of filenames to include in the output.|@["human_g1k_v37.fasta.gz","human_g1k_v37.fasta.fai.gz"]@|
</div>
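
The selection behaviour described above can be sketched in a few lines of Python. The function @select_files@ and the list of available files are hypothetical. The actual file-select script works on Arvados collections, which this sketch replaces with a plain list of file names.

```python
def select_files(available, names):
    """Hypothetical helper: keep only the files whose names were requested."""
    wanted = set(names)
    # Preserve the order of the input listing; everything else is dropped.
    return [filename for filename in available if filename in wanted]


# The names value is the example from the table; the listing is made up.
names = ["human_g1k_v37.fasta.gz", "human_g1k_v37.fasta.fai.gz"]
available = [
    "human_g1k_v37.dict.gz",
    "human_g1k_v37.fasta.fai.gz",
    "human_g1k_v37.fasta.gz",
]
print(select_files(available, names))
```

Names that are requested but not present in the input are silently ignored in this sketch; whether the real script reports them is not stated here.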