---
title: Setup
---

# Setting up a practice repository

We will create a new git repository and import a library of existing
tool definitions that will help us build our workflow.

Create a new git repository to hold our workflow with this command:

```
git init rnaseq-cwl-training-exercises
```

On Arvados, clone the template repository instead:

```
git clone https://github.com/arvados/arvados-vscode-cwl-template.git rnaseq-cwl-training-exercises
```

Next, change into the new repository and import bio-cwl-tools with these commands:

```
cd rnaseq-cwl-training-exercises
git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
```
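As a quick sanity check, the layout produced by the steps above can be inspected from the shell. The following is a minimal sketch, not part of the lesson: `check_setup` is a hypothetical helper name, and it only tests for the `.git` entry, the `.gitmodules` file, and the `bio-cwl-tools` directory that `git init` (or `git clone`) and `git submodule add` are expected to create.

```shell
# check_setup DIR: hypothetical helper (not part of git or the lesson).
# Prints "ok" if DIR looks like the practice repository; otherwise names
# the first missing piece and returns a non-zero status.
check_setup() {
  repo_dir="$1"
  for entry in .git .gitmodules bio-cwl-tools; do
    if [ ! -e "$repo_dir/$entry" ]; then
      echo "missing: $entry"
      return 1
    fi
  done
  echo "ok"
}
```

For example, running `check_setup .` from inside the repository should print `ok` once the submodule has been added.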

# Downloading sample and reference data

Start from your `rnaseq-cwl-training-exercises` directory.

```
mkdir rnaseq
cd rnaseq
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.pirca.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/
```

# Downloading or generating STAR index

Running STAR requires index files generated from the reference.

The prebuilt index is a rather large download (4 GB). Depending on your bandwidth, it may be faster to generate it yourself.
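To judge which route is likely to be quicker, the download time can be roughly estimated from the size and your connection speed. This is only a sketch: `download_seconds` is a hypothetical helper, it assumes 1 GB = 8000 megabits (decimal units), and it ignores protocol overhead and server-side limits.

```shell
# download_seconds SIZE_GB MBIT_PER_S: hypothetical helper.
# Integer estimate of transfer time in seconds, using 1 GB = 8000 megabits
# and ignoring protocol overhead.
download_seconds() {
  size_gb="$1"
  mbit_per_s="$2"
  echo $(( size_gb * 8000 / mbit_per_s ))
}
```

For example, `download_seconds 4 50` prints `640`, or roughly 11 minutes for the 4 GB index on a 50 Mbit/s connection. Compare that with however long index generation takes on your own machine.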

## Downloading

```
mkdir hg19-chr1-STAR-index
cd hg19-chr1-STAR-index
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.pirca.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/
```

## Generating

Create `chr1-star-index.yaml` in the top-level directory of your repository. Relative `location` paths are resolved relative to this file, so they assume the `rnaseq` directory sits next to it:

```
InputFiles:
  - class: File
    location: rnaseq/reference_data/chr1.fa
    format: http://edamontology.org/format_1930
IndexName: 'hg19-chr1-STAR-index'
Gtf:
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
Overhang: 99
```
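Before starting a long indexing run, it can be worth confirming that the two files named in the input file are actually present. The following is a minimal sketch under the assumption that the reference data was downloaded into `rnaseq/` as described earlier; `check_inputs` is a hypothetical helper, not part of any CWL tooling.

```shell
# check_inputs BASE_DIR: hypothetical helper.
# Verifies that the two files named in chr1-star-index.yaml exist and are
# non-empty under BASE_DIR (the directory that contains the YAML file).
check_inputs() {
  base_dir="$1"
  status=0
  for path in rnaseq/reference_data/chr1.fa \
              rnaseq/reference_data/chr1-hg19_genes.gtf; do
    if [ ! -s "$base_dir/$path" ]; then
      echo "missing or empty: $path"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_inputs .` from the directory that holds the YAML file prints nothing and returns success when both files are in place.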

Generate the index with your local cwl-runner.

```
cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
```
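Whichever route you took, a quick look at the resulting directory can catch an interrupted download or a failed run. This sketch assumes the index ends up in a directory named `hg19-chr1-STAR-index` (the `IndexName` above) and checks only for `Genome`, `SA`, and `SAindex`, three files that STAR's genome generation normally writes. The file list comes from STAR's usual output rather than from this lesson, and `check_star_index` is a hypothetical helper.

```shell
# check_star_index DIR: hypothetical helper.
# Looks for three files that STAR's genomeGenerate mode normally writes
# (Genome, SA, SAindex); a real index contains more files than these.
check_star_index() {
  index_dir="$1"
  for name in Genome SA SAindex; do
    if [ ! -s "$index_dir/$name" ]; then
      echo "missing or empty: $name"
      return 1
    fi
  done
  echo "index looks complete"
}
```

For example, `check_star_index hg19-chr1-STAR-index` can be run from the directory that contains the index.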


{% include links.md %}