---
title: Setup
---

<div>
{% tabs setup %}
{% tab setup generic %}

# Setting up a practice repository

We will create a new git repository and import a library of existing
tool definitions that will help us build our workflow.

Create a new empty git repository to hold our workflow with this command:

```
git init rnaseq-cwl-training-exercises
```
{: .language-bash }

Next, change into the new repository and import bio-cwl-tools with these commands:

```
cd rnaseq-cwl-training-exercises
git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
```
{: .language-bash }
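
To confirm that the import worked, a quick check along these lines can help. This is only a minimal sketch: `has_submodule` is a hypothetical helper (not a git command and not part of the lesson) that looks for a `path = ...` entry in `.gitmodules`, the file that `git submodule add` writes at the top of the repository.

```bash
# Hypothetical helper (not a git command): succeed if .gitmodules in the
# current directory lists a submodule checked out at the given path.
has_submodule() {
  [ -f .gitmodules ] && grep -Eq "^[[:space:]]*path = $1\$" .gitmodules
}

# Run from the top of the repository.
if has_submodule bio-cwl-tools; then
  echo "bio-cwl-tools is registered as a submodule"
else
  echo "bio-cwl-tools is not registered; re-run git submodule add"
fi
```
{: .language-bash }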

# Downloading sample and reference data

Start from your rnaseq-cwl-training-exercises directory.

```
mkdir rnaseq
cd rnaseq
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/
```
{: .language-bash }
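
The two path options in the `wget` command decide where files land locally: `--no-host` (which wget should accept as an abbreviation of `--no-host-directories`) drops the server name, and `--cut-dirs=1` drops the leading `c=...` directory. The sketch below only imitates that mapping with shell parameter expansion and does not call wget. The example file, `reference_data/chr1.fa`, is the one referenced elsewhere in this setup.

```bash
# Illustration only: mimic how wget's path options shorten the saved path.
url="https://download.jutro.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1.fa"

path="${url#*://}"   # strip the scheme (https://)
path="${path#*/}"    # --no-host-directories: strip the host name
path="${path#*/}"    # --cut-dirs=1: strip the first directory (c=...)

echo "$path"         # prints: reference_data/chr1.fa
```
{: .language-bash }

Because the download runs inside `rnaseq`, the file ends up at `rnaseq/reference_data/chr1.fa` relative to the repository.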

# Downloading or generating STAR index

Running STAR requires index files generated from the reference.

The prebuilt index is a rather large download (4 GB). Depending on your bandwidth, it may be faster to generate it yourself.

## Downloading

Start from your rnaseq-cwl-training-exercises directory.

```
mkdir hg19-chr1-STAR-index
cd hg19-chr1-STAR-index
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/
```
{: .language-bash }

## Generating

From your rnaseq-cwl-training-exercises directory, create `chr1-star-index.yaml`:

```
InputFiles:
  - class: File
    location: rnaseq/reference_data/chr1.fa
    format: http://edamontology.org/format_1930
IndexName: 'hg19-chr1-STAR-index'
Gtf:
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
Overhang: 99
```
{: .language-yaml }
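
The `location` paths above are relative, so they are resolved against the directory that holds `chr1-star-index.yaml`. A small pre-flight check can catch a missing download before the index build starts. The sketch below uses a hypothetical `check_inputs` helper, which is not part of cwl-runner.

```bash
# Hypothetical helper: print each missing file and return non-zero if any are absent.
check_inputs() {
  local rc=0 path
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Run from the directory that contains chr1-star-index.yaml.
if check_inputs rnaseq/reference_data/chr1.fa rnaseq/reference_data/chr1-hg19_genes.gtf; then
  echo "all inputs found"
else
  echo "download the sample and reference data first"
fi
```
{: .language-bash }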

Generate the index with your local cwl-runner.

```
cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
```
{: .language-bash }

{% endtab %}

{% tab setup arvados %}

# Setting up a practice repository

We will create a new git repository and import a library of existing
tool definitions that will help us build our workflow.

When using the recommended VSCode environment to develop on Arvados, start by cloning this template repository:

```
git clone https://github.com/arvados/arvados-vscode-cwl-template.git rnaseq-cwl-training-exercises
```
{: .language-bash }

Next, from inside the cloned repository, import bio-cwl-tools with this command:

```
git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
```
{: .language-bash }

# Downloading sample and reference data

> ## Note
>
> You may already have access to this collection.
>
> You can check by going to Workbench and pasting
> `9178fe1b80a08a422dbe02adfd439764+925` into the search box. If you
> arrive at a collection page instead of a "not found" error, then
> you do not need to perform this download step.
{: .callout}
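
The identifier in the note is the collection's portable data hash: 32 hexadecimal digits, a `+`, and a decimal size. When copying it by hand, a format check like this sketch can catch a truncated paste before you search or copy. `is_portable_data_hash` is a hypothetical helper, not an Arvados command, and it checks only the shape of the string, not whether the collection exists.

```bash
# Hypothetical helper: succeed if the argument looks like
# <32 hex digits>+<decimal size>, the shape of the identifier above.
is_portable_data_hash() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{32}\+[0-9]+$'
}

if is_portable_data_hash "9178fe1b80a08a422dbe02adfd439764+925"; then
  echo "identifier looks well formed"
else
  echo "identifier looks truncated or mistyped"
fi
```
{: .language-bash }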

Use `arv-copy` to copy the collection:

```
arv-copy --src jutro 9178fe1b80a08a422dbe02adfd439764+925
```
{: .language-bash }

# Downloading or generating STAR index

Running STAR requires index files generated from the reference.

The prebuilt index is a rather large download (4 GB). Depending on your bandwidth, it may be faster to generate it yourself.

## Downloading

> ## Note
>
> As above, you can check by going to Workbench and pasting
> `02a12ce9e2707610991bd29d38796b57+2912` into the search box to see
> if you already have access to this collection.
{: .callout}

Use `arv-copy` to copy the collection:

```
arv-copy --src jutro 02a12ce9e2707610991bd29d38796b57+2912
```
{: .language-bash }

## Generating

Create `chr1-star-index.yaml`:

```
InputFiles:
  - class: File
    location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1.fa
    format: http://edamontology.org/format_1930
IndexName: 'hg19-chr1-STAR-index'
Gtf:
  class: File
  location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1-hg19_genes.gtf
Overhang: 99
```
{: .language-yaml }

Generate the index with arvados-cwl-runner.

```
arvados-cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
```
{: .language-bash }

## Sneak peek

If you want to jump ahead, here are links to some of the CWL concepts you just used:

  - [YAML array](https://www.commonwl.org/user_guide/yaml/#arrays)
  - [CWL array inputs](https://www.commonwl.org/user_guide/09-array-inputs/index.html)

{% endtab %}
{% endtabs %}
</div>

{% include links.md %}