setup.md

   1 ---
   2 title: Setup
   3 ---
   4
   5 {% capture generic_tab_content %}
   6
   7 # Setting up a practice repository
   8
   9 We will create a new git repository and import a library of existing
  10 tool definitions that will help us build our workflow.
  11
  12 Create a new empty git repository to hold our workflow with this command:
  13
  14 ```
  15 git init rnaseq-cwl-training-exercises
  16 ```
  17 {: .language-bash }
  18
   19 Next, change into the new repository (`cd rnaseq-cwl-training-exercises`) and import bio-cwl-tools with this command:
  20
  21 ```
  22 git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
  23 ```
  24 {: .language-bash }
  25
  26 # Downloading sample and reference data
  27
   28 Start from your rnaseq-cwl-training-exercises directory.
  29
  30 ```
  31 mkdir rnaseq
  32 cd rnaseq
  33 wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/
  34 ```
  35 {: .language-bash }
  36
  37 # Downloading or generating STAR index
  38
  39 Running STAR requires index files generated from the reference.
  40
   41 The prebuilt index is a rather large download (4 GB).  Depending on your bandwidth, it may be faster to generate it yourself.
  42
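To get a feel for the download side of that trade-off, here is a rough back-of-the-envelope sketch. The bandwidth figure is a hypothetical placeholder, not a measurement, and the estimate ignores protocol overhead; substitute your own connection speed.

```
# Rough download-time estimate for the prebuilt index.
size_gb=4          # index size stated above
mbit_per_s=100     # hypothetical bandwidth; replace with your own
# 1 GB is taken as 8 * 1000 megabits (decimal units).
seconds=$(( size_gb * 8 * 1000 / mbit_per_s ))
echo "approx. ${seconds} seconds (~$(( seconds / 60 )) minutes)"
```
{: .language-bash }

If the estimate comes out much longer than you are willing to wait, generating the index locally may be the better option.
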
  43 ## Downloading
  44
  45 ```
  46 mkdir hg19-chr1-STAR-index
  47 cd hg19-chr1-STAR-index
  48 wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/
  49 ```
  50 {: .language-bash }
  51
  52 ## Generating
  53
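   54 Create `chr1-star-index.yaml` in the top-level directory of the repository, because the relative `location` paths below are resolved against this file: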
  55
  56 ```
  57 InputFiles:
  58   - class: File
  59     location: rnaseq/reference_data/chr1.fa
  60     format: http://edamontology.org/format_1930
  61 IndexName: 'hg19-chr1-STAR-index'
  62 Gtf:
  63   class: File
  64   location: rnaseq/reference_data/chr1-hg19_genes.gtf
  65 Overhang: 99
  66 ```
  67 {: .language-yaml }
  68
  69 Generate the index with your local cwl-runner.
  70
  71 ```
  72 cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
  73 ```
  74 {: .language-bash }
  75
  76 {% endcapture %}
  77
  78 {% capture arvados_tab_content %}
  79
  80 # Setting up a practice repository
  81
  82 We will create a new git repository and import a library of existing
  83 tool definitions that will help us build our workflow.
  84
  85 When using the recommended [VSCode environment to develop on Arvados](https://doc.arvados.org/v2.3/user/cwl/arvados-vscode-training.html),
  86 start by forking the
  87 [arvados-vscode-cwl-template](https://github.com/arvados/arvados-vscode-cwl-template)
  88 repository.
  89
   90 1. VS Code: On the left sidebar, choose `Explorer` ![](../assets/img/Explorer.png)
  91 1. Select `Clone Repository` and enter [https://github.com/arvados/arvados-vscode-cwl-template](https://github.com/arvados/arvados-vscode-cwl-template), then click `Open`
  92 1. If asked `Would you like to open the cloned repository?` choose `Open`
  93
  94 Next, import the [bio-cwl-tools](https://github.com/common-workflow-library/bio-cwl-tools) repository:
  95
   96 1. VS Code: In the top menu, select `Terminal` &rarr; `New Terminal`
  97 1. This will open a terminal window in the lower part of the screen
  98 1. Run this command:
  99 ```
 100 git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
 101 ```
 102 {: .language-bash }
 103
 104 # Downloading sample and reference data
 105
 106 > ## Note
 107 >
 108 > You may already have access to this collection.
 109 >
 110 > You can check by going to Workbench and pasting
 111 > `9178fe1b80a08a422dbe02adfd439764+925` into the search box.  If you
  112 > arrive at a collection page instead of a "not found" error, then
 113 > you do not need to perform this download step.
 114 {: .callout}
 115
  116 1. Go to https://workbench2.jutro.arvadosapi.com and sign in; this will create an account
 117 2. Go to `Get an API token` under the user menu
 118 3. Log into the shell node of your Arvados cluster
 119 4. On the shell node, copy the host name and token for the 'jutro' cluster into the file `~/.config/arvados/jutro.conf` as described on the page for [arv-copy](https://doc.arvados.org/user/topics/arv-copy.html).
 120
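As a sketch of what that file might look like, assuming the `KEY=VALUE` layout described on the arv-copy page: the host name below is inferred from the Workbench URL above and the token is a placeholder, so treat both as assumptions and use the values shown on your own `Get an API token` page.

```
# ~/.config/arvados/jutro.conf (sketch; both values are assumptions)
ARVADOS_API_HOST=jutro.arvadosapi.com
ARVADOS_API_TOKEN=paste-your-jutro-token-here
```
{: .language-bash }
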
  121 Now, on the shell node of your Arvados cluster, use `arv-copy` to copy the collection:
 122
 123 ```
 124 arv-copy --src jutro 9178fe1b80a08a422dbe02adfd439764+925
 125 ```
 126 {: .language-bash }
 127
 128 # Downloading or generating STAR index
 129
 130 Running STAR requires index files generated from the reference.
 131
  132 The prebuilt index is a rather large download (4 GB).  Depending on your bandwidth, it may be faster to generate it yourself.
 133
 134 ## Downloading
 135
 136 > ## Note
 137 >
 138 > As above, you can check by going to Workbench and pasting
 139 > `02a12ce9e2707610991bd29d38796b57+2912` into the search box to see
 140 > if you already have access to this collection.
 141 {: .callout}
 142
 143 Use `arv-copy` to copy the collection:
 144
 145 ```
 146 arv-copy --src jutro 02a12ce9e2707610991bd29d38796b57+2912
 147 ```
 148 {: .language-bash }
 149
 150 ## Generating
 151
 152 Create `chr1-star-index.yaml`:
 153
 154 ```
 155 InputFiles:
 156   - class: File
 157     location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1.fa
 158     format: http://edamontology.org/format_1930
 159 IndexName: 'hg19-chr1-STAR-index'
 160 Gtf:
 161   class: File
 162   location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1-hg19_genes.gtf
 163 Overhang: 99
 164 ```
 165 {: .language-yaml }
 166
 167 Generate the index with arvados-cwl-runner.
 168
 169 ```
 170 arvados-cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
 171 ```
 172 {: .language-bash }
 173
 174 {% endcapture %}
 175
 176 <div class="tabbed">
 177   <ul class="tab">
 178       <li><a href="#section-generic">generic</a></li>
 179       <li><a href="#section-arvados">arvados</a></li>
 180   </ul>
 181
 182   <section id="section-generic">{{ generic_tab_content | markdownify}}</section>
 183   <section id="section-arvados">{{ arvados_tab_content | markdownify}}</section>
 184 </div>
 185
 186 {% include links.md %}