Formatting & content WIP

author Peter Amstutz <peter.amstutz@curii.com>

Tue, 26 Jan 2021 22:41:21 +0000 (17:41 -0500)

committer Peter Amstutz <peter.amstutz@curii.com>

Tue, 26 Jan 2021 22:41:21 +0000 (17:41 -0500)
diff --git a/_config.yml b/_config.yml

index a67f14b1d1f4da2ccebd8df341ddb6becf2a317e..7ddda3bd1718872e8664a6ed354877e77593c50e 100644 (file)
--- a/_config.yml
+++ b/_config.yml
@@ -11,7 +11,7 @@
  carpentry: "swc"
  
  # Overall title for pages.
-title: "Lesson Title"
+title: "Getting started with CWL"
  
  # Life cycle stage of the lesson
  # See this page for more details: https://cdh.carpentries.org/the-lesson-life-cycle.html
diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md

index fa10f796d02d55a2be56d242e8249428a1f28d99..8ee870e5f36396d728382eb06ff5c4a05386824a 100644 (file)
--- a/_episodes/01-introduction.md
+++ b/_episodes/01-introduction.md
@@ -1,40 +1,56 @@
  ---
  title: "Introduction"
-teaching: 0
+teaching: 10
  exercises: 0
  questions:
-- "Key question (FIXME)"
+- "What is CWL?"
+- "What is the goal of this training?"
  objectives:
  - "First learning objective. (FIXME)"
  keypoints:
  - "First key point. Brief Answer to questions. (FIXME)"
  ---
  
-## Introduction
+# Introduction to Common Workflow Language
  
  
-The goal of this training is to walk through the development of a
-best-practices CWL workflow by translating an existing bioinformatics
-shell script into CWL.  Specific knowledge of the biology of RNA-seq
-is *not* a prerequisite for these lessons.
+The Common Workflow Language (CWL) is an open standard for describing
+analysis workflows and tools in a way that makes them portable and
+scalable across a variety of software and hardware environments, from
+workstations to cluster, cloud, and high performance computing (HPC)
+environments. CWL is designed to meet the needs of data-intensive
+science, such as Bioinformatics, Medical Imaging, Astronomy, High
+Energy Physics, and Machine Learning.
  
  
-These lessons are based on "Introduction to RNA-seq using
-high-performance computing (HPC)" lessons developed by members of the
-teaching team at the Harvard Chan Bioinformatics Core (HBC).  The
-original training, which includes additional lectures about the
-biology of RNA-seq can be found here:
+# Introduction to this training
  
  
-https://github.com/hbctraining/Intro-to-rnaseq-hpc-O2
+The goal of this training is to walk the student through the
+development of a best-practices CWL workflow, starting from an
+existing shell script that performs a common bioinformatics analysis.
  
  
-## Background
+Specific knowledge of the biology of RNA-seq is *not* a prerequisite
+for these lessons.  CWL is not domain specific to bioinformatics.  We
+hope that you will find this training useful even if you work in some
+other field of research.
  
  
-RNA-seq is the process of sequencing RNA in a biological sample.  From
-the sequence reads, we want to measure the relative number of RNA
-molecules appearing in the sample that were produced by particular
-genes.  This analysis is called "differential gene expression".
+These lessons are based on [Introduction to RNA-seq using
+high-performance computing
+(HPC)](https://github.com/hbctraining/Intro-to-rnaseq-hpc-O2) lessons
+developed by members of the teaching team at the Harvard Chan
+Bioinformatics Core (HBC).  The original training, which includes
+additional lectures about the biology of RNA-seq, can be found at that
+link.
+
+# Introduction to the example analysis
+
+RNA-seq is the process of sequencing RNA present in a biological
+sample.  From the sequence reads, we want to measure the relative
+numbers of different RNA molecules appearing in the sample that were
+produced by particular genes.  This analysis is called "differential
+gene expression".
  
  The entire process looks like this:
  
-![](/assets/img/RNAseqWorkflow.png)
+![](/assets/img/RNAseqWorkflow.png){: height="400px"}
  
  For this training, we are only concerned with the middle analytical
  steps (skipping adapter trimming).
@@ -43,13 +59,10 @@ steps (skipping adapter trimming).
  * Alignment (mapping)
  * Counting reads associated with genes
  
-## Analysis shell script
-
-This analysis is already available as a Unix shell script, which we
-will refer to in order to build the workflow.
-
-Some of the reasons to use CWL over a plain shell script: portability,
-scalability, ability to run on platforms that are not traditional HPC.
+In this training, we are not attempting to develop the analysis from
+scratch; instead, we will be starting from an analysis written as a
+shell script.  We will be using the following shell script as a guide to build
+our workflow.
  
  rnaseq_analysis_on_input_file.sh
  
diff --git a/_episodes/02-workflow.md b/_episodes/02-workflow.md

index a3700a9def6b1cf2727dbdcd609e959c78ca9d6c..cfb133cb47fb3bed9f18181d1f6fce557e4b6dee 100644 (file)
--- a/_episodes/02-workflow.md
+++ b/_episodes/02-workflow.md
@@ -1,5 +1,5 @@
  ---
-title: "Turning a shell script into a workflow by composing existing tools"
+title: "Make a workflow by composing tools"
  teaching: 0
  exercises: 0
  questions:
@@ -10,32 +10,7 @@ keypoints:
  - "First key point. Brief Answer to questions. (FIXME)"
  ---
  
-# Setting up
-
-We will create a new git repository and import a library of existing
-tool definitions that will help us build our workflow.
-
-Create a new git repository to hold our workflow with this command:
-
-```
-git init rnaseq-cwl-training-exercises
-```
-
-On Arvados use this:
-
-```
-git clone https://github.com/arvados/arvados-vscode-cwl-template.git rnaseq-cwl-training-exercises
-```
-
-Next, import bio-cwl-tools with this command:
-
-```
-git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
-```
-
-# Writing the workflow
-
-## 1. File header
+# 1. File header
  
  Create a new file "main.cwl"
  
@@ -48,7 +23,7 @@ class: Workflow
  label: RNAseq CWL practice workflow
  ```
  
-## 2. Workflow Inputs
+# 2. Workflow Inputs
  
  The purpose of a workflow is to consume some input parameters, run a
  series of steps, and produce output values.
@@ -81,7 +56,7 @@ inputs:
    gtf: File
  ```
  
-## 3. Workflow Steps
+# 3. Workflow Steps
  
  A workflow consists of one or more steps.  This is the `steps` section.
  
@@ -116,7 +91,7 @@ steps:
      out: [html_file]
  ```
  
-## 4. Running alignment with STAR
+# 4. Running alignment with STAR
  
  STAR has more parameters.  Sometimes we want to provide input values
  to a step without making them as workflow-level inputs.  We can do
@@ -138,7 +113,7 @@ this with `{default: N}`
      out: [alignment]
  ```
  
-## 5. Running samtools
+# 5. Running samtools
  
  The third step is to generate an index for the aligned BAM.
  
@@ -157,7 +132,7 @@ step will not run until the `STAR` step has completed successfully.
      out: [bam_sorted_indexed]
  ```
  
-## 6. featureCounts
+# 6. featureCounts
  
  As of this writing, the `subread` package that provides
  `featureCounts` is not available in bio-cwl-tools (and if it has been
@@ -165,7 +140,7 @@ added since writing this, let's pretend that it isn't there.)  We will
  go over how to write a CWL wrapper for a command line tool in
  lesson 3.  For now, we will leave off the final step.
  
-## 7. Workflow Outputs
+# 7. Workflow Outputs
  
  The last thing to do is declare the workflow outputs in the `outputs` section.
  
diff --git a/_episodes/03-running.md b/_episodes/03-running.md

index b851a2959f70af583ad60987b7fb36069941cdcc..c6ff7d53642b0ddb1a8dcb89b5e97d7f78d63d43 100644 (file)
--- a/_episodes/03-running.md
+++ b/_episodes/03-running.md
@@ -10,9 +10,7 @@ keypoints:
  - "First key point. Brief Answer to questions. (FIXME)"
  ---
  
-# Running and debugging a workflow
-
-### 1. The input parameter file
+# 1. The input parameter file
  
  CWL input values are provided in the form of a YAML or JSON file.
  Create one by right clicking on the explorer, select "New File" and
@@ -26,7 +24,7 @@ When setting inputs, Files and Directories are given as an object with
  `class: File` or `class: Directory`.  This distinguishes them from
  plain strings that may or may not be file paths.
  
-Note: if you don't have example sequence data or the STAR index files, see the Appendix below.
+Note: if you don't have example sequence data or the STAR index files, see [setup](/setup.html).
  
  ```
  fq:
@@ -56,7 +54,7 @@ gtf:
    location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1-hg19_genes.gtf
  ```
  
-### 2. Running the workflow
+# 2. Running the workflow
  
  Type this into the terminal:
  
@@ -64,7 +62,7 @@ Type this into the terminal:
  cwl-runner main.cwl main-input.yaml
  ```
  
-### 3. Debugging the workflow
+# 3. Debugging the workflow
  
  A workflow can fail for many reasons: some possible reasons include
  bad input, bugs in the code, or running out of memory.  In this case, the
@@ -92,7 +90,7 @@ Container exited with code: 137
  
  If this happens, you will need to request more RAM.
  
-### 4. Setting runtime RAM requirements
+# 4. Setting runtime RAM requirements
  
  By default, a step is allocated 256 MB of RAM.  From the STAR error message:
  
@@ -119,7 +117,7 @@ Resource requirements you can set include:
  
  After setting the RAM requirements, re-run the workflow.
  
-### 5. Workflow results
+# 5. Workflow results
  
  The CWL runner will print a results JSON object to standard output.  It will look something like this (it may include additional fields).
  
@@ -152,51 +150,3 @@ The CWL runner will print a results JSON object to standard output.  It will loo
  This has the same structure as `main-input.yaml`.  Each output
  parameter is listed, with the `location` field of each `File` object
  indicating where the output file can be found.
-
-# Appendix
-
-## Downloading sample and reference data
-
-Start from your rnaseq-cwl-exercises directory.
-
-```
-mkdir rnaseq
-cd rnaseq
-wget --mirror --no-parent --no-host --cut-dirs=1 https://download.pirca.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/
-```
-
-## Downloading or generating STAR index
-
-Running STAR requires index files generated from the reference.
-
-This is a rather large download (4 GB).  Depending on your bandwidth, it may be faster to generate it yourself.
-
-### Downloading
-
-```
-mkdir hg19-chr1-STAR-index
-cd hg19-chr1-STAR-index
-wget --mirror --no-parent --no-host --cut-dirs=1 https://download.pirca.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/
-```
-
-### Generating
-
-Create `chr1-star-index.yaml`:
-
-```
-InputFiles:
-  - class: File
-    location: rnaseq/reference_data/chr1.fa
-    format: http://edamontology.org/format_1930
-IndexName: 'hg19-chr1-STAR-index'
-Gtf:
-  class: File
-  location: rnaseq/reference_data/chr1-hg19_genes.gtf
-Overhang: 99
-```
-
-Generate the index with your local cwl-runner.
-
-```
-cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
-```
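
The closing context of the `03-running.md` diff above notes that the results object has the same structure as `main-input.yaml`, with the `location` field of each `File` object pointing at an output file. A minimal Python sketch of collecting those locations follows. It assumes the JSON printed by the runner has been captured as a string; the function name `output_locations` and the parameter names and paths in `example` are hypothetical, not taken from the lesson.

```python
import json


def output_locations(results_json: str) -> dict:
    """Map each output parameter name to the locations of its File objects."""
    results = json.loads(results_json)
    found = {}
    for name, value in results.items():
        # An output may be a single object or a list of objects.
        items = value if isinstance(value, list) else [value]
        found[name] = [
            item["location"]
            for item in items
            if isinstance(item, dict) and item.get("class") == "File"
        ]
    return found


# Hypothetical results object; names and paths are illustrative only.
example = json.dumps({
    "qc_html": {"class": "File", "location": "file:///tmp/out/sample_fastqc.html"},
    "bam_sorted_indexed": {"class": "File", "location": "file:///tmp/out/sample.bam"},
})

for name, locations in output_locations(example).items():
    print(name, locations)
```

Outputs that are not `File` objects (numbers, strings, nulls) simply map to an empty list in this sketch.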
diff --git a/_episodes/04-commandlinetool.md b/_episodes/04-commandlinetool.md

index 0575a01561fd71919f991a11bf925592f31f5943..cae16826d6d11d8308e038fc1a7780be6d2b9a0f 100644 (file)
--- a/_episodes/04-commandlinetool.md
+++ b/_episodes/04-commandlinetool.md
@@ -14,7 +14,7 @@ It is time to add the last step in the analysis.
  
  This will use the "featureCounts" tool from the "subread" package.
  
-### 1. File header
+# 1. File header
  
  Create a new file "featureCounts.cwl"
  
@@ -25,7 +25,7 @@ cwlVersion: v1.2
  class: CommandLineTool
  ```
  
-### 2. Command line tool inputs
+# 2. Command line tool inputs
  
  A CommandLineTool describes a single invocation of a command line program.
  
@@ -50,7 +50,7 @@ inputs:
    counts_input_bam: File
  ```
  
-### 3. Specifying the program to run
+# 3. Specifying the program to run
  
  Give the name of the program to run in `baseCommand`.
  
@@ -58,7 +58,7 @@ Give the name of the program to run in `baseCommand`.
  baseCommand: featureCounts
  ```
  
-### 4. Command arguments
+# 4. Command arguments
  
  The easiest way to describe the command line is with an `arguments`
  section.  This takes a comma-separated list of command line arguments.
@@ -78,7 +78,7 @@ arguments: [-T, $(runtime.cores),
              $(inputs.counts_input_bam)]
  ```
  
-### 5. Outputs section
+# 5. Outputs section
  
  In CWL, you must explicitly identify the outputs of a program.  This
  associates output parameters with specific files, and enables the
@@ -103,7 +103,7 @@ outputs:
        glob: featurecounts.tsv
  ```
  
-### 6. Running in a container
+# 6. Running in a container
  
  In order to run the tool, it needs to be installed.
  Using software containers, a tool can be pre-installed into a
@@ -133,7 +133,7 @@ hints:
      dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
  ```
  
-### 7. Running a tool on its own
+# 7. Running a tool on its own
  
  When creating a tool wrapper, it is helpful to run it on its own to test it.
  
@@ -157,7 +157,7 @@ The invocation is also the same:
  cwl-runner featureCounts.cwl featureCounts.yaml
  ```
  
-### 8. Adding it to the workflow
+# 8. Adding it to the workflow
  
  Now that we have confirmed that it works, we can add it to our workflow.
  We add it to `steps`, connecting the output of samtools to
diff --git a/_episodes/05-scatter.md b/_episodes/05-scatter.md

index 6160baeb27cec93a0be94030d08b52f57add9a3f..bc536727b9824a72446eeab60d43a716d9809b15 100644 (file)
--- a/_episodes/05-scatter.md
+++ b/_episodes/05-scatter.md
@@ -10,12 +10,10 @@ keypoints:
  - "First key point. Brief Answer to questions. (FIXME)"
  ---
  
-# Analyzing multiple samples
-
  Analyzing a single sample is great, but in the real world you probably
  have a batch of samples that you need to analyze and then compare.
  
-### 1. Subworkflows
+# 1. Subworkflows
  
  In addition to running command line tools, a workflow step can also
  execute another workflow.
@@ -60,7 +58,7 @@ requirements:
  If you run this workflow, you will get exactly the same results as
  before, we've just wrapped the inner workflow with an outer workflow.
  
-### 2. Scattering
+# 2. Scattering
  
  The wrapper lets us do something useful.  We can modify the outer
  workflow to accept a list of files, and then invoke the inner workflow
@@ -116,7 +114,7 @@ requirements:
    ScatterFeatureRequirement: {}
  ```
  
-### 3. Running with list inputs
+# 3. Running with list inputs
  
  The `fq` parameter needs to be a list.  You write a list in yaml by
  starting each list item with a dash.  Example `main-input.yaml`
@@ -151,7 +149,7 @@ gtf:
  
  Now you can run the workflow the same way as in Lesson 2.
  
-### 4. Combining results
+# 4. Combining results
  
  Each instance of the alignment workflow produces its own featureCounts
  file.  However, to be able to compare results easily, we need them a
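
The "Running with list inputs" context in the `05-scatter.md` diff above says that `fq` must become a list of `File` objects. Since the lesson states that CWL inputs may be given as YAML or JSON, one way to build that list for a batch of samples is sketched below in Python. The helper names and the FASTQ paths are hypothetical, and the other workflow inputs (such as `gtf`) would still need to be added to the object before it is usable.

```python
import json


def file_object(location: str) -> dict:
    """Build a CWL File input object for one path or URI."""
    return {"class": "File", "location": location}


def list_input(fastq_locations) -> dict:
    """Build the `fq` list input, one File object per sample."""
    return {"fq": [file_object(loc) for loc in fastq_locations]}


# Hypothetical sample paths, for illustration only.
inputs = list_input([
    "rnaseq/raw_fastq/sample_A.fastq",
    "rnaseq/raw_fastq/sample_B.fastq",
])

# JSON list items correspond to the dash-prefixed items of the YAML form.
print(json.dumps(inputs, indent=2))
```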
diff --git a/_episodes/06-expressions.md b/_episodes/06-expressions.md

index 7b83de6d28c831f5b6f124f39377c0705d7df7ac..54a5d32b9065a9534370521f3b7ad4ce46892d87 100644 (file)
--- a/_episodes/06-expressions.md
+++ b/_episodes/06-expressions.md
@@ -1,5 +1,5 @@
  ---
-title: "Dynamic Workflow behavior with expressions"
+title: "Dynamic workflows with expressions"
  teaching: 0
  exercises: 0
  questions:
@@ -10,7 +10,7 @@ keypoints:
  - "First key point. Brief Answer to questions. (FIXME)"
  ---
  
-### 1. Expressions on step inputs
+# 1. Expressions on step inputs
  
  You might have noticed that the output bam files are all named
  `Aligned.sortedByCoord.out.bam`.  This happens because when we
@@ -64,7 +64,7 @@ adds the remainder of the string, which just is a dot `.`.  This is to
  separate the leading part of our filename from the "Aligned.bam"
  extension that will be added by STAR.
  
-### 2. Organizing output files into Directories
+# 2. Organizing output files into Directories
  
  You probably noticed that all the output files appear in the same
  directory.  You might prefer that each file appears in its own
diff --git a/_episodes/07-resources.md b/_episodes/07-resources.md

index 81fd2e1bd4915c67cb25c51f73ac67526480561f..0ac9e5f053d5de44887b2337493f03aa85fb8b1e 100644 (file)
--- a/_episodes/07-resources.md
+++ b/_episodes/07-resources.md
@@ -15,32 +15,31 @@ developing a CWL workflow. There are many resources out there to
  further help you use CWL to solve your own scientific workflow
  problems.
  
-## CWL Reference
+# CWL Reference
  
  
-Main CWL web page https://commonwl.org
+[Main CWL web page](https://commonwl.org)
  
  
-User guide https://www.commonwl.org/user_guide/
+[User guide](https://www.commonwl.org/user_guide/)
  
  
-Specification https://www.commonwl.org/v1.2/
+[Specification](https://www.commonwl.org/v1.2/)
  
  
-Github organization https://github.com/common-workflow-language/
+[Github organization](https://github.com/common-workflow-language/)
  
  
-## CWL Community
+# CWL Community
  
  
-CWL Forum, this is is best place to ask questions https://cwl.discourse.group/
+The [CWL Forum](https://cwl.discourse.group/) is the best place to ask questions
  
  
-Gitter (chat) https://gitter.im/common-workflow-language/common-workflow-language
+[Gitter (chat)](https://gitter.im/common-workflow-language/common-workflow-language)
  
  
-Weekly video calls https://cwl.discourse.group/t/eu-us-timezone-cwl-video-chat/260
+[Weekly video calls](https://cwl.discourse.group/t/eu-us-timezone-cwl-video-chat/260)
  
  
-## Software resources
+# Software resources
  
  
-Github organization for repositories of CWL tool and workflow
-descriptions, including bio-cwl-tools
-https://github.com/common-workflow-library/
+Github organization for [repositories of CWL tool and workflow descriptions](https://github.com/common-workflow-library/),
+including [bio-cwl-tools](https://github.com/common-workflow-library/bio-cwl-tools).
  
  
-BioContainers https://biocontainers.pro/
+[BioContainers](https://biocontainers.pro/)
  
  
-Search for CWL files on github, try adding the name of a tool you are
-interested in to the search
-https://github.com/search?q=extension%3Acwl+cwlVersion
+[Search for CWL files](https://github.com/search?q=extension%3Acwl+cwlVersion) on
+Github, try adding the name of a tool you are interested in to the
+search
diff --git a/setup.md b/setup.md

index b8c50321d8b07f8a76f8e925416957c3f274012e..f907ec716ffe9f6e5bbf86725bf02b1ee7d8e631 100644 (file)
--- a/setup.md
+++ b/setup.md
@@ -1,7 +1,75 @@
  ---
  title: Setup
  ---
-FIXME
+
+# Setting up a practice repository
+
+We will create a new git repository and import a library of existing
+tool definitions that will help us build our workflow.
+
+Create a new git repository to hold our workflow with this command:
+
+```
+git init rnaseq-cwl-training-exercises
+```
+
+On Arvados use this:
+
+```
+git clone https://github.com/arvados/arvados-vscode-cwl-template.git rnaseq-cwl-training-exercises
+```
+
+Next, import bio-cwl-tools with this command:
+
+```
+git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
+```
+
+# Downloading sample and reference data
+
+Start from your rnaseq-cwl-exercises directory.
+
+```
+mkdir rnaseq
+cd rnaseq
+wget --mirror --no-parent --no-host --cut-dirs=1 https://download.pirca.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/
+```
+
+# Downloading or generating STAR index
+
+Running STAR requires index files generated from the reference.
+
+This is a rather large download (4 GB).  Depending on your bandwidth, it may be faster to generate it yourself.
+
+## Downloading
+
+```
+mkdir hg19-chr1-STAR-index
+cd hg19-chr1-STAR-index
+wget --mirror --no-parent --no-host --cut-dirs=1 https://download.pirca.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/
+```
+
+## Generating
+
+Create `chr1-star-index.yaml`:
+
+```
+InputFiles:
+  - class: File
+    location: rnaseq/reference_data/chr1.fa
+    format: http://edamontology.org/format_1930
+IndexName: 'hg19-chr1-STAR-index'
+Gtf:
+  class: File
+  location: rnaseq/reference_data/chr1-hg19_genes.gtf
+Overhang: 99
+```
+
+Generate the index with your local cwl-runner.
+
+```
+cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
+```
  
  
  {% include links.md %}
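
The `setup.md` diff above has the reader download reference data and then point `chr1-star-index.yaml` at it. A quick check that every local `File` location in such an input object actually exists can save a failed run. The sketch below is one hedged way to do that in Python: the function name `missing_files` and the `base_dir` parameter are hypothetical, the input object simply mirrors the YAML shown in the diff, and only plain relative paths are handled (URIs such as `keep:` locations are outside its scope).

```python
from pathlib import Path


def missing_files(inputs: dict, base_dir: str = ".") -> list:
    """Return the locations of File objects that do not exist under base_dir."""
    missing = []

    def visit(value):
        if isinstance(value, dict):
            if value.get("class") == "File":
                location = value.get("location", "")
                if not (Path(base_dir) / location).is_file():
                    missing.append(location)
            else:
                for child in value.values():
                    visit(child)
        elif isinstance(value, list):
            for child in value:
                visit(child)

    visit(inputs)
    return missing


# Same fields as chr1-star-index.yaml in the diff, written as a Python dict.
star_index_inputs = {
    "InputFiles": [{
        "class": "File",
        "location": "rnaseq/reference_data/chr1.fa",
        "format": "http://edamontology.org/format_1930",
    }],
    "IndexName": "hg19-chr1-STAR-index",
    "Gtf": {
        "class": "File",
        "location": "rnaseq/reference_data/chr1-hg19_genes.gtf",
    },
    "Overhang": 99,
}

# Run from the exercises directory; an empty list means nothing is missing.
print(missing_files(star_index_inputs))
```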