From: Peter Amstutz <peter.amstutz@curii.com>
Date: Tue, 26 Jan 2021 19:45:32 +0000 (-0500)
Subject: Add more background to lesson 1.
X-Git-Url: https://git.arvados.org/rnaseq-cwl-training.git/commitdiff_plain/d177113e4a57ff8fe980212e51fe4317cf0fcb5a

Add more background to lesson 1.

Add answers to each section.

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>
---

diff --git a/README.md b/README.md
index 48ad847..510d494 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ rnaseq.
 
 | Lesson   | Description |
 |----------|-------------|
-| [Lesson 1](lesson1/lesson1.md) | Turning a shell script into a workflow from existing tool wrappers  |
+| [Lesson 1](lesson1/lesson1.md) | Turning a shell script into a workflow by composing existing tools  |
 | [Lesson 2](lesson2/lesson2.md) | Running and debugging a workflow  |
 | [Lesson 3](lesson3/lesson3.md) | Writing a tool wrapper  |
 | [Lesson 4](lesson4/lesson4.md) | Analyzing multiple samples  |
diff --git a/lesson1/RNAseqWorkflow.png b/lesson1/RNAseqWorkflow.png
new file mode 100644
index 0000000..1878db4
Binary files /dev/null and b/lesson1/RNAseqWorkflow.png differ
diff --git a/lesson1/answers/main.cwl b/lesson1/answers/main.cwl
new file mode 100644
index 0000000..bad27f4
--- /dev/null
+++ b/lesson1/answers/main.cwl
@@ -0,0 +1,48 @@
+### 1. File header
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+### 2. Workflow Inputs
+inputs:
+  fq: File
+  genome: Directory
+  gtf: File
+
+### 3. Workflow Steps
+steps:
+  fastqc:
+    run: bio-cwl-tools/fastqc/fastqc_2.cwl
+    in:
+      reads_file: fq
+    out: [html_file]
+
+  ### 4. Running alignment with STAR
+  STAR:
+    requirements:
+      ResourceRequirement:
+        ramMin: 6000
+    run: bio-cwl-tools/STAR/STAR-Align.cwl
+    in:
+      RunThreadN: {default: 4}
+      GenomeDir: genome
+      ForwardReads: fq
+      OutSAMtype: {default: BAM}
+      OutSAMunmapped: {default: Within}
+    out: [alignment]
+
+  ### 5. Running samtools
+  samtools:
+    run: bio-cwl-tools/samtools/samtools_index.cwl
+    in:
+      bam_sorted: STAR/alignment
+    out: [bam_sorted_indexed]
+
+### 7. Workflow Outputs
+outputs:
+  qc_html:
+    type: File
+    outputSource: fastqc/html_file
+  bam_sorted_indexed:
+    type: File
+    outputSource: samtools/bam_sorted_indexed
diff --git a/lesson1/lesson1.md b/lesson1/lesson1.md
index 8677c10..5c45265 100644
--- a/lesson1/lesson1.md
+++ b/lesson1/lesson1.md
@@ -1,31 +1,47 @@
-# Turning a shell script into a workflow using existing tools
+# Turning a shell script into a workflow by composing existing tools
 
-In this lesson we will turn `rnaseq_analysis_on_input_file.sh` into a workflow.
+## Introduction
 
-## Setting up
+The goal of this training is to walk through the development of a
+best-practices CWL workflow by translating an existing bioinformatics
+shell script into CWL.  Specific knowledge of the biology of RNA-seq
+is *not* a prerequisite for these lessons.
 
-We will create a new git repository and import a library of existing
-tool definitions that will help us build our workflow.
+These lessons are based on "Introduction to RNA-seq using
+high-performance computing (HPC)" lessons developed by members of the
+teaching team at the Harvard Chan Bioinformatics Core (HBC).  The
+original training, which includes additional lectures about the
+biology of RNA-seq, can be found here:
 
-Create a new git repository to hold our workflow with this command:
+https://github.com/hbctraining/Intro-to-rnaseq-hpc-O2
 
-```
-git init rnaseq-cwl-training-exercises
-```
+## Background
 
-On Arvados use this:
+RNA-seq is the process of sequencing RNA in a biological sample.  From
+the sequence reads, we want to measure the relative number of RNA
+molecules appearing in the sample that were produced by particular
+genes.  This analysis is called "differential gene expression".
 
-```
-git clone https://github.com/arvados/arvados-vscode-cwl-template.git rnaseq-cwl-training-exercises
-```
+The entire process looks like this:
 
-Next, import bio-cwl-tools with this command:
+![](RNAseqWorkflow.png)
 
-```
-git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
-```
+For this training, we are only concerned with the middle analytical
+steps (skipping adapter trimming).
+
+* Quality control (FASTQC)
+* Alignment (mapping)
+* Counting reads associated with genes
 
-## The shell script
+## Analysis shell script
+
+This analysis is already available as a Unix shell script, which we
+will refer to in order to build the workflow.
+
+Reasons to use CWL over a plain shell script include portability,
+scalability, and the ability to run on platforms other than traditional HPC.
+
+`rnaseq_analysis_on_input_file.sh`
 
 ```
 #!/bin/bash
@@ -81,6 +97,29 @@ samtools index $counts_input_bam
 featureCounts -T $cores -s 2 -a $gtf -o $counts $counts_input_bam
 ```
 
+## Setting up
+
+We will create a new git repository and import a library of existing
+tool definitions that will help us build our workflow.
+
+Create a new git repository to hold our workflow with this command:
+
+```
+git init rnaseq-cwl-training-exercises
+```
+
+On Arvados use this:
+
+```
+git clone https://github.com/arvados/arvados-vscode-cwl-template.git rnaseq-cwl-training-exercises
+```
+
+Next, import bio-cwl-tools with this command:
+
+```
+git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
+```
+
 ## Writing the workflow
 
 ### 1. File header
@@ -160,7 +199,7 @@ steps:
   fastqc:
     run: bio-cwl-tools/fastqc/fastqc_2.cwl
     in:
-	  reads_file: fq
+      reads_file: fq
     out: [html_file]
 ```
 
diff --git a/lesson3/answers/featureCounts.cwl b/lesson3/answers/featureCounts.cwl
new file mode 100644
index 0000000..9653391
--- /dev/null
+++ b/lesson3/answers/featureCounts.cwl
@@ -0,0 +1,29 @@
+### 1. File header
+cwlVersion: v1.2
+class: CommandLineTool
+
+### 2. Command line tool inputs
+inputs:
+  gtf: File
+  counts_input_bam: File
+
+### 3. Specifying the program to run
+baseCommand: featureCounts
+
+### 4. Command arguments
+arguments: [-T, $(runtime.cores),
+            -a, $(inputs.gtf),
+            -o, featurecounts.tsv,
+            $(inputs.counts_input_bam)]
+
+### 5. Outputs section
+outputs:
+  featurecounts:
+    type: File
+    outputBinding:
+      glob: featurecounts.tsv
+
+### 6. Running in a container
+hints:
+  DockerRequirement:
+    dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
diff --git a/lesson3/answers/main.cwl b/lesson3/answers/main.cwl
new file mode 100644
index 0000000..7eaf62e
--- /dev/null
+++ b/lesson3/answers/main.cwl
@@ -0,0 +1,58 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+inputs:
+  fq: File
+  genome: Directory
+  gtf: File
+
+steps:
+  fastqc:
+    run: bio-cwl-tools/fastqc/fastqc_2.cwl
+    in:
+      reads_file: fq
+    out: [html_file]
+
+  STAR:
+    requirements:
+      ResourceRequirement:
+        ramMin: 6000
+    run: bio-cwl-tools/STAR/STAR-Align.cwl
+    in:
+      RunThreadN: {default: 4}
+      GenomeDir: genome
+      ForwardReads: fq
+      OutSAMtype: {default: BAM}
+      OutSAMunmapped: {default: Within}
+    out: [alignment]
+
+  samtools:
+    run: bio-cwl-tools/samtools/samtools_index.cwl
+    in:
+      bam_sorted: STAR/alignment
+    out: [bam_sorted_indexed]
+
+  ### 8. Adding it to the workflow
+  featureCounts:
+    requirements:
+      ResourceRequirement:
+        ramMin: 500
+    run: featureCounts.cwl
+    in:
+      counts_input_bam: samtools/bam_sorted_indexed
+      gtf: gtf
+    out: [featurecounts]
+
+outputs:
+  qc_html:
+    type: File
+    outputSource: fastqc/html_file
+  bam_sorted_indexed:
+    type: File
+    outputSource: samtools/bam_sorted_indexed
+
+  ### 8. Adding it to the workflow
+  featurecounts:
+    type: File
+    outputSource: featureCounts/featurecounts
diff --git a/lesson3/lesson3.md b/lesson3/lesson3.md
index 622f446..9bd5a70 100644
--- a/lesson3/lesson3.md
+++ b/lesson3/lesson3.md
@@ -64,8 +64,8 @@ describes the resources allocated to running the program.  Here we use
 ```
 arguments: [-T, $(runtime.cores),
             -a, $(inputs.gtf),
-			-o, featurecounts.tsv,
-			$(inputs.counts_input_bam)]
+            -o, featurecounts.tsv,
+            $(inputs.counts_input_bam)]
 ```
 
 ### 5. Outputs section
@@ -89,8 +89,8 @@ output directory called `featurecounts.tsv`
 outputs:
   featurecounts:
     type: File
-	outputBinding:
-	  glob: featurecounts.tsv
+    outputBinding:
+      glob: featurecounts.tsv
 ```
 
 ### 6. Running in a container
@@ -162,10 +162,10 @@ steps:
       ResourceRequirement:
         ramMin: 500
     run: featureCounts.cwl
-	in:
-      counts_input_bam: samtools/bam_sorted_indexed
-	  gtf: gtf
-	out: [featurecounts]
+    in:
+      counts_input_bam: samtools/bam_sorted_indexed
+      gtf: gtf
+    out: [featurecounts]
 ```
 
 We will add the result from featurecounts to the output:
@@ -175,8 +175,7 @@ outputs:
   ...
   featurecounts:
     type: File
-	outputSource: featureCounts/featurecounts
-
+    outputSource: featureCounts/featurecounts
 ```
 
 You should now be able to re-run the workflow and it will run the
diff --git a/lesson4/answers/part1/alignment.cwl b/lesson4/answers/part1/alignment.cwl
new file mode 100644
index 0000000..c46b568
--- /dev/null
+++ b/lesson4/answers/part1/alignment.cwl
@@ -0,0 +1,56 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+inputs:
+  fq: File
+  genome: Directory
+  gtf: File
+
+steps:
+  fastqc:
+    run: bio-cwl-tools/fastqc/fastqc_2.cwl
+    in:
+      reads_file: fq
+    out: [html_file]
+
+  STAR:
+    requirements:
+      ResourceRequirement:
+        ramMin: 6000
+    run: bio-cwl-tools/STAR/STAR-Align.cwl
+    in:
+      RunThreadN: {default: 4}
+      GenomeDir: genome
+      ForwardReads: fq
+      OutSAMtype: {default: BAM}
+      OutSAMunmapped: {default: Within}
+    out: [alignment]
+
+  samtools:
+    run: bio-cwl-tools/samtools/samtools_index.cwl
+    in:
+      bam_sorted: STAR/alignment
+    out: [bam_sorted_indexed]
+
+  featureCounts:
+    requirements:
+      ResourceRequirement:
+        ramMin: 500
+    run: featureCounts.cwl
+    in:
+      counts_input_bam: samtools/bam_sorted_indexed
+      gtf: gtf
+    out: [featurecounts]
+
+outputs:
+  qc_html:
+    type: File
+    outputSource: fastqc/html_file
+  bam_sorted_indexed:
+    type: File
+    outputSource: samtools/bam_sorted_indexed
+
+  featurecounts:
+    type: File
+    outputSource: featureCounts/featurecounts
diff --git a/lesson4/answers/part1/featureCounts.cwl b/lesson4/answers/part1/featureCounts.cwl
new file mode 100644
index 0000000..4407ec9
--- /dev/null
+++ b/lesson4/answers/part1/featureCounts.cwl
@@ -0,0 +1,23 @@
+cwlVersion: v1.2
+class: CommandLineTool
+
+inputs:
+  gtf: File
+  counts_input_bam: File
+
+baseCommand: featureCounts
+
+arguments: [-T, $(runtime.cores),
+            -a, $(inputs.gtf),
+            -o, featurecounts.tsv,
+            $(inputs.counts_input_bam)]
+
+outputs:
+  featurecounts:
+    type: File
+    outputBinding:
+      glob: featurecounts.tsv
+
+hints:
+  DockerRequirement:
+    dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
diff --git a/lesson4/answers/part1/main.cwl b/lesson4/answers/part1/main.cwl
new file mode 100644
index 0000000..33e0f05
--- /dev/null
+++ b/lesson4/answers/part1/main.cwl
@@ -0,0 +1,32 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+inputs:
+  fq: File
+  genome: Directory
+  gtf: File
+
+### 1. Subworkflows
+steps:
+  alignment:
+    run: alignment.cwl
+    in:
+      fq: fq
+      genome: genome
+      gtf: gtf
+    out: [qc_html, bam_sorted_indexed, featurecounts]
+
+outputs:
+  qc_html:
+    type: File
+    outputSource: alignment/qc_html
+  bam_sorted_indexed:
+    type: File
+    outputSource: alignment/bam_sorted_indexed
+  featurecounts:
+    type: File
+    outputSource: alignment/featurecounts
+
+requirements:
+  SubworkflowFeatureRequirement: {}
diff --git a/lesson4/answers/part2/alignment.cwl b/lesson4/answers/part2/alignment.cwl
new file mode 100644
index 0000000..c46b568
--- /dev/null
+++ b/lesson4/answers/part2/alignment.cwl
@@ -0,0 +1,56 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+inputs:
+  fq: File
+  genome: Directory
+  gtf: File
+
+steps:
+  fastqc:
+    run: bio-cwl-tools/fastqc/fastqc_2.cwl
+    in:
+      reads_file: fq
+    out: [html_file]
+
+  STAR:
+    requirements:
+      ResourceRequirement:
+        ramMin: 6000
+    run: bio-cwl-tools/STAR/STAR-Align.cwl
+    in:
+      RunThreadN: {default: 4}
+      GenomeDir: genome
+      ForwardReads: fq
+      OutSAMtype: {default: BAM}
+      OutSAMunmapped: {default: Within}
+    out: [alignment]
+
+  samtools:
+    run: bio-cwl-tools/samtools/samtools_index.cwl
+    in:
+      bam_sorted: STAR/alignment
+    out: [bam_sorted_indexed]
+
+  featureCounts:
+    requirements:
+      ResourceRequirement:
+        ramMin: 500
+    run: featureCounts.cwl
+    in:
+      counts_input_bam: samtools/bam_sorted_indexed
+      gtf: gtf
+    out: [featurecounts]
+
+outputs:
+  qc_html:
+    type: File
+    outputSource: fastqc/html_file
+  bam_sorted_indexed:
+    type: File
+    outputSource: samtools/bam_sorted_indexed
+
+  featurecounts:
+    type: File
+    outputSource: featureCounts/featurecounts
diff --git a/lesson4/answers/part2/featureCounts.cwl b/lesson4/answers/part2/featureCounts.cwl
new file mode 100644
index 0000000..4407ec9
--- /dev/null
+++ b/lesson4/answers/part2/featureCounts.cwl
@@ -0,0 +1,23 @@
+cwlVersion: v1.2
+class: CommandLineTool
+
+inputs:
+  gtf: File
+  counts_input_bam: File
+
+baseCommand: featureCounts
+
+arguments: [-T, $(runtime.cores),
+            -a, $(inputs.gtf),
+            -o, featurecounts.tsv,
+            $(inputs.counts_input_bam)]
+
+outputs:
+  featurecounts:
+    type: File
+    outputBinding:
+      glob: featurecounts.tsv
+
+hints:
+  DockerRequirement:
+    dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
diff --git a/lesson4/answers/part2/main.cwl b/lesson4/answers/part2/main.cwl
new file mode 100644
index 0000000..9abc5a9
--- /dev/null
+++ b/lesson4/answers/part2/main.cwl
@@ -0,0 +1,34 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+### 2. Scattering
+inputs:
+  fq: File[]
+  genome: Directory
+  gtf: File
+
+steps:
+  alignment:
+    run: alignment.cwl
+    scatter: fq
+    in:
+      fq: fq
+      genome: genome
+      gtf: gtf
+    out: [qc_html, bam_sorted_indexed, featurecounts]
+
+outputs:
+  qc_html:
+    type: File[]
+    outputSource: alignment/qc_html
+  bam_sorted_indexed:
+    type: File[]
+    outputSource: alignment/bam_sorted_indexed
+  featurecounts:
+    type: File[]
+    outputSource: alignment/featurecounts
+
+requirements:
+  SubworkflowFeatureRequirement: {}
+  ScatterFeatureRequirement: {}
diff --git a/lesson4/answers/part4/alignment.cwl b/lesson4/answers/part4/alignment.cwl
new file mode 100644
index 0000000..df31e9b
--- /dev/null
+++ b/lesson4/answers/part4/alignment.cwl
@@ -0,0 +1,42 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+inputs:
+  fq: File
+  genome: Directory
+  gtf: File
+
+steps:
+  fastqc:
+    run: bio-cwl-tools/fastqc/fastqc_2.cwl
+    in:
+      reads_file: fq
+    out: [html_file]
+
+  STAR:
+    requirements:
+      ResourceRequirement:
+        ramMin: 6000
+    run: bio-cwl-tools/STAR/STAR-Align.cwl
+    in:
+      RunThreadN: {default: 4}
+      GenomeDir: genome
+      ForwardReads: fq
+      OutSAMtype: {default: BAM}
+      OutSAMunmapped: {default: Within}
+    out: [alignment]
+
+  samtools:
+    run: bio-cwl-tools/samtools/samtools_index.cwl
+    in:
+      bam_sorted: STAR/alignment
+    out: [bam_sorted_indexed]
+
+outputs:
+  qc_html:
+    type: File
+    outputSource: fastqc/html_file
+  bam_sorted_indexed:
+    type: File
+    outputSource: samtools/bam_sorted_indexed
diff --git a/lesson4/answers/part4/featureCounts.cwl b/lesson4/answers/part4/featureCounts.cwl
new file mode 100644
index 0000000..38ace83
--- /dev/null
+++ b/lesson4/answers/part4/featureCounts.cwl
@@ -0,0 +1,26 @@
+cwlVersion: v1.2
+class: CommandLineTool
+
+### 4. Combining results
+inputs:
+  gtf: File
+  counts_input_bam:
+   - File
+   - File[]
+
+baseCommand: featureCounts
+
+arguments: [-T, $(runtime.cores),
+            -a, $(inputs.gtf),
+            -o, featurecounts.tsv,
+            $(inputs.counts_input_bam)]
+
+outputs:
+  featurecounts:
+    type: File
+    outputBinding:
+      glob: featurecounts.tsv
+
+hints:
+  DockerRequirement:
+    dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
diff --git a/lesson4/answers/part4/main.cwl b/lesson4/answers/part4/main.cwl
new file mode 100644
index 0000000..fcbb235
--- /dev/null
+++ b/lesson4/answers/part4/main.cwl
@@ -0,0 +1,47 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+### 2. Scattering
+inputs:
+  fq: File[]
+  genome: Directory
+  gtf: File
+
+steps:
+  alignment:
+    run: alignment.cwl
+    scatter: fq
+    in:
+      fq: fq
+      genome: genome
+      gtf: gtf
+    out: [qc_html, bam_sorted_indexed]
+
+  ### 4. Combining results
+  featureCounts:
+    requirements:
+      ResourceRequirement:
+        ramMin: 500
+    run: featureCounts.cwl
+    in:
+      counts_input_bam: alignment/bam_sorted_indexed
+      gtf: gtf
+    out: [featurecounts]
+
+outputs:
+  qc_html:
+    type: File[]
+    outputSource: alignment/qc_html
+  bam_sorted_indexed:
+    type: File[]
+    outputSource: alignment/bam_sorted_indexed
+
+  ### 4. Combining results
+  featurecounts:
+    type: File
+    outputSource: featureCounts/featurecounts
+
+requirements:
+  SubworkflowFeatureRequirement: {}
+  ScatterFeatureRequirement: {}
diff --git a/lesson4/lesson4.md b/lesson4/lesson4.md
index 6aa45de..91df9bd 100644
--- a/lesson4/lesson4.md
+++ b/lesson4/lesson4.md
@@ -17,10 +17,10 @@ steps:
   alignment:
     run: alignment.cwl
     in:
-	  fq: fq
-	  genome: genome
-	  gtf: gtf
-	out: [qc_html, bam_sorted_indexed, featurecounts]
+      fq: fq
+      genome: genome
+      gtf: gtf
+    out: [qc_html, bam_sorted_indexed, featurecounts]
 ```
 
 In the outputs section, all the output sources are from the alignment step:
@@ -71,12 +71,12 @@ run `alignment.cwl` for each value in the list in the `fq` parameter.
 steps:
   alignment:
     run: alignment.cwl
-	scatter: fq
+    scatter: fq
     in:
-	  fq: fq
-	  genome: genome
-	  gtf: gtf
-	out: [qc_html, bam_sorted_indexed, featurecounts]
+      fq: fq
+      genome: genome
+      gtf: gtf
+    out: [qc_html, bam_sorted_indexed, featurecounts]
 ```
 
 Because the scatter produces multiple outputs, each output parameter
diff --git a/lesson5/answers/alignment.cwl b/lesson5/answers/alignment.cwl
new file mode 100644
index 0000000..8a54fe4
--- /dev/null
+++ b/lesson5/answers/alignment.cwl
@@ -0,0 +1,47 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+inputs:
+  fq: File
+  genome: Directory
+  gtf: File
+
+requirements:
+  StepInputExpressionRequirement: {}
+
+steps:
+  fastqc:
+    run: bio-cwl-tools/fastqc/fastqc_2.cwl
+    in:
+      reads_file: fq
+    out: [html_file]
+
+  STAR:
+    requirements:
+      ResourceRequirement:
+        ramMin: 6000
+    run: bio-cwl-tools/STAR/STAR-Align.cwl
+    in:
+      RunThreadN: {default: 4}
+      GenomeDir: genome
+      ForwardReads: fq
+      OutSAMtype: {default: BAM}
+      OutSAMunmapped: {default: Within}
+      ### 1. Expressions on step inputs
+      OutFileNamePrefix: {valueFrom: "$(inputs.ForwardReads.nameroot)."}
+    out: [alignment]
+
+  samtools:
+    run: bio-cwl-tools/samtools/samtools_index.cwl
+    in:
+      bam_sorted: STAR/alignment
+    out: [bam_sorted_indexed]
+
+outputs:
+  qc_html:
+    type: File
+    outputSource: fastqc/html_file
+  bam_sorted_indexed:
+    type: File
+    outputSource: samtools/bam_sorted_indexed
diff --git a/lesson5/answers/featureCounts.cwl b/lesson5/answers/featureCounts.cwl
new file mode 100644
index 0000000..681697e
--- /dev/null
+++ b/lesson5/answers/featureCounts.cwl
@@ -0,0 +1,25 @@
+cwlVersion: v1.2
+class: CommandLineTool
+
+inputs:
+  gtf: File
+  counts_input_bam:
+   - File
+   - File[]
+
+baseCommand: featureCounts
+
+arguments: [-T, $(runtime.cores),
+            -a, $(inputs.gtf),
+            -o, featurecounts.tsv,
+            $(inputs.counts_input_bam)]
+
+outputs:
+  featurecounts:
+    type: File
+    outputBinding:
+      glob: featurecounts.tsv
+
+hints:
+  DockerRequirement:
+    dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
diff --git a/lesson5/answers/main.cwl b/lesson5/answers/main.cwl
new file mode 100644
index 0000000..e934079
--- /dev/null
+++ b/lesson5/answers/main.cwl
@@ -0,0 +1,50 @@
+cwlVersion: v1.2
+class: Workflow
+label: RNAseq CWL practice workflow
+
+inputs:
+  fq: File[]
+  genome: Directory
+  gtf: File
+
+steps:
+  alignment:
+    run: alignment.cwl
+    scatter: fq
+    in:
+      fq: fq
+      genome: genome
+      gtf: gtf
+    out: [qc_html, bam_sorted_indexed]
+
+  featureCounts:
+    requirements:
+      ResourceRequirement:
+        ramMin: 500
+    run: featureCounts.cwl
+    in:
+      counts_input_bam: alignment/bam_sorted_indexed
+      gtf: gtf
+    out: [featurecounts]
+
+  ### 2. Organizing output files into Directories
+  output-subdirs:
+    run: subdirs.cwl
+    in:
+      fq: fq
+      bams: alignment/bam_sorted_indexed
+      qc: alignment/qc_html
+    out: [dirs]
+
+outputs:
+  dirs:
+    type: Directory[]
+    outputSource: output-subdirs/dirs
+
+  featurecounts:
+    type: File
+    outputSource: featureCounts/featurecounts
+
+requirements:
+  SubworkflowFeatureRequirement: {}
+  ScatterFeatureRequirement: {}
diff --git a/lesson5/answers/subdirs.cwl b/lesson5/answers/subdirs.cwl
new file mode 100644
index 0000000..fc4fe7d
--- /dev/null
+++ b/lesson5/answers/subdirs.cwl
@@ -0,0 +1,22 @@
+cwlVersion: v1.2
+class: ExpressionTool
+requirements:
+  InlineJavascriptRequirement: {}
+inputs:
+  fq: File[]
+  bams: File[]
+  qc: File[]
+outputs:
+  dirs: Directory[]
+expression: |-
+  ${
+  var dirs = [];
+  for (var i = 0; i < inputs.bams.length; i++) {
+    dirs.push({
+      "class": "Directory",
+      "basename": inputs.fq[i].nameroot,
+      "listing": [inputs.bams[i], inputs.qc[i]]
+    });
+  }
+  return {"dirs": dirs};
+  }
diff --git a/lesson5/lesson5.md b/lesson5/lesson5.md
index 4567620..6640176 100644
--- a/lesson5/lesson5.md
+++ b/lesson5/lesson5.md
@@ -22,6 +22,7 @@ filename.
 ```
 requirements:
   StepInputExpressionRequirement: {}
+
 steps:
   ...
   STAR: