Formatting

diff --git a/lesson1/lesson1.md b/lesson1/lesson1.md
index 060597a226a1a71820bbbf9e62e8175e1a7e1597..f394c901f82c5583152839a31975ddfcf1eae3ac 100644
--- a/lesson1/lesson1.md
+++ b/lesson1/lesson1.md
@@ -1,8 +1,8 @@
-# Turning a bash script into a workflow using existing tools
+# Turning a shell script into a workflow using existing tools
  
  In this lesson we will turn `rnaseq_analysis_on_input_file.sh` into a workflow.
  
-# Setting up
+## Setting up
  
  We will create a new git repository and import a library of existing
  tool definitions that will help us build our workflow.
@@ -11,19 +11,16 @@ tool definitions that will help us build our workflow.
  
  2. Create a new git repository to hold our workflow with this command:
  
-## Arvados
-
  ```
-git clone https://github.com/arvados/arvados-vscode-cwl-template.git rnaseq-cwl-training-exercises
+git init rnaseq-cwl-training-exercises
  ```
  
-## Generic
+On Arvados use this:
  
  ```
-git init rnaseq-cwl-training-exercises
+git clone https://github.com/arvados/arvados-vscode-cwl-template.git rnaseq-cwl-training-exercises
  ```
  
-
  3. Go to File->Open Folder and select rnaseq-cwl-training-exercises
  
  4. Go to the terminal window
@@ -34,11 +31,13 @@ git init rnaseq-cwl-training-exercises
  git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
  ```
  
-# Writing the workflow
+## Writing the workflow
+
+### 1. File header
  
-1. Create a new file "main.cwl"
+Create a new file "main.cwl"
  
-2. Start with this header.
+Start with this header.
  
  
  ```
@@ -47,14 +46,25 @@ class: Workflow
  label: RNAseq CWL practice workflow
  ```
  
-3. Workflow Inputs
+### 2. Workflow Inputs
  
  The purpose of a workflow is to consume some input parameters, run a
  series of steps, and produce output values.
  
  For this analysis, the input parameters are the fastq file and the reference data required by STAR.
  
-In CWL, these are declared in the `inputs` section.
+In the original shell script, the following variables are declared:
+
+```
+# initialize a variable with an intuitive name to store the name of the input fastq file
+fq=$1
+
+# directory with genome reference FASTA and index files + name of the gene annotation file
+genome=rnaseq/reference_data
+gtf=rnaseq/reference_data/chr1-hg19_genes.gtf
+```
+
+In CWL, we will declare these variables in the `inputs` section.
  
  The inputs section lists each input parameter and its type.  Valid
  types include `File`, `Directory`, `string`, `boolean`, `int`, and
@@ -69,7 +79,7 @@ inputs:
    gtf: File
  ```
  
-4. Workflow Steps
+### 3. Workflow Steps
  
  A workflow consists of one or more steps.  This is the `steps` section.
  
@@ -101,10 +111,10 @@ steps:
      run: bio-cwl-tools/fastqc/fastqc_2.cwl
      in:
           reads_file: fq
-    out: [html_file, summary_file]
+    out: [html_file]
  ```
  
-5. Running alignment with STAR
+### 4. Running alignment with STAR
  
  STAR has more parameters.  Sometimes we want to provide input values
  to a step without making them as workflow-level inputs.  We can do
@@ -126,7 +136,7 @@ this with `{default: N}`
      out: [alignment]
  ```
  
-6. Running samtools
+### 5. Running samtools
  
  The third step is to generate an index for the aligned BAM.
  
@@ -145,7 +155,7 @@ step will not run until the `STAR` step has completed successfully.
      out: [bam_sorted_indexed]
  ```
  
-7. featureCounts
+### 6. featureCounts
  
  As of this writing, the `subread` package that provides
  `featureCounts` is not available in bio-cwl-tools (and if it has been
@@ -153,7 +163,7 @@ added since writing this, let's pretend that it isn't there.)  We will
  dive into how to write a CWL wrapper for a command line tool in
  lesson 2.  For now, we will leave off the final step.
  
-8. Workflow Outputs
+### 7. Workflow Outputs
  
  The last thing to do is declare the workflow outputs in the `outputs` section.
  
@@ -175,9 +185,6 @@ outputs:
    qc_html:
      type: File
      outputSource: fastqc/html_file
-  qc_summary:
-    type: File
-    outputSource: fastqc/summary_file
    bam_sorted_indexed:
      type: File
      outputSource: samtools/bam_sorted_indexed