_episodes/01-introduction.md

   1 ---
   2 title: "Introduction"
   3 teaching: 10
   4 exercises: 0
   5 questions:
   6 - "What is CWL?"
   7 - "What is the goal of this training?"
   8 objectives:
   9 - "Gain a high level understanding of the example analysis."
  10 keypoints:
  11 - "Common Workflow Language is a standard for describing data analysis workflows."
  12 - "We will use a bioinformatics RNA-seq analysis as an example workflow, but the training does not require in-depth knowledge of biology."
  13 - "After completing this training, you should be able to begin writing workflows for your own analysis, and know where to learn more."
  14 ---
  15
  16 # Introduction to Common Workflow Language
  17
  18 The Common Workflow Language (CWL) is an open standard for describing
  19 automated, batch data analysis workflows.  Unlike many programming
  20 languages, CWL is a declarative language.  This means it describes
  21 _what_ should happen, but not _how_ it should happen.  This enables
  22 workflows written in CWL to be portable and scalable across a variety
  23 of software and hardware environments, from workstations to cluster,
  24 cloud, and high performance computing (HPC) environments.  As a
  25 standard with multiple implementations, CWL is particularly well
  26 suited for research collaboration, publishing, and high-throughput
  27 production data analysis.
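
To make the declarative idea concrete, here is a minimal sketch of a CWL tool description. It is an illustration only and is not part of the RNA-seq example; the file name `echo-tool.cwl` and the input name `message` are assumptions chosen for this sketch.

```cwl
# echo-tool.cwl (hypothetical file name): describes the `echo` command.
cwlVersion: v1.2
class: CommandLineTool

# What to run. The description says nothing about where or how it is executed.
baseCommand: echo

inputs:
  message:            # illustrative input name
    type: string
    inputBinding:
      position: 1     # passed as the first command-line argument

# This tool declares no output files.
outputs: []
```

The description only states which command runs and what input it takes. A CWL runner decides how to execute it on a given platform. For example, assuming the reference runner `cwltool` is installed, `cwltool echo-tool.cwl --message "Hello"` would run it locally.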
  28
  29 # Introduction to the example analysis
  30
  31 This training uses a bioinformatics RNA-seq analysis as a motivating
  32 example.  However, specific knowledge of the biology of RNA-seq is
  33 *not* required for these lessons.  For those unfamiliar with RNA-seq,
  34 it is the process of sequencing RNA present in a biological sample.
  35 From the sequence reads, we want to measure the relative numbers of
  36 different RNA molecules appearing in the sample that were produced by
  37 particular genes.  This analysis is called "differential gene
  38 expression".
  39
  40 The entire process looks like this:
  41
  42 ![RNA-seq analysis workflow diagram](../assets/img/RNAseqWorkflow.png){: height="400px"}
  43
  44 For this training, we are only concerned with the middle analytical
  45 steps (skipping adapter trimming):
  46
  47 * Quality control (FastQC)
  48 * Alignment (mapping)
  49 * Counting reads associated with genes
  50
  51 In this training, we do not develop the analysis from first
  52 principles.  Instead, we will start from an analysis already
  53 written as a shell script, which will be presented in lesson 2.
  54
  55 {% include links.md %}