X-Git-Url: https://git.arvados.org/rnaseq-cwl-training.git/blobdiff_plain/17e18678b6fbcc09220f597e55e7574dc9cae60a..3fbeb188954cdc09529323816e0cbb25ec1daa19:/_episodes/01-introduction.md

diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md
index 81819f9..fe005cc 100644
--- a/_episodes/01-introduction.md
+++ b/_episodes/01-introduction.md
@@ -4,13 +4,11 @@ teaching: 10
 exercises: 0
 questions:
 - "What is CWL?"
-- "What are the requirements for this training?"
 - "What is the goal of this training?"
 objectives:
 - "Understand how the training will be motivated by an example analysis."
 keypoints:
 - "Common Workflow Language is a standard for describing data analysis workflows"
-- "This training assumes some basic familiarity with editing text files, the Unix command line, and Unix shell scripts."
 - "We will use an bioinformatics RNA-seq analysis as an example workflow, but does not require in-depth knowledge of biology."
 - "After completing this training, you should be able to begin writing workflows for your own analysis, and know where to learn more."
 ---
@@ -28,40 +26,16 @@ standard with multiple implementations, CWL is particularly well
 suited for research collaboration, publishing, and high-throughput
 production data analysis.
 
-# Introduction to this training
-
-The goal of this training is to walk the student through the
-development of a best-practices CWL workflow, starting from an
-existing shell script that performs a simple RNA-seq bioinformatics
-analysis. At the conclusion of this training, you should have a grasp
-of the essential components of a workflow, and have a basis for
-learning more.
-
-This training assumes some basic familiarity with editing text files,
-the Unix command line, and Unix shell scripts.
-
-Specific knowledge of the biology of RNA-seq is *not* a prerequisite
-for these lessons. Although orignally developed to solve big data
-problems in genomics, CWL is not domain specific to bioinformatics,
-and is used in a number of other fields including medical imaging,
-astronomy, geospatial, and machine learning. We hope that you will
-find this training useful regardless of your area of research.
-
-These lessons are based on [Introduction to RNA-seq using
-high-performance computing
-(HPC)](https://github.com/hbctraining/Intro-to-rnaseq-hpc-O2) lessons
-developed by members of the teaching team at the Harvard Chan
-Bioinformatics Core (HBC). The original training, which includes
-additional lectures about the biology of RNA-seq, can be found at that
-link.
-
 # Introduction to the example analysis
 
-RNA-seq is the process of sequencing RNA present in a biological
-sample. From the sequence reads, we want to measure the relative
-numbers of different RNA molecules appearing in the sample that were
-produced by particular genes. This analysis is called "differential
-gene expression".
+This training uses a bioinformatics RNA-seq analysis as a motivating
+example. However, specific knowledge of the biology of RNA-seq is
+*not* required for these lessons. For those unfamiliar with RNA-seq,
+it is the process of sequencing RNA present in a biological sample.
+From the sequence reads, we want to measure the relative numbers of
+different RNA molecules appearing in the sample that were produced by
+particular genes. This analysis is called "differential gene
+expression".
 
 The entire process looks like this:
 
@@ -74,8 +48,8 @@ steps (skipping adapter trimming).
 * Alignment (mapping)
 * Counting reads associated with genes
 
-In this training, we are not attempting to develop the analysis from
-scratch, instead we we will be starting from an analysis already
-written in a shell script, which will be supplied in lesson 2.
+In this training, we do not develop the analysis from first
+principals, instead we we will be starting from an analysis already
+written as a shell script, which will be presented in lesson 2.
 
 {% include links.md %}