---
-title: "Writing a tool wrapper"
-teaching: 0
-exercises: 0
+title: "Writing a Tool Wrapper"
+teaching: 20
+exercises: 30
questions:
-- "Key question (FIXME)"
+- "What are the key components of a tool wrapper?"
+- "How do I use software containers to supply the software I want to run?"
objectives:
-- "First learning objective. (FIXME)"
+- "Write a tool wrapper for the featureCounts tool."
+- "Find a software container that has the software we want to use."
+- "Add the tool wrapper to our main workflow."
keypoints:
-- "First key point. Brief Answer to questions. (FIXME)"
+- "The key components of a command line tool wrapper are the header, inputs, baseCommand, arguments, and outputs."
+- "Like workflows, CommandLineTools have `inputs` and `outputs`."
+- "Use `baseCommand` and `arguments` to provide the program to run and the command line arguments to run it with."
+- "Use `glob` to capture output files and assign them to output parameters."
+- "Use DockerRequirement to supply the name of the Docker image that contains the software to run."
---
It is time to add the last step in the analysis.
-This will use the "featureCounts" tool from the "subread" package.
-
-# 1. File header
-
-Create a new file "featureCounts.cwl"
-
-Start with this header
-
```
-cwlVersion: v1.2
-class: CommandLineTool
+# Count mapped reads
+featureCounts -T $cores -s 2 -a $gtf -o $counts $counts_input_bam
```
+{: .language-bash }
-# 2. Command line tool inputs
-
-A CommandLineTool describes a single invocation of a command line program.
+This will use the "featureCounts" tool from the "subread" package.
-It consumes some input parameters, runs a program, and produce output
-values.
+# File header
-Here is the original shell command:
+A CommandLineTool describes a single invocation of a command line
+program. It consumes some input parameters, runs a program, and
+captures output, mainly in the form of files produced by the
+program.
-```
-featureCounts -T $cores -s 2 -a $gtf -o $counts $counts_input_bam
-```
-
-The variables used in the bash script are `$cores`, `$gtf`, `$counts` and `$counts_input_bam`.
-
-The parameters
+Create a new file "featureCounts.cwl"
-This gives us two file inputs, `gtf` and `counts_input_bam` which we can declare in our `inputs` section:
+Let's start with the header. This is very similar to the workflow, except that we use `class: CommandLineTool`.
```
-inputs:
- gtf: File
- counts_input_bam: File
+cwlVersion: v1.2
+class: CommandLineTool
+label: featureCounts tool
```
-
-# 3. Specifying the program to run
+{: .language-yaml }
+
+# Command line tool inputs
+
+The `inputs` section describes input parameters with the same form as
+the Workflow `inputs` section.
+
+> ## Exercise
+>
+> The variables used in the bash script are `$cores`, `$gtf`, `$counts` and `$counts_input_bam`.
+>
+> * `$cores` is the number of CPU cores to use.
+> * `$gtf` is the input .gtf file.
+> * `$counts` is the name we will give to the output file.
+> * `$counts_input_bam` is the input .bam file.
+>
+> Write the `inputs` section for the File inputs `gtf` and `counts_input_bam`.
+>
+> > ## Solution
+> > ```
+> > inputs:
+> > gtf: File
+> > counts_input_bam: File
+> > ```
+> > {: .language-yaml }
+> {: .solution}
+{: .challenge}
+
+# Specifying the program to run
Give the name of the program to run in `baseCommand`.
```
baseCommand: featureCounts
```
+{: .language-yaml }
-# 4. Command arguments
+# Command arguments
The easiest way to describe the command line is with an `arguments`
section. This takes a comma-separated list of command line arguments.
-Input variables are included on the command line as
-`$(inputs.name_of_parameter)`. When the tool is executed, these input
-parameter values are substituted for these variable.
-
-Special variables are also available. The runtime environment
-describes the resources allocated to running the program. Here we use
-`$(runtime.cores)` to decide how many threads to request.
```
arguments: [-T, $(runtime.cores),
-o, featurecounts.tsv,
$(inputs.counts_input_bam)]
```
+{: .language-yaml }
+
+Input variables are included on the command line as
+`$(inputs.name_of_parameter)`. When the tool is executed, the
+variables will be replaced with the input parameter values.
-# 5. Outputs section
+There are also some special variables. The `runtime` object describes
+the resources allocated to running the program. Here we use
+`$(runtime.cores)` to decide how many threads to request.
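+
+As a minimal sketch of where that number can come from: a tool can
+ask for a minimum number of cores with a `ResourceRequirement`, and
+`$(runtime.cores)` then reflects the number of cores the runner
+reserved. The value `2` below is an arbitrary illustration and is not
+part of this lesson's tool wrapper.
+
+```
+requirements:
+  ResourceRequirement:
+    # Illustrative value only: ask the runner for at least 2 cores
+    coresMin: 2
+```
+{: .language-yaml }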
+
+> ## `arguments` vs `inputBinding`
+>
+> You may recall from examining the existing fastqc and STAR tool
+> wrappers in lesson 2 that another way to express command line
+> parameters is with `inputBinding` and `prefix` on individual input
+> parameters.
+>
+> ```
+> inputs:
+> parametername:
+> type: parametertype
+> inputBinding:
+> prefix: --some-option
+> ```
+> {: .language-yaml }
+>
+> We use `arguments` in the example simply because it is easier to see
+> how it lines up with the source shell script.
+>
+> You can use both `inputBinding` and `arguments` in the same
+> CommandLineTool document. There is no "right" or "wrong" way, and
+> one does not override the other; they are combined to produce the
+> final command line invocation.
+>
+{: .callout}
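+
+As a rough sketch of the `inputBinding` style applied to this tool
+(an illustration, not the form used in the rest of this lesson), the
+two File inputs could carry their own command line bindings. The
+`position` values are an assumption, chosen so that both inputs are
+placed after the entries in `arguments`, which default to position 0.
+
+```
+inputs:
+  gtf:
+    type: File
+    inputBinding:
+      prefix: -a      # emitted as: -a <path to gtf>
+      position: 1
+  counts_input_bam:
+    type: File
+    inputBinding:
+      position: 2     # no prefix, placed last on the command line
+```
+{: .language-yaml }
+
+With this form, `$(inputs.counts_input_bam)` would be dropped from
+`arguments` so that the file is not passed twice.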
+
+# Outputs section
In CWL, you must explicitly identify the outputs of a program. This
associates output parameters with specific files, and enables the
outputBinding:
glob: featurecounts.tsv
```
+{: .language-yaml }
-# 6. Running in a container
+# Running in a container
In order to run the tool, it needs to be installed.
Using software containers, a tool can be pre-installed into a
compatible runtime environment, and that runtime environment (called a
container image) can be downloaded and run on demand.
-Many bioinformatics tools are already available as containers. One
-resource is the BioContainers project. Let's find the "subread" software:
-
- 1. Visit https://biocontainers.pro/
- 2. Click on "Registry"
- 3. Search for "subread"
- 4. Click on the search result for "subread"
- 5. Click on the tab "Packages and Containers"
- 6. Choose a row with type "docker", then on the right side of the "Full
-Tag" column for that row, click the "copy to clipboard" button.
-
-To declare that you want to run inside a container, create a section
-called `hints` with a subsection `DockerRequirement`. Under
-`DockerRequirement`, paste the text your copied in the above step.
-Replace the text `docker pull` to `dockerPull:` and indent it so it is
-in the `DockerRequirement` section.
-
-```
-hints:
- DockerRequirement:
- dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
-```
-
-# 7. Running a tool on its own
+Although plain CWL does not _require_ the use of containers, many
+popular platforms that run CWL do require the software be supplied in
+the form of a container image.
+
+> ## Finding container images
+>
+> Many bioinformatics tools are already available as containers. One
+> resource is the BioContainers project. Let's find the "subread" software:
+>
+> 1. Visit [https://biocontainers.pro/](https://biocontainers.pro/)
+> 2. Click on "Registry"
+> 3. Search for "subread"
+> 4. Click on the search result for "subread"
+> 5. Click on the tab "Packages and Containers"
+> 6. Choose a row with type "docker", then on the right side of the "Full
+> Tag" column for that row, click the "copy to clipboard" button.
+>
+> To declare that you want to run inside a container, add a section
+> called `hints` to your tool document. Under `hints` add a
+> subsection `DockerRequirement`. Under `DockerRequirement`, paste
+> the text you copied in the above step. Replace the text
+> `docker pull` with `dockerPull:` and ensure it is indented twice so
+> it is a field of `DockerRequirement`.
+>
+> > ## Answer
+> > ```
+> > hints:
+> > DockerRequirement:
+> > dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
+> > ```
+> > {: .language-yaml }
+> {: .solution}
+{: .challenge}
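+
+A runner is allowed to ignore entries under `hints`. As a sketch of
+the stricter alternative, the same `DockerRequirement` can be listed
+under `requirements`, in which case a runner that cannot provide the
+container must refuse to run the tool. Whether you want this depends
+on the platforms you target; the lesson itself uses `hints`.
+
+```
+requirements:
+  DockerRequirement:
+    # Same image as in the answer above, now mandatory rather than advisory
+    dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
+```
+{: .language-yaml }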
+
+# Running a tool on its own
When creating a tool wrapper, it is helpful to run it on its own to test it.
The input to a single tool is the same kind of input parameters file
that we used as input to a workflow in the previous lesson.
-featureCounts.yaml:
+`featureCounts.yaml`
```
counts_input_bam:
class: File
location: rnaseq/reference_data/chr1-hg19_genes.gtf
```
-
-The invocation is also the same:
-
-```
-cwl-runner featureCounts.cwl featureCounts.yaml
-```
-
-# 8. Adding it to the workflow
-
-Now that we have confirmed that it works, we can add it to our workflow.
-We add it to `steps`, connecting the output of samtools to
-`counts_input_bam` and the `gtf` taking the workflow input of the same
-name.
-
-```
-steps:
- ...
- featureCounts:
- requirements:
- ResourceRequirement:
- ramMin: 500
- run: featureCounts.cwl
- in:
- counts_input_bam: samtools/bam_sorted_indexed
- gtf: gtf
- out: [featurecounts]
-```
-
-We will add the result from featurecounts to the output:
-
-```
-outputs:
- ...
- featurecounts:
- type: File
- outputSource: featureCounts/featurecounts
-```
-
-You should now be able to re-run the workflow and it will run the
-"featureCounts" step and include "featurecounts" in the output.
+{: .language-yaml }
+
+> ## Running the tool
+>
+> Run the tool on its own to confirm that it behaves correctly:
+>
+> ```
+> cwl-runner featureCounts.cwl featureCounts.yaml
+> ```
+> {: .language-bash }
+{: .challenge }
+
+# Adding it to the workflow
+
+Now that we have confirmed that the tool wrapper works, it is time to
+add it to our workflow.
+
+> ## Exercise
+>
+> 1. Add a new step called `featureCounts` that runs our tool
+> wrapper. The new step should take input from
+> `samtools/bam_sorted_indexed`, and should be allocated a
+> minimum of 500 MB of RAM.
+> 2. Add a new output parameter for the workflow called
+> `featurecounts`. The output source should come from the output
+> of the new `featureCounts` step.
+> 3. When you have an answer, run the updated workflow, which
+> should run the "featureCounts" step and produce the "featurecounts"
+> output parameter.
+>
+> > ## Answer
+> > ```
+> > steps:
+> > ...
+> > featureCounts:
+> > requirements:
+> > ResourceRequirement:
+> > ramMin: 500
+> > run: featureCounts.cwl
+> > in:
+> > counts_input_bam: samtools/bam_sorted_indexed
+> > gtf: gtf
+> > out: [featurecounts]
+> >
+> > outputs:
+> > ...
+> > featurecounts:
+> > type: File
+> > outputSource: featureCounts/featurecounts
+> > ```
+> > {: .language-yaml }
+> {: .solution}
+{: .challenge}
+
+> ## Episode solution
+> * <a href="../assets/answers/ep4/main.cwl">main.cwl</a>
+> * <a href="../assets/answers/ep4/featureCounts.cwl">featureCounts.cwl</a>
+{: .solution}