2 title: "Writing a tool wrapper"
6 - "Key question (FIXME)"
8 - "First learning objective. (FIXME)"
10 - "First key point. Brief Answer to questions. (FIXME)"
13 It is time to add the last step in the analysis.
15 This will use the "featureCounts" tool from the "subread" package.
19 Create a new file "featureCounts.cwl".
21 Start with this header:
25 class: CommandLineTool
28 # 2. Command line tool inputs
30 A CommandLineTool describes a single invocation of a command line program.
32 It consumes some input parameters, runs a program, and produces output.
35 Here is the original shell command:
38 featureCounts -T $cores -s 2 -a $gtf -o $counts $counts_input_bam
41 The variables used in the bash script are `$cores`, `$gtf`, `$counts` and `$counts_input_bam`.
45 This gives us two file inputs, `gtf` and `counts_input_bam`, which we can declare in our `inputs` section:
50 counts_input_bam: File
53 # 3. Specifying the program to run
55 Give the name of the program to run in `baseCommand`.
58 baseCommand: featureCounts
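
Collecting the pieces described so far, the tool description might look like the sketch below. The `cwlVersion` value is an assumption (use whatever your header already declares), and the file is not yet a complete tool because the `outputs` section is still missing.

```yaml
# featureCounts.cwl, work in progress (sketch)
cwlVersion: v1.2   # assumed version, not confirmed by the text above
class: CommandLineTool

inputs:
  gtf: File                # annotation file, `$gtf` in the shell command
  counts_input_bam: File   # aligned reads, `$counts_input_bam` in the shell command

baseCommand: featureCounts
```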
61 # 4. Command arguments
63 The easiest way to describe the command line is with an `arguments`
64 section. This takes a comma-separated list of command line arguments.
66 Input variables are included on the command line as
67 `$(inputs.name_of_parameter)`. When the tool is executed, these input
68 parameter values are substituted for these variables.
70 Special variables are also available. The runtime environment
71 describes the resources allocated to running the program. Here we use
72 `$(runtime.cores)` to decide how many threads to request.
75 arguments: [-T, $(runtime.cores),
77 -o, featurecounts.tsv,
78 $(inputs.counts_input_bam)]
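
The same list can also be written in YAML block style, one argument per line, which leaves room for comments. This is only a sketch: it assumes the annotation file is passed with `-a`, as in the original shell command, and it leaves out the `-s 2` option from that command.

```yaml
arguments:
  - "-T"                          # thread count flag
  - $(runtime.cores)              # cores allocated by the workflow runner
  - "-a"                          # annotation file flag (assumed from the shell command)
  - $(inputs.gtf)
  - "-o"                          # output file flag
  - featurecounts.tsv
  - $(inputs.counts_input_bam)    # positional argument: the BAM file to count
```

Both forms describe the same list of arguments, so choosing between them is a matter of readability.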
83 In CWL, you must explicitly identify the outputs of a program. This
84 associates output parameters with specific files, and enables the
85 workflow runner to know which files must be saved and which files can be discarded.
88 In the previous section, we told the featureCounts program that the name of
89 our output file should be `featurecounts.tsv`.
91 We can declare an output parameter called `featurecounts` that will
92 have that output file as its value.
94 The `outputBinding` section describes how to determine the value of
95 the parameter. The `glob` field tells it to search for a file in the
96 output directory called `featurecounts.tsv`.
103 glob: featurecounts.tsv
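
Assembled, the output declaration described above could look like the following sketch. The parameter name `featurecounts` and the file name come from the text. The `type: File` line is what a single-file output normally declares.

```yaml
outputs:
  featurecounts:
    type: File
    outputBinding:
      glob: featurecounts.tsv   # must match the name passed to -o
```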
106 # 6. Running in a container
108 In order to run the tool, it needs to be installed.
109 Using software containers, a tool can be pre-installed into a
110 compatible runtime environment, and that runtime environment (called a
111 container image) can be downloaded and run on demand.
113 Many bioinformatics tools are already available as containers. One
114 resource is the BioContainers project. Let's find the "subread" software:
116 1. Visit https://biocontainers.pro/
117 2. Click on "Registry"
118 3. Search for "subread"
119 4. Click on the search result for "subread"
120 5. Click on the tab "Packages and Containers"
121 6. Choose a row with type "docker", then on the right side of the "Full
122 Tag" column for that row, click the "copy to clipboard" button.
124 To declare that you want to run inside a container, create a section
125 called `hints` with a subsection `DockerRequirement`. Under
126 `DockerRequirement`, paste the text you copied in the above step.
127 Replace the text `docker pull` with `dockerPull:` and indent it so it is
128 in the `DockerRequirement` section.
133 dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
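
With the indentation in place, the section described above would read roughly as follows. The image tag is the one shown in this lesson. The tag you copy from the registry may be newer.

```yaml
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/subread:1.5.0p3--0
```

Because this is listed under `hints` rather than `requirements`, a runner that cannot use containers may still attempt to run the tool with a locally installed `featureCounts`.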
136 # 7. Running a tool on its own
138 When creating a tool wrapper, it is helpful to run it on its own to test it.
140 The input to a single tool is the same kind of input parameters file
141 that we used as input to a workflow in the previous lesson.
148 location: Aligned.sortedByCoord.out.bam
151 location: rnaseq/reference_data/chr1-hg19_genes.gtf
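
A complete input parameters file for this tool might look like the sketch below. The keys match the input names declared in `featureCounts.cwl`. The two paths are the ones shown above and are assumed to be relative to the location of the YAML file.

```yaml
# featureCounts.yaml (sketch)
counts_input_bam:
  class: File
  location: Aligned.sortedByCoord.out.bam
gtf:
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
```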
154 The invocation is also the same:
157 cwl-runner featureCounts.cwl featureCounts.yaml
160 # 8. Adding it to the workflow
162 Now that we have confirmed that it works, we can add it to our workflow.
163 We add it to `steps`, connecting the output of samtools to
164 `counts_input_bam`, with `gtf` taking the workflow input of the same name.
174 run: featureCounts.cwl
176 counts_input_bam: samtools/bam_sorted_indexed
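
As a sketch, the new step could be wired up as shown below. It assumes the workflow already has an input named `gtf` and a step named `samtools` with an output `bam_sorted_indexed`, as the text describes. Any other steps and step-level requirements are omitted here.

```yaml
steps:
  # ... existing steps, including samtools, stay as they are ...
  featureCounts:
    run: featureCounts.cwl
    in:
      counts_input_bam: samtools/bam_sorted_indexed   # output of the samtools step
      gtf: gtf                                        # workflow input of the same name
    out: [featurecounts]
```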
181 We also add the result from featureCounts to the workflow outputs:
188 outputSource: featureCounts/featurecounts
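
For illustration, the corresponding entry in the workflow `outputs` section might look like this sketch. The output name `featurecounts` follows the text. Any outputs the workflow already declares are left out.

```yaml
outputs:
  # ... existing workflow outputs stay as they are ...
  featurecounts:
    type: File
    outputSource: featureCounts/featurecounts   # step name / step output name
```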
191 You should now be able to re-run the workflow and it will run the
192 "featureCounts" step and include "featurecounts" in the output.