--- /dev/null
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+h2(#cuda). Install NVIDIA CUDA Toolkit (optional)
+
+If you want to use NVIDIA GPUs, "install the CUDA toolkit.":https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
+
+You must also install the NVIDIA Container Toolkit:
+
+<pre>
+DIST=$(. /etc/os-release; echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
+ sudo apt-key add -
+curl -s -L https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list | \
+ sudo tee /etc/apt/sources.list.d/libnvidia-container.list
+sudo apt-get update
+sudo apt-get install libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
+</pre>
Path to the public key file that a-d-c will use to log into the compute node
--mksquashfs-mem (default: 256M)
Only relevant when using Singularity. This is the amount of memory mksquashfs is allowed to use.
- --debug
- Output debug information (default: false)
+ --nvidia-gpu-support (default: false)
+ Install all the necessary tooling for Nvidia GPU support
+ --debug (default: false)
+ Output debug information
</code></pre></notextile>
+h2(#building). NVIDIA GPU support
+
+If you plan on using instance types with NVIDIA GPUs, add @--nvidia-gpu-support@ to the build command line. Arvados uses the same compute image for both GPU and non-GPU instance types. The GPU tooling is ignored when using the image with a non-GPU instance type.
+
h2(#aws). Build an AWS image
<notextile><pre><code>~$ <span class="userinput">./build.sh --json-file arvados-images-aws.json \
</code></pre>
</notextile>
+h4. NVIDIA GPU support
+
+To specify instance types with NVIDIA GPUs, you must include an additional @CUDA@ section:
+
+<notextile>
+<pre><code>    InstanceTypes:
+      g4dn:
+        ProviderType: g4dn.xlarge
+        VCPUs: 4
+        RAM: 16GiB
+        IncludedScratch: 125GB
+        Price: 0.56
+        CUDA:
+          DriverVersion: "11.4"
+          HardwareCapability: "7.5"
+          DeviceCount: 1
+</code></pre>
+</notextile>
+
+The @DriverVersion@ is the version of the CUDA toolkit installed in your compute image, in X.Y format (do not include the patch level). The @HardwareCapability@ is the CUDA compute capability of the GPUs available for this instance type. The @DeviceCount@ is the number of GPU devices available for this instance type.
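
The three fields can be read as an upper bound on what a container may request. The following is a minimal sketch, not the dispatcher's actual code, of how a request could be compared against an instance type's @CUDA@ section. The function names and the request keys @driver_version@ and @hardware_capability@ are assumptions made for illustration; only @device_count@ is named in this documentation.

```python
def parse_version(text):
    """Turn an "X.Y" string such as "11.4" into a comparable tuple (11, 4)."""
    return tuple(int(part) for part in text.split("."))


def instance_satisfies(instance_cuda, requested_cuda):
    """Return True if the instance type's CUDA section covers the request.

    instance_cuda uses the config keys shown above (DriverVersion,
    HardwareCapability, DeviceCount). The keys of requested_cuda are
    hypothetical names chosen for this sketch.
    """
    if requested_cuda.get("device_count", 0) == 0:
        # No GPU requested, so any instance type qualifies.
        return True
    return (
        instance_cuda.get("DeviceCount", 0) >= requested_cuda["device_count"]
        and parse_version(instance_cuda["DriverVersion"])
        >= parse_version(requested_cuda["driver_version"])
        and parse_version(instance_cuda["HardwareCapability"])
        >= parse_version(requested_cuda["hardware_capability"])
    )
```

Versions are compared as integer tuples rather than strings because a string comparison would rank "11.10" below "11.4".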
+
h4. Minimal configuration example for Amazon EC2
The <span class="userinput">ImageID</span> value is the compute node image that was built in "the previous section":install-compute-node.html#aws.
h3(#SbatchArguments). Containers.LSF.BsubArgumentsList
-When arvados-dispatch-lsf invokes @bsub@, you can add arguments to the command by specifying @BsubArgumentsList@. You can use this to send the jobs to specific cluster partitions or add resource requests. Set @BsubArgumentsList@ to an array of strings. For example:
+When arvados-dispatch-lsf invokes @bsub@, you can add arguments to the command by specifying @BsubArgumentsList@. You can use this to send the jobs to specific cluster partitions or add resource requests. Set @BsubArgumentsList@ to an array of strings.
+
+Template variables starting with % will be substituted as follows:
+
+%U uuid
+%C number of VCPUs
+%M memory in MB
+%T tmp in MB
+%G number of GPU devices (@runtime_constraints.cuda.device_count@)
+
+Use %% to express a literal %. The %%J in the default will be changed to %J, which is interpreted by @bsub@ itself.
+
+For example:
<notextile>
<pre>    Containers:
      LSF:
-        <code class="userinput">BsubArgumentsList: <b>["-C", "0", "-o", "/tmp/crunch-run.%J.out", "-e", "/tmp/crunch-run.%J.err"]</b></code>
+        <code class="userinput">BsubArgumentsList: <b>["-o", "/tmp/crunch-run.%%J.out", "-e", "/tmp/crunch-run.%%J.err", "-J", "%U", "-n", "%C", "-D", "%MMB", "-R", "rusage[mem=%MMB:tmp=%TMB] span[hosts=1]", "-R", "select[mem>=%MMB]", "-R", "select[tmp>=%TMB]", "-R", "select[ncpus>=%C]"]</b></code>
</pre>
</notextile>
Note that the default value for @BsubArgumentsList@ uses the @-o@ and @-e@ arguments to write stdout/stderr data to files in @/tmp@ on the compute nodes, which is helpful for troubleshooting installation/configuration problems. Ensure you have something in place to delete old files from @/tmp@, or adjust these arguments accordingly.
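
To illustrate the template variables described above, here is a minimal sketch of the substitution in Python. It is not the code used by arvados-dispatch-lsf; the function name @substitute@ and the sample values are made up for this example, and leaving unknown sequences untouched is an assumption of the sketch.

```python
import re


def substitute(args, values):
    """Expand %U, %C, %M, %T, %G and %% in each argument string.

    values maps the single letter that follows % to its replacement.
    Sequences not listed in values are left untouched in this sketch.
    """
    def expand(match):
        letter = match.group(1)
        if letter == "%":
            return "%"  # %% is a literal percent sign
        if letter in values:
            return str(values[letter])
        return match.group(0)

    # One left-to-right pass, so the % produced by %% is never
    # re-read as the start of another template variable.
    return [re.sub(r"%(.)", expand, arg) for arg in args]
```

For example, with @values = {"U": "zzzzz-dz642-0123456789abcde", "C": 4, "M": 16384, "T": 1000, "G": 1}@, the argument @"/tmp/crunch-run.%%J.out"@ becomes @"/tmp/crunch-run.%J.out"@ and @"%MMB"@ becomes @"16384MB"@.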
+h3(#SbatchArguments). Containers.LSF.BsubCUDAArguments
+
+If the container requests access to GPUs (@runtime_constraints.cuda.device_count@ of the container request is greater than zero), the command line arguments in @BsubCUDAArguments@ will be added to the command line _after_ @BsubArgumentsList@. This should consist of the additional @bsub@ flags your site requires to schedule the job on a node with GPU support. Set @BsubCUDAArguments@ to an array of strings. For example:
+
+<notextile>
+<pre>    Containers:
+      LSF:
+        <code class="userinput">BsubCUDAArguments: <b>["-gpu", "num=%G"]</b></code>
+</pre>
+</notextile>
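
The ordering rule can be sketched as follows. This is an illustration only, with a hypothetical function name; template variables such as @%G@ are deliberately left unexpanded here.

```python
def bsub_arguments(base_args, cuda_args, device_count):
    """Return the bsub argument list for one container.

    cuda_args (BsubCUDAArguments) is appended after base_args
    (BsubArgumentsList) only when the container requests at least
    one GPU. Template variables are not expanded in this sketch.
    """
    args = list(base_args)  # copy so the configured list is not modified
    if device_count > 0:
        args.extend(cuda_args)
    return args
```

A container with @device_count@ of 0 gets only the base arguments, so GPU-specific flags never reach jobs that do not need a GPU node.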
h3(#PollPeriod). Containers.PollInterval
See "Set up Docker":../install-docker.html
+{% include 'install_cuda' %}
+
{% assign arvados_component = 'python-arvados-fuse crunch-run arvados-docker-cleaner' %}
{% include 'install_compute_fuse' %}
{% include 'install_packages' %}
+{% include 'install_cuda' %}
+
h2(#singularity). Set up Singularity
Follow the "Singularity installation instructions":https://sylabs.io/guides/3.7/user-guide/quick_start.html. Make sure @singularity@ and @mksquashfs@ are working:
property1: value1
property2: $(inputs.value2)
- arv:CUDARequirement:
+ cwltool:CUDARequirement:
cudaVersionMin: "11.0"
cudaComputeCapabilityMin: "9.0"
deviceCountMin: 1
|_. Field |_. Type |_. Description |
|processProperties|key-value map, or list of objects with the fields {propertyName, propertyValue}|The properties that will be set on the container request. May include expressions that reference `$(inputs)` of the current workflow or tool.|
-h2(#CUDARequirement). arv:CUDARequirement
+h2(#CUDARequirement). cwltool:CUDARequirement
Request support for NVIDIA CUDA GPU acceleration in the container. Assumes that the CUDA runtime (SDK) is installed in the container, and that the host will inject the CUDA driver libraries into the container (at a version equal to or later than the one requested).
h2(#performance). Performance
-To get the best perfomance from your workflows, be aware of the following Arvados features, behaviors, and best practices:
+To get the best performance from your workflows, be aware of the following Arvados features, behaviors, and best practices.
-If you have a sequence of short-running steps (less than 1-2 minutes each), use the Arvados extension "arv:RunInSingleContainer":cwl-extensions.html#RunInSingleContainer to avoid scheduling and data transfer overhead by running all the steps together at once. To use this feature, @cwltool@ must be installed in the container image.
+Does your application support NVIDIA GPU acceleration? Use "cwltool:CUDARequirement":cwl-extensions.html#CUDARequirement to request nodes with GPUs.
+
+If you have a sequence of short-running steps (less than 1-2 minutes each), use the Arvados extension "arv:RunInSingleContainer":cwl-extensions.html#RunInSingleContainer to avoid scheduling and data transfer overhead by running all the steps together in the same container on the same node. To use this feature, @cwltool@ must be installed in the container image. Example:
+
+{% codeblock as yaml %}
+class: Workflow
+cwlVersion: v1.0
+$namespaces:
+  arv: "http://arvados.org/cwl#"
+inputs:
+  file: File
+outputs: []
+requirements:
+  SubworkflowFeatureRequirement: {}
+steps:
+  subworkflow-with-short-steps:
+    in:
+      file: file
+    out: [out]
+    # This hint indicates that the subworkflow should be bundled and
+    # run in a single container, instead of the normal behavior, which
+    # is to run each step in a separate container. This greatly
+    # reduces overhead if you have a series of short jobs, without
+    # requiring any changes to the CWL definition of the subworkflow.
+    hints:
+      - class: arv:RunInSingleContainer
+    run: subworkflow-with-short-steps.cwl
+{% endcodeblock %}
Avoid declaring @InlineJavascriptRequirement@ or @ShellCommandRequirement@ unless you specifically need them. Don't include them "just in case" because they change the default behavior and may add extra overhead.
Workflows should always provide @DockerRequirement@ in the @hints@ or @requirements@ section.
-Build a reusable library of components. Share tool wrappers and subworkflows between projects. Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-language/workflows and tool registries such as "Dockstore":http://dockstore.org .
+Build a reusable library of components. Share tool wrappers and subworkflows between projects. Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-library and tool registries such as "Dockstore":http://dockstore.org .
CommandLineTools wrapping custom scripts should represent the script as an input parameter with the script file as a default value. Use @secondaryFiles@ for scripts that consist of multiple files. For example: