Merge branch '18326-cuda-docs' refs #18326

author Peter Amstutz <peter.amstutz@curii.com>

Tue, 25 Jan 2022 16:26:33 +0000 (11:26 -0500)

committer Peter Amstutz <peter.amstutz@curii.com>

Tue, 25 Jan 2022 16:26:33 +0000 (11:26 -0500)
diff --git a/doc/_includes/_install_cuda.liquid b/doc/_includes/_install_cuda.liquid

new file mode 100644 (file)

index 0000000..cb1519a
--- /dev/null
+++ b/doc/_includes/_install_cuda.liquid
@@ -0,0 +1,21 @@
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+h2(#cuda). Install NVIDIA CUDA Toolkit (optional)
+
+If you want to use NVIDIA GPUs, "install the CUDA toolkit.":https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
+
+You must also install the NVIDIA Container Toolkit:
+
+<pre>
+DIST=$(. /etc/os-release; echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
+  sudo apt-key add -
+curl -s -L https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list | \
+  sudo tee /etc/apt/sources.list.d/libnvidia-container.list
+sudo apt-get update
+sudo apt-get install libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
+</pre>
diff --git a/doc/install/crunch2-cloud/install-compute-node.html.textile.liquid b/doc/install/crunch2-cloud/install-compute-node.html.textile.liquid

index 131dde5996dcebf73d33276500c4b05141cee6cb..89771514e9fdf380c141ae3e474b78ecdb3ef832 100644 (file)
--- a/doc/install/crunch2-cloud/install-compute-node.html.textile.liquid
+++ b/doc/install/crunch2-cloud/install-compute-node.html.textile.liquid
@@ -125,10 +125,16 @@ Options:
        Path to the public key file that a-d-c will use to log into the compute node
    --mksquashfs-mem (default: 256M)
        Only relevant when using Singularity. This is the amount of memory mksquashfs is allowed to use.
-  --debug
-      Output debug information (default: false)
+  --nvidia-gpu-support (default: false)
+      Install all the necessary tooling for Nvidia GPU support
+  --debug (default: false)
+      Output debug information
  </code></pre></notextile>
  
+h2(#building). NVIDIA GPU support
+
+If you plan on using instance types with NVIDIA GPUs, add @--nvidia-gpu-support@ to the build command line.  Arvados uses the same compute image for both GPU and non-GPU instance types.  The GPU tooling is ignored when using the image with a non-GPU instance type.
+
  h2(#aws). Build an AWS image
  
  <notextile><pre><code>~$ <span class="userinput">./build.sh --json-file arvados-images-aws.json \
diff --git a/doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid b/doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid

index b4987f44373eb533e616c0b6f263cbf086f5562b..06a918dd37bcfab0df9a050dccbc1e4e0de68170 100644 (file)
--- a/doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid
+++ b/doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid
@@ -74,6 +74,27 @@ Add or update the following portions of your cluster configuration file, @config
  </code></pre>
  </notextile>
  
+h4. NVIDIA GPU support
+
+To specify instance types with NVIDIA GPUs, you must include an additional @CUDA@ section:
+
+<notextile>
+<pre><code>    InstanceTypes:
+      g4dn:
+        ProviderType: g4dn.xlarge
+        VCPUs: 4
+        RAM: 16GiB
+        IncludedScratch: 125GB
+        Price: 0.56
+        CUDA:
+          DriverVersion: "11.4"
+          HardwareCapability: "7.5"
+          DeviceCount: 1
+</code></pre>
+</notextile>
+
+The @DriverVersion@ is the version of the CUDA toolkit installed in your compute image (in X.Y format; do not include the patch level).  The @HardwareCapability@ is the CUDA compute capability of the GPUs available for this instance type.  The @DeviceCount@ is the number of GPU devices available for this instance type.
+
  h4. Minimal configuration example for Amazon EC2
  
  The <span class="userinput">ImageID</span> value is the compute node image that was built in "the previous section":install-compute-node.html#aws.
diff --git a/doc/install/crunch2-lsf/install-dispatch.html.textile.liquid b/doc/install/crunch2-lsf/install-dispatch.html.textile.liquid

index 7e44c8ec43c080fe26140003ab6bce9874b908b9..37adffd18d4e9bef5162614b015a3155df3333a5 100644 (file)
--- a/doc/install/crunch2-lsf/install-dispatch.html.textile.liquid
+++ b/doc/install/crunch2-lsf/install-dispatch.html.textile.liquid
@@ -64,17 +64,39 @@ Alternatively, you can arrange for the arvados-dispatch-lsf process to run as an
  
  h3(#SbatchArguments). Containers.LSF.BsubArgumentsList
  
-When arvados-dispatch-lsf invokes @bsub@, you can add arguments to the command by specifying @BsubArgumentsList@.  You can use this to send the jobs to specific cluster partitions or add resource requests.  Set @BsubArgumentsList@ to an array of strings.  For example:
+When arvados-dispatch-lsf invokes @bsub@, you can add arguments to the command by specifying @BsubArgumentsList@.  You can use this to send the jobs to specific cluster partitions or add resource requests.  Set @BsubArgumentsList@ to an array of strings.
+
+Template variables starting with % will be substituted as follows:
+
+%U uuid
+%C number of VCPUs
+%M memory in MB
+%T tmp in MB
+%G number of GPU devices (@runtime_constraints.cuda.device_count@)
+
+Use %% to express a literal %. The %%J in the default will be changed to %J, which is interpreted by @bsub@ itself.
+
+For example:
  
  <notextile>
  <pre>    Containers:
        LSF:
-        <code class="userinput">BsubArgumentsList: <b>["-C", "0", "-o", "/tmp/crunch-run.%J.out", "-e", "/tmp/crunch-run.%J.err"]</b></code>
+        <code class="userinput">BsubArgumentsList: <b>["-o", "/tmp/crunch-run.%%J.out", "-e", "/tmp/crunch-run.%%J.err", "-J", "%U", "-n", "%C", "-D", "%MMB", "-R", "rusage[mem=%MMB:tmp=%TMB] span[hosts=1]", "-R", "select[mem>=%MMB]", "-R", "select[tmp>=%TMB]", "-R", "select[ncpus>=%C]"]</b></code>
  </pre>
  </notextile>
  
  Note that the default value for @BsubArgumentsList@ uses the @-o@ and @-e@ arguments to write stdout/stderr data to files in @/tmp@ on the compute nodes, which is helpful for troubleshooting installation/configuration problems. Ensure you have something in place to delete old files from @/tmp@, or adjust these arguments accordingly.
  
+h3(#SbatchArguments). Containers.LSF.BsubCUDAArguments
+
+If the container requests access to GPUs (@runtime_constraints.cuda.device_count@ of the container request is greater than zero), the command line arguments in @BsubCUDAArguments@ will be added to the command line _after_ @BsubArgumentsList@.  This should consist of the additional @bsub@ flags your site requires to schedule the job on a node with GPU support.  Set @BsubCUDAArguments@ to an array of strings.  For example:
+
+<notextile>
+<pre>    Containers:
+      LSF:
+        <code class="userinput">BsubCUDAArguments: <b>["-gpu", "num=%G"]</b></code>
+</pre>
+</notextile>
  
  h3(#PollPeriod). Containers.PollInterval
  
diff --git a/doc/install/crunch2/install-compute-node-docker.html.textile.liquid b/doc/install/crunch2/install-compute-node-docker.html.textile.liquid

index 66bd85b7c5038073beaf95d342fde7c2060d90b2..6204d524f4e3c259f200e81315dd9d02e047dcf3 100644 (file)
--- a/doc/install/crunch2/install-compute-node-docker.html.textile.liquid
+++ b/doc/install/crunch2/install-compute-node-docker.html.textile.liquid
@@ -31,6 +31,8 @@ h2(#docker). Set up Docker
  
  See "Set up Docker":../install-docker.html
  
+{% include 'install_cuda' %}
+
  {% assign arvados_component = 'python-arvados-fuse crunch-run arvados-docker-cleaner' %}
  
  {% include 'install_compute_fuse' %}
diff --git a/doc/install/crunch2/install-compute-node-singularity.html.textile.liquid b/doc/install/crunch2/install-compute-node-singularity.html.textile.liquid

index 14f95e48fb3be044f5cd8d5bb1730d6a32fcfed0..e61b6cbe3783b192325b77c1454747abcc128a1e 100644 (file)
--- a/doc/install/crunch2/install-compute-node-singularity.html.textile.liquid
+++ b/doc/install/crunch2/install-compute-node-singularity.html.textile.liquid
@@ -32,6 +32,8 @@ This page describes how to configure a compute node so that it can be used to ru
  
  {% include 'install_packages' %}
  
+{% include 'install_cuda' %}
+
  h2(#singularity). Set up Singularity
  
  Follow the "Singularity installation instructions":https://sylabs.io/guides/3.7/user-guide/quick_start.html. Make sure @singularity@ and @mksquashfs@ are working:
diff --git a/doc/user/cwl/cwl-extensions.html.textile.liquid b/doc/user/cwl/cwl-extensions.html.textile.liquid

index 0580dca289f431a10ebaab322833368ddd3ef107..dd78e989fd52afe4ddd24940a00d76634f546a2d 100644 (file)
--- a/doc/user/cwl/cwl-extensions.html.textile.liquid
+++ b/doc/user/cwl/cwl-extensions.html.textile.liquid
@@ -58,7 +58,7 @@ hints:
        property1: value1
        property2: $(inputs.value2)
  
-  arv:CUDARequirement:
+  cwltool:CUDARequirement:
      cudaVersionMin: "11.0"
      cudaComputeCapabilityMin: "9.0"
      deviceCountMin: 1
@@ -153,7 +153,7 @@ table(table table-bordered table-condensed).
  |_. Field |_. Type |_. Description |
  |processProperties|key-value map, or list of objects with the fields {propertyName, propertyValue}|The properties that will be set on the container request.  May include expressions that reference `$(inputs)` of the current workflow or tool.|
  
-h2(#CUDARequirement). arv:CUDARequirement
+h2(#CUDARequirement). cwltool:CUDARequirement
  
  Request support for Nvidia CUDA GPU acceleration in the container.  Assumes that the CUDA runtime (SDK) is installed in the container, and the host will inject the CUDA driver libraries into the container (equal or later to the version requested).
  
diff --git a/doc/user/cwl/cwl-style.html.textile.liquid b/doc/user/cwl/cwl-style.html.textile.liquid

index bd07161ce3b203aca424b5287a48362d51d46787..853ed3b3e2be241d7e7c7dad9ae2c64312636449 100644 (file)
--- a/doc/user/cwl/cwl-style.html.textile.liquid
+++ b/doc/user/cwl/cwl-style.html.textile.liquid
@@ -11,9 +11,36 @@ SPDX-License-Identifier: CC-BY-SA-3.0
  
  h2(#performance). Performance
  
-To get the best perfomance from your workflows, be aware of the following Arvados features, behaviors, and best practices:
+To get the best performance from your workflows, be aware of the following Arvados features, behaviors, and best practices.
  
-If you have a sequence of short-running steps (less than 1-2 minutes each), use the Arvados extension "arv:RunInSingleContainer":cwl-extensions.html#RunInSingleContainer to avoid scheduling and data transfer overhead by running all the steps together at once.  To use this feature, @cwltool@ must be installed in the container image.
+Does your application support NVIDIA GPU acceleration?  Use "cwltool:CUDARequirement":cwl-extensions.html#CUDARequirement to request nodes with GPUs.
+
+If you have a sequence of short-running steps (less than 1-2 minutes each), use the Arvados extension "arv:RunInSingleContainer":cwl-extensions.html#RunInSingleContainer to avoid scheduling and data transfer overhead by running all the steps together in the same container on the same node.  To use this feature, @cwltool@ must be installed in the container image.  Example:
+
+{% codeblock as yaml %}
+class: Workflow
+cwlVersion: v1.0
+$namespaces:
+  arv: "http://arvados.org/cwl#"
+inputs:
+  file: File
+outputs: []
+requirements:
+  SubworkflowFeatureRequirement: {}
+steps:
+  subworkflow-with-short-steps:
+    in:
+      file: file
+    out: [out]
+    # This hint indicates that the subworkflow should be bundled and
+    # run in a single container, instead of the normal behavior, which
+    # is to run each step in a separate container.  This greatly
+    # reduces overhead if you have a series of short jobs, without
+    # requiring any changes to the CWL definition of the sub workflow.
+    hints:
+      - class: arv:RunInSingleContainer
+    run: subworkflow-with-short-steps.cwl
+{% endcodeblock %}
  
  Avoid declaring @InlineJavascriptRequirement@ or @ShellCommandRequirement@ unless you specifically need them.  Don't include them "just in case" because they change the default behavior and may add extra overhead.
  
@@ -123,7 +150,7 @@ To write workflows that are easy to modify and portable across CWL runners (in t
  
  Workflows should always provide @DockerRequirement@ in the @hints@ or @requirements@ section.
  
-Build a reusable library of components.  Share tool wrappers and subworkflows between projects.  Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-language/workflows and tool registries such as "Dockstore":http://dockstore.org .
+Build a reusable library of components.  Share tool wrappers and subworkflows between projects.  Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-library and tool registries such as "Dockstore":http://dockstore.org .
  
  CommandLineTools wrapping custom scripts should represent the script as an input parameter with the script file as a default value.  Use @secondaryFiles@ for scripts that consist of multiple files.  For example:
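
The template substitution that the LSF dispatcher documentation describes for @BsubArgumentsList@ and @BsubCUDAArguments@ (%U, %C, %M, %T, %G, with %% as a literal %) can be illustrated with a short Python sketch. This is not the arvados-dispatch-lsf implementation: the function name @expand_bsub_args@, the sample UUID, and the choice to leave unknown codes untouched are assumptions made only for illustration.

```python
import re


def expand_bsub_args(args, values):
    """Expand %-templates in a list of bsub arguments.

    `values` maps a single-letter code (for example "U") to its
    replacement. "%%" becomes a literal "%". Codes missing from
    `values` are left unchanged (an assumption of this sketch).
    """
    def replace(match):
        code = match.group(1)
        if code == "%":
            return "%"
        return str(values.get(code, match.group(0)))

    # A single left-to-right pass, so "%%J" yields "%J" and the result
    # is never scanned again for further substitutions.
    return [re.sub(r"%(.)", replace, arg) for arg in args]


# Hypothetical values for one container.
values = {"U": "zzzzz-dz642-0123456789abcde", "C": 4, "M": 8192, "T": 20480, "G": 1}

print(expand_bsub_args(
    ["-o", "/tmp/crunch-run.%%J.out", "-J", "%U", "-n", "%C",
     "-R", "rusage[mem=%MMB:tmp=%TMB] span[hosts=1]"],
    values,
))
print(expand_bsub_args(["-gpu", "num=%G"], values))
```

Doing the replacement in one pass matters: expanding %% first and the other codes afterwards would turn the documented @%%J@ into @%J@ and then risk treating it as a template again, instead of passing it through for @bsub@ to interpret.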