h3. Configure CloudVMs
-Add or update the following portions of your cluster configuration file, @config.yml@. Refer to "config.defaults.yml":{{site.baseurl}}/admin/config.html for information about additional configuration options. The @DispatchPrivateKey@ should be the *private* key generated in "the previous section":install-compute-node.html#sshkeypair.
+Add or update the following portions of your cluster configuration file, @config.yml@. Refer to "config.defaults.yml":{{site.baseurl}}/admin/config.html for information about additional configuration options. The @DispatchPrivateKey@ should be the *private* key generated in "Create an SSH keypair":install-compute-node.html#sshkeypair .
<notextile>
<pre><code> Services:
</code></pre>
</notextile>
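+
+As a minimal sketch, the relevant keys look like this (assuming the EC2 driver; the URL and key contents below are placeholders):
+
+<notextile><pre><code>Services:
+  DispatchCloud:
+    InternalURLs:
+      "http://localhost:9006": {}
+Containers:
+  CloudVMs:
+    Enable: true
+    Driver: ec2
+  DispatchPrivateKey: |
+    -----BEGIN OPENSSH PRIVATE KEY-----
+    ...
+    -----END OPENSSH PRIVATE KEY-----
+</code></pre></notextile>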
-h4(#GPUsupport). NVIDIA GPU support
+h3(#GPUsupport). NVIDIA GPU support
-To specify instance types with NVIDIA GPUs, you must include an additional @CUDA@ section:
+To specify instance types with NVIDIA GPUs, "the compute image must be built with CUDA support":install-compute-node.html#nvidia , and you must include an additional @CUDA@ section:
<notextile>
<pre><code> InstanceTypes:
</code></pre>
</notextile>
-The @DriverVersion@ is the version of the CUDA toolkit installed in your compute image (in X.Y format, do not include the patchlevel). The @HardwareCapability@ is the CUDA compute capability of the GPUs available for this instance type. The @DeviceCount@ is the number of GPU cores available for this instance type.
+The @DriverVersion@ is the version of the CUDA toolkit installed in your compute image (in X.Y format, do not include the patchlevel). The @HardwareCapability@ is the "CUDA compute capability of the GPUs available for this instance type":https://developer.nvidia.com/cuda-gpus. The @DeviceCount@ is the number of GPU devices available for this instance type.
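+
+As a sketch, a GPU instance type entry might look like this (the instance type, price, and CUDA version numbers below are illustrative placeholders, not prescriptive values):
+
+<notextile><pre><code> InstanceTypes:
+  g4dn:
+    ProviderType: g4dn.xlarge
+    VCPUs: 4
+    RAM: 16GiB
+    IncludedScratch: 125GB
+    Price: 0.526
+    CUDA:
+      DriverVersion: "11.4"
+      HardwareCapability: "7.5"
+      DeviceCount: 1
+</code></pre></notextile>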
-h4. Minimal configuration example for Amazon EC2
+h3(#aws-ebs-autoscaler). EBS Autoscale configuration
+
+See "Autoscaling compute node scratch space":install-compute-node.html#aws-ebs-autoscaler for details about compute image configuration.
+
+The @Containers.InstanceTypes@ list should be modified so that all @AddedScratch@ lines are removed, and the @IncludedScratch@ value should be set to 5 TB. This way, the scratch space requirements will be met by all of the defined instance types. For example:
+
+<notextile><pre><code> InstanceTypes:
+ c5large:
+ ProviderType: c5.large
+ VCPUs: 2
+ RAM: 4GiB
+ IncludedScratch: 5TB
+ Price: 0.085
+ m5large:
+ ProviderType: m5.large
+ VCPUs: 2
+ RAM: 8GiB
+ IncludedScratch: 5TB
+ Price: 0.096
+...
+</code></pre></notextile>
+
+You will also need to create an IAM role in AWS with these permissions:
+
+<notextile><pre><code>{
+ "Statement": [
+ {
+ "Effect": "Allow",
+ "Action": [
+ "ec2:AttachVolume",
+ "ec2:DescribeVolumeStatus",
+ "ec2:DescribeVolumes",
+ "ec2:DescribeTags",
+ "ec2:ModifyInstanceAttribute",
+ "ec2:DescribeVolumeAttribute",
+ "ec2:CreateVolume",
+ "ec2:DeleteVolume",
+ "ec2:CreateTags"
+ ],
+ "Resource": "*"
+ }
+ ]
+}
+</code></pre></notextile>
+
+Then set @Containers.CloudVMs.DriverParameters.IAMInstanceProfile@ to the name of the IAM role. This will make @arvados-dispatch-cloud@ pass an IAM instance profile to the compute nodes when they start up, giving them sufficient permissions to attach and grow EBS volumes.
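+
+For example, if the IAM role created above is named @ebs-autoscale-role@ (a placeholder name), the corresponding configuration is:
+
+<notextile><pre><code>Containers:
+  CloudVMs:
+    DriverParameters:
+      IAMInstanceProfile: ebs-autoscale-role
+</code></pre></notextile>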
+
+h3. AWS Credentials for Local Keepstore on Compute node
+
+When @Containers.LocalKeepBlobBuffersPerVCPU@ is non-zero, the compute node will spin up a local Keepstore service for direct storage access. If Keep is backed by S3, the compute node will need to be able to access the S3 bucket.
+
+If the AWS credentials for S3 access are configured in @config.yml@ (i.e. @Volumes.DriverParameters.AccessKeyID@ and @Volumes.DriverParameters.SecretAccessKey@), these credentials will be made available to the local Keepstore on the compute node to access S3 directly and no further configuration is necessary.
+
+Alternatively, if an IAM role is configured in @config.yml@ (i.e. @Volumes.DriverParameters.IAMRole@), the name of an instance profile that corresponds to this role ("often identical to the name of the IAM role":https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#ec2-instance-profile) must be configured in the @CloudVMs.DriverParameters.IAMInstanceProfile@ parameter.
+
+*If you are also using the EBS Autoscale feature, the role in @IAMInstanceProfile@ must have both EC2 and S3 permissions.*
+
+Finally, if @config.yml@ does not have @Volumes.DriverParameters.AccessKeyID@, @Volumes.DriverParameters.SecretAccessKey@ or @Volumes.DriverParameters.IAMRole@ defined, Keepstore uses the IAM role attached to the node, whatever it may be called. The @CloudVMs.DriverParameters.IAMInstanceProfile@ parameter must then still be configured with the name of a profile whose IAM role has permission to access the S3 bucket(s). That way, @arvados-dispatch-cloud@ can attach the IAM role to the compute node as it is created.
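+
+As a sketch, the named-IAM-role case looks like this in @config.yml@ (the volume UUID, bucket name, and role/profile name below are placeholders):
+
+<notextile><pre><code>Containers:
+  CloudVMs:
+    DriverParameters:
+      IAMInstanceProfile: keep-s3-access
+Volumes:
+  zzzzz-nyw5e-0123456789abcde:
+    Driver: S3
+    DriverParameters:
+      Bucket: example-keep-bucket
+      IAMRole: keep-s3-access
+</code></pre></notextile>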
+
+h3. Minimal configuration example for Amazon EC2
The <span class="userinput">ImageID</span> value is the compute node image that was built in "the previous section":install-compute-node.html#aws.
</code></pre>
</notextile>
+h3(#IAM). Example IAM policy for cloud dispatcher
+
Example policy for the IAM role used by the cloud dispatcher:
<notextile>
<pre>
{
- "Version": "2012-10-17",
"Id": "arvados-dispatch-cloud policy",
"Statement": [
{
"Effect": "Allow",
"Action": [
- "iam:PassRole",
- "ec2:DescribeKeyPairs",
- "ec2:ImportKeyPair",
- "ec2:RunInstances",
- "ec2:DescribeInstances",
- "ec2:CreateTags",
- "ec2:TerminateInstances"
+ "ec2:CreateTags",
+ "ec2:Describe*",
+ "ec2:CreateImage",
+ "ec2:CreateKeyPair",
+ "ec2:ImportKeyPair",
+ "ec2:DeleteKeyPair",
+ "ec2:RunInstances",
+ "ec2:StopInstances",
+ "ec2:TerminateInstances",
+ "ec2:ModifyInstanceAttribute",
+ "ec2:CreateSecurityGroup",
+ "ec2:DeleteSecurityGroup",
+ "iam:PassRole"
],
"Resource": "*"
}
</pre>
</notextile>
-h4. Minimal configuration example for Azure
+h3. Minimal configuration example for Azure
Using managed disks:
On the dispatch node, start monitoring the arvados-dispatch-cloud logs:
<notextile>
-<pre><code>~$ <span class="userinput">sudo journalctl -o cat -fu arvados-dispatch-cloud.service</span>
+<pre><code># <span class="userinput">journalctl -o cat -fu arvados-dispatch-cloud.service</span>
</code></pre>
</notextile>
-"Make sure to install the arvados/jobs image.":../install-jobs-image.html
-
-Submit a simple container request:
+In another terminal window, use the diagnostics tool to run a simple container.
<notextile>
-<pre><code>shell:~$ <span class="userinput">arv container_request create --container-request '{
- "name": "test",
- "state": "Committed",
- "priority": 1,
- "container_image": "arvados/jobs:latest",
- "command": ["echo", "Hello, Crunch!"],
- "output_path": "/out",
- "mounts": {
- "/out": {
- "kind": "tmp",
- "capacity": 1000
- }
- },
- "runtime_constraints": {
- "vcpus": 1,
- "ram": 1048576
- }
-}'</span>
+<pre><code># <span class="userinput">arvados-client sudo diagnostics</span>
+INFO 5: running health check (same as `arvados-server check`)
+INFO 10: getting discovery document from https://zzzzz.arvadosapi.com/discovery/v1/apis/arvados/v1/rest
+...
+INFO 160: running a container
+INFO ... container request submitted, waiting up to 10m for container to run
</code></pre>
</notextile>
-This command should return a record with a @container_uuid@ field. Once @arvados-dispatch-cloud@ polls the API server for new containers to run, you should see it dispatch that same container.
+After performing a number of other quick tests, this will submit a new container request and wait for it to finish.
-The @arvados-dispatch-cloud@ API provides a list of queued and running jobs and cloud instances. Use your @ManagementToken@ to test the dispatcher's endpoint. For example, when one container is running:
+While the diagnostics tool is waiting, the @arvados-dispatch-cloud@ logs will show details about creating a cloud instance, waiting for it to be ready, and scheduling the new container on it.
+
+You can also use the "arvados-dispatch-cloud API":{{site.baseurl}}/api/dispatch.html to get a list of queued and running jobs and cloud instances. Use your @ManagementToken@ to test the dispatcher's endpoint. For example, when one container is running:
<notextile>
<pre><code>~$ <span class="userinput">curl -sH "Authorization: Bearer $token" http://localhost:9006/arvados/v1/dispatch/containers</span>
A similar request can be made to the @http://localhost:9006/arvados/v1/dispatch/instances@ endpoint.
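+The same information can also be fetched programmatically. As a sketch (not part of Arvados itself; the token shown in the usage comment is a placeholder for your @ManagementToken@):

```python
import json
import urllib.request

# Default arvados-dispatch-cloud management API address; adjust if your
# Services.DispatchCloud.InternalURLs setting uses a different port.
DISPATCH_BASE = "http://localhost:9006/arvados/v1/dispatch"

def dispatch_request(endpoint, token):
    """Build an authenticated request for a dispatcher management endpoint."""
    return urllib.request.Request(
        f"{DISPATCH_BASE}/{endpoint}",
        headers={"Authorization": f"Bearer {token}"},
    )

def list_dispatch(endpoint, token):
    """Fetch "containers" or "instances" and return the decoded JSON body."""
    with urllib.request.urlopen(dispatch_request(endpoint, token)) as resp:
        return json.loads(resp.read())

# Example usage (requires a running dispatcher and a valid ManagementToken):
# print(list_dispatch("instances", "your-management-token"))
```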
-When the container finishes, the dispatcher will log it.
-
After the container finishes, you can get the container record by UUID *from a shell server* to see its results:
<notextile>