Merge branch '14484-collection-record-update'
author Eric Biagiotti <ebiagiotti@veritasgenetics.com>
Thu, 4 Apr 2019 14:54:49 +0000 (10:54 -0400)
committer Eric Biagiotti <ebiagiotti@veritasgenetics.com>
Thu, 4 Apr 2019 14:54:49 +0000 (10:54 -0400)
refs #14484

Arvados-DCO-1.1-Signed-off-by: Eric Biagiotti <ebiagiotti@veritasgenetics.com>

doc/_config.yml
doc/install/install-dispatch-cloud.html.textile.liquid [new file with mode: 0644]
lib/dispatchcloud/container/queue.go
lib/dispatchcloud/container/queue_test.go
sdk/cwl/setup.py

diff --git a/doc/_config.yml b/doc/_config.yml
index 1e17d047062efd8fbf324edcb57979ef83b740df..a5b53442ca1848118a8065342b78ebe4460ee31c 100644
@@ -212,6 +212,8 @@ navbar:
       - install/crunch2-slurm/install-test.html.textile.liquid
       - install/install-nodemanager.html.textile.liquid
       - install/install-compute-ping.html.textile.liquid
+    - Containers API support on cloud (experimental):
+      - install/install-dispatch-cloud.html.textile.liquid
     - Jobs API support (deprecated):
       - install/install-crunch-dispatch.html.textile.liquid
       - install/install-compute-node.html.textile.liquid
diff --git a/doc/install/install-dispatch-cloud.html.textile.liquid b/doc/install/install-dispatch-cloud.html.textile.liquid
new file mode 100644
index 0000000..42c814b
--- /dev/null
@@ -0,0 +1,200 @@
+---
+layout: default
+navsection: installguide
+title: Install the cloud dispatcher
+
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+The cloud dispatch service is an *experimental* service for running containers on cloud VMs. It eliminates the need for SLURM, Node Manager, and SLURM dispatcher. It works with Microsoft Azure and Amazon EC2; future versions will also support Google Compute Engine.
+
+The cloud dispatch service can run on any node that can connect to the Arvados API service, the cloud provider's API, and the SSH service on cloud VMs.  It is not resource-intensive, so you can run it on the API server node.
+
+*Only one dispatch process should be running at a time.* If you are migrating a system that currently runs @crunch-dispatch-slurm@, it is safest to remove the @crunch-dispatch-slurm@ service entirely before installing @arvados-dispatch-cloud@.
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo systemctl --now disable crunch-dispatch-slurm</span>
+~$ <span class="userinput">sudo apt-get remove crunch-dispatch-slurm</span>
+</code></pre>
+</notextile>
+
+h2. Create a dispatcher token
+
+If you haven't already done so, create an Arvados superuser token to use as SystemRootToken in your cluster config file.
+
+{% include 'create_superuser_token' %}
+
+h2. Create a private key
+
+Generate an SSH private key with no passphrase. Save it in the cluster configuration file (see @PrivateKey@ in the example below).
+
+<notextile>
+<pre><code>~$ <span class="userinput">ssh-keygen -N '' -f ~/.ssh/id_dispatcher</span>
+Generating public/private rsa key pair.
+Your identification has been saved in /home/user/.ssh/id_dispatcher.
+Your public key has been saved in /home/user/.ssh/id_dispatcher.pub.
+The key fingerprint is:
+[...]
+~$ <span class="userinput">cat ~/.ssh/id_dispatcher</span>
+-----BEGIN RSA PRIVATE KEY-----
+MIIEpQIBAAKCAQEAqXoCzcOBkFQ7w4dvXf9B++1ctgZRqEbgRYL3SstuMV4oawks
+ttUuxJycDdsPmeYcHsKo8vsEZpN6iYsX6ZZzhkO5nEayUTU8sBjmg1ZCTo4QqKXr
+...
+oFyAjVoexx0RBcH6BveTfQtJKbktP1qBO4mXo2dP0cacuZEtlAqW9Eb06Pvaw/D9
+foktmqOY8MyctzFgXBpGTxPliGjqo8OkrOyQP2g+FL7v+Km31Xs61P8=
+-----END RSA PRIVATE KEY-----
+</code></pre>
+</notextile>
+
+You can delete the key files after you have copied the private key to your configuration file.
+
+<notextile>
+<pre><code>~$ <span class="userinput">rm ~/.ssh/id_dispatcher ~/.ssh/id_dispatcher.pub</span>
+</code></pre>
+</notextile>
+
+h2. Configure the dispatcher
+
+Add or update the following portions of your cluster configuration file, @/etc/arvados/config.yml@. Refer to "config.defaults.yml":https://dev.arvados.org/projects/arvados/repository/revisions/13996-new-api-config/entry/lib/config/config.defaults.yml for information about additional configuration options.
+
+<notextile>
+<pre><code>Clusters:
+  <span class="userinput">uuid_prefix</span>:
+    ManagementToken: xyzzy
+    SystemRootToken: <span class="userinput">zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz</span>
+    NodeProfiles:
+      # The key "apiserver" corresponds to ARVADOS_NODE_PROFILE in environment file (see below).
+      apiserver:
+        arvados-dispatch-cloud:
+          Listen: ":9006"
+    Services:
+      Controller:
+        ExternalURL: "https://<span class="userinput">uuid_prefix.arvadosapi.com</span>"
+    CloudVMs:
+      # BootProbeCommand is a shell command that succeeds when an instance is ready for service
+      BootProbeCommand: "sudo systemctl status docker"
+
+      <b># --- driver-specific configuration goes here --- see Amazon and Azure examples below ---</b>
+
+    Dispatch:
+      PrivateKey: |
+        -----BEGIN RSA PRIVATE KEY-----
+        MIIEpQIBAAKCAQEAqXoCzcOBkFQ7w4dvXf9B++1ctgZRqEbgRYL3SstuMV4oawks
+        ttUuxJycDdsPmeYcHsKo8vsEZpN6iYsX6ZZzhkO5nEayUTU8sBjmg1ZCTo4QqKXr
+        FJ+amZ7oYMDof6QEdwl6KNDfIddL+NfBCLQTVInOAaNss7GRrxLTuTV7HcRaIUUI
+        jYg0Ibg8ZZTzQxCvFXXnjseTgmOcTv7CuuGdt91OVdoq8czG/w8TwOhymEb7mQlt
+        lXuucwQvYgfoUgcnTgpJr7j+hafp75g2wlPozp8gJ6WQ2yBWcfqL2aw7m7Ll88Nd
+        [...]
+        oFyAjVoexx0RBcH6BveTfQtJKbktP1qBO4mXo2dP0cacuZEtlAqW9Eb06Pvaw/D9
+        foktmqOY8MyctzFgXBpGTxPliGjqo8OkrOyQP2g+FL7v+Km31Xs61P8=
+        -----END RSA PRIVATE KEY-----
+    InstanceTypes:
+      x1md:
+        ProviderType: x1.medium
+        VCPUs: 8
+        RAM: 64GiB
+        IncludedScratch: 64GB
+        Price: 0.62
+      x1lg:
+        ProviderType: x1.large
+        VCPUs: 16
+        RAM: 128GiB
+        IncludedScratch: 128GB
+        Price: 1.23
+</code></pre>
+</notextile>
+
+Minimal configuration example for Amazon EC2:
+
+<notextile>
+<pre><code>Clusters:
+  <span class="userinput">uuid_prefix</span>:
+    CloudVMs:
+      ImageID: ami-01234567890abcdef
+      Driver: ec2
+      DriverParameters:
+        AccessKeyID: EALMF21BJC7MKNF9FVVR
+        SecretAccessKey: yKJAPmoCQOMtYWzEUQ1tKTyrocTcbH60CRvGP3pM
+        SecurityGroupIDs:
+        - sg-0123abcd
+        SubnetID: subnet-0123abcd
+        Region: us-east-1
+        EBSVolumeType: gp2
+        AdminUsername: debian
+</code></pre>
+</notextile>
+
+Minimal configuration example for Azure:
+
+<notextile>
+<pre><code>Clusters:
+  <span class="userinput">uuid_prefix</span>:
+    CloudVMs:
+      ImageID: "https://zzzzzzzz.blob.core.windows.net/system/Microsoft.Compute/Images/images/zzzzz-compute-osDisk.55555555-5555-5555-5555-555555555555.vhd"
+      Driver: azure
+      DriverParameters:
+        SubscriptionID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
+        ClientID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
+        ClientSecret: 2WyXt0XFbEtutnf2hp528t6Wk9S5bOHWkRaaWwavKQo=
+        TenantID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
+        CloudEnvironment: AzurePublicCloud
+        ResourceGroup: zzzzz
+        Location: centralus
+        Network: zzzzz
+        Subnet: zzzzz-subnet-private
+        StorageAccount: example
+        BlobContainer: vhds
+        DeleteDanglingResourcesAfter: 20s
+        AdminUsername: arvados
+</code></pre>
+</notextile>
+
+Create the host configuration file @/etc/arvados/environment@.
+
+<notextile>
+<pre><code>ARVADOS_NODE_PROFILE=apiserver
+</code></pre>
+</notextile>
+
+h2. Install the dispatcher
+
+First, "add the appropriate package repository for your distribution":{{ site.baseurl }}/install/install-manual-prerequisites.html#repos.
+
+On Red Hat-based systems:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo yum install arvados-dispatch-cloud</span>
+~$ <span class="userinput">sudo systemctl enable arvados-dispatch-cloud</span>
+</code></pre>
+</notextile>
+
+On Debian-based systems:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo apt-get install arvados-dispatch-cloud</span>
+</code></pre>
+</notextile>
+
+{% include 'notebox_begin' %}
+
+The arvados-dispatch-cloud package includes configuration files for systemd. If you're using a different init system, configure a service to start and stop an @arvados-dispatch-cloud@ process as desired.
+
+{% include 'notebox_end' %}
+
+h2. Verify the dispatcher is running
+
+Use your ManagementToken to test the dispatcher's metrics endpoint.
+
+<notextile>
+<pre><code>~$ <span class="userinput">token="xyzzy"</span>
+~$ <span class="userinput">curl -H "Authorization: Bearer $token" http://localhost:9006/metrics</span>
+# HELP arvados_dispatchcloud_containers_running Number of containers reported running by cloud VMs.
+# TYPE arvados_dispatchcloud_containers_running gauge
+arvados_dispatchcloud_containers_running 0
+[...]
+</code></pre>
+</notextile>
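
Note on the metrics check documented above: the same request can be made from Go, which is convenient in a health-check script. This is a minimal sketch, not part of the commit. The URL and metric name come from the curl example; the MANAGEMENT_TOKEN environment variable and the findMetric helper are assumptions made for illustration. findMetric only matches unlabelled sample lines of the form "name value".

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// metricsURL matches the Listen address ":9006" used in the example config.
const metricsURL = "http://localhost:9006/metrics"

// findMetric is a hypothetical helper: it scans Prometheus text-format
// output and returns the value of the first unlabelled sample line whose
// metric name matches exactly. Comment lines (starting with '#') are skipped.
func findMetric(text, name string) (string, bool) {
	for _, line := range strings.Split(text, "\n") {
		if strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == name {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	req, err := http.NewRequest("GET", metricsURL, nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, "building request:", err)
		os.Exit(1)
	}
	// Assumption: the ManagementToken is supplied via this environment variable.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("MANAGEMENT_TOKEN"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil || resp.StatusCode != http.StatusOK {
		fmt.Fprintln(os.Stderr, "unexpected response:", resp.Status, err)
		os.Exit(1)
	}

	value, ok := findMetric(string(body), "arvados_dispatchcloud_containers_running")
	if !ok {
		fmt.Fprintln(os.Stderr, "metric not found; is the dispatcher running?")
		os.Exit(1)
	}
	fmt.Println("containers running:", value)
}
```

A non-zero exit status here means the dispatcher did not answer as expected, so the program can be used directly as a monitoring probe.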
diff --git a/lib/dispatchcloud/container/queue.go b/lib/dispatchcloud/container/queue.go
index af17aaf3927ce9f3b8b94a03ca289201c11640d2..50e73189efbc854433f8713e0a7762efafc0fe70 100644
@@ -314,15 +314,14 @@ func (cq *Queue) setRuntimeError(uuid, errorString string) error {
 
 // Cancel cancels the given container.
 func (cq *Queue) Cancel(uuid string) error {
-       err := cq.client.RequestAndDecode(nil, "PUT", "arvados/v1/containers/"+uuid, nil, map[string]map[string]interface{}{
+       var resp arvados.Container
+       err := cq.client.RequestAndDecode(&resp, "PUT", "arvados/v1/containers/"+uuid, nil, map[string]map[string]interface{}{
                "container": {"state": arvados.ContainerStateCancelled},
        })
        if err != nil {
                return err
        }
-       cq.mtx.Lock()
-       defer cq.mtx.Unlock()
-       cq.notify()
+       cq.updateWithResp(uuid, resp)
        return nil
 }
 
@@ -332,7 +331,13 @@ func (cq *Queue) apiUpdate(uuid, action string) error {
        if err != nil {
                return err
        }
+       cq.updateWithResp(uuid, resp)
+       return nil
+}
 
+// Update the local queue with the response received from a
+// state-changing API request (lock/unlock/cancel).
+func (cq *Queue) updateWithResp(uuid string, resp arvados.Container) {
        cq.mtx.Lock()
        defer cq.mtx.Unlock()
        if cq.dontupdate != nil {
@@ -345,7 +350,6 @@ func (cq *Queue) apiUpdate(uuid, action string) error {
                cq.current[uuid] = ent
        }
        cq.notify()
-       return nil
 }
 
 func (cq *Queue) poll() (map[string]*arvados.Container, error) {
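
Note on the queue.go change above: Cancel now stores the container record returned by the API in the local queue instead of only notifying subscribers, so a Get immediately after Cancel sees the new state. The pattern can be sketched in isolation as follows. This is a simplified illustration, not the Arvados implementation: Container, Queue, Subscribe, and fakeCancel are stand-ins invented here, and the sketch omits the dontupdate bookkeeping and the check that the entry already exists in the queue.

```go
package main

import (
	"fmt"
	"sync"
)

// Container is a stand-in for arvados.Container with only the fields used here.
type Container struct {
	UUID  string
	State string
}

// Queue is a toy local cache of container records.
type Queue struct {
	mtx         sync.Mutex
	current     map[string]Container
	subscribers []chan struct{}
}

func NewQueue() *Queue {
	return &Queue{current: map[string]Container{}}
}

// Subscribe returns a channel that receives a value after each update.
func (q *Queue) Subscribe() <-chan struct{} {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	ch := make(chan struct{}, 1)
	q.subscribers = append(q.subscribers, ch)
	return ch
}

// updateWithResp records the server's view of the container and wakes
// subscribers, all under one lock so readers never see a stale entry
// after a notification.
func (q *Queue) updateWithResp(uuid string, resp Container) {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	q.current[uuid] = resp
	for _, ch := range q.subscribers {
		select {
		case ch <- struct{}{}:
		default: // a notification is already pending for this subscriber
		}
	}
}

// Cancel calls the supplied API function (hypothetical; it stands in for the
// PUT request) and applies the response to the local cache on success.
func (q *Queue) Cancel(uuid string, api func(string) (Container, error)) error {
	resp, err := api(uuid)
	if err != nil {
		return err
	}
	q.updateWithResp(uuid, resp)
	return nil
}

// Get returns the cached record for uuid, if any.
func (q *Queue) Get(uuid string) (Container, bool) {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	ctr, ok := q.current[uuid]
	return ctr, ok
}

// fakeCancel simulates a successful API response.
func fakeCancel(uuid string) (Container, error) {
	return Container{UUID: uuid, State: "Cancelled"}, nil
}

func main() {
	q := NewQueue()
	updates := q.Subscribe()
	if err := q.Cancel("zzzzz-dz642-000000000000000", fakeCancel); err != nil {
		fmt.Println("cancel failed:", err)
		return
	}
	<-updates // the notification arrives only after the cache was updated
	ctr, ok := q.Get("zzzzz-dz642-000000000000000")
	fmt.Println(ctr.State, ok)
}
```

Sharing one helper between Cancel and the lock/unlock path keeps the locking and notification logic in a single place, which is the design choice the diff makes with updateWithResp.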
diff --git a/lib/dispatchcloud/container/queue_test.go b/lib/dispatchcloud/container/queue_test.go
index 91d65359e884a91955f47523a7d11836a52767df..3c63fe51e6e89a116a40ea5c72917a5d4528ab41 100644
@@ -74,6 +74,7 @@ func (suite *IntegrationSuite) TestGetLockUnlockCancel(c *check.C) {
                        defer wg.Done()
                        err := cq.Unlock(uuid)
                        c.Check(err, check.NotNil)
+
                        err = cq.Lock(uuid)
                        c.Check(err, check.IsNil)
                        ctr, ok := cq.Get(uuid)
@@ -81,6 +82,7 @@ func (suite *IntegrationSuite) TestGetLockUnlockCancel(c *check.C) {
                        c.Check(ctr.State, check.Equals, arvados.ContainerStateLocked)
                        err = cq.Lock(uuid)
                        c.Check(err, check.NotNil)
+
                        err = cq.Unlock(uuid)
                        c.Check(err, check.IsNil)
                        ctr, ok = cq.Get(uuid)
@@ -88,6 +90,14 @@ func (suite *IntegrationSuite) TestGetLockUnlockCancel(c *check.C) {
                        c.Check(ctr.State, check.Equals, arvados.ContainerStateQueued)
                        err = cq.Unlock(uuid)
                        c.Check(err, check.NotNil)
+
+                       err = cq.Cancel(uuid)
+                       c.Check(err, check.IsNil)
+                       ctr, ok = cq.Get(uuid)
+                       c.Check(ok, check.Equals, true)
+                       c.Check(ctr.State, check.Equals, arvados.ContainerStateCancelled)
+                       err = cq.Lock(uuid)
+                       c.Check(err, check.NotNil)
                }()
        }
        wg.Wait()
diff --git a/sdk/cwl/setup.py b/sdk/cwl/setup.py
index d97e7428da0488e04009d4a0baeca01bbb18aa8b..1052fb0d76606ddf160c685830a88649b7c40acf 100644
@@ -40,6 +40,7 @@ setup(name='arvados-cwl-runner',
           'arvados-python-client>=1.3.0.20190205182514',
           'setuptools',
           'ciso8601 >= 2.0.0',
+          'networkx < 2.3'
       ],
       extras_require={
           ':os.name=="posix" and python_version<"3"': ['subprocess32 >= 3.5.1'],