From: Tom Clegg
Date: Thu, 30 Jun 2022 20:03:52 +0000 (-0400)
Subject: Merge branch '19088-s3-properties-tags'
X-Git-Tag: 2.5.0~126
X-Git-Url: https://git.arvados.org/arvados.git/commitdiff_plain/607033c33f2001c194fe8c68d0dc17e4bde849da?hp=dd8f1b0527995bc5ad47710d3a483fa18b827bc6

Merge branch '19088-s3-properties-tags'

closes #19088

Arvados-DCO-1.1-Signed-off-by: Tom Clegg
---

diff --git a/doc/admin/config-urls.html.textile.liquid b/doc/admin/config-urls.html.textile.liquid
index e518ea1bf7..01c30f0e0e 100644
--- a/doc/admin/config-urls.html.textile.liquid
+++ b/doc/admin/config-urls.html.textile.liquid
@@ -16,9 +16,9 @@ The @Services@ section lists a number of Arvados services, each with an @Interna
 
 The @ExternalURL@ is the address where the service should be reachable by clients, both from inside and from outside the Arvados cluster. Some services do not expose an Arvados API, only Prometheus metrics. In that case, @ExternalURL@ is not used.
 
-The keys under @InternalURLs@ are addresses that are used by the reverse proxy (e.g. Nginx) that fronts Arvados services. The exception is the @Keepstore@ service, where clients connect directly to the addresses listed under @InternalURLs@. If a service is not fronted by a reverse proxy, e.g. when its endpoint only exposes Prometheus metrics, the intention is that metrics are collected directly from the endpoints defined in @InternalURLs@.
+The keys under @InternalURLs@ are the URLs through which Arvados system components can connect to one another, including the reverse proxy (e.g. Nginx) that fronts Arvados services. The exception is the @Keepstore@ service, where clients on the local network connect directly to @Keepstore.InternalURLs@ (while clients from outside networks connect to @Keepproxy.ExternalURL@). If a service is not fronted by a reverse proxy, e.g. when its endpoint only exposes Prometheus metrics, the intention is that metrics are collected directly from the endpoints defined in @InternalURLs@.
 
-@InternalURLs@ are also used by the service itself to figure out which address/port to listen on.
+Each entry in the @InternalURLs@ section may also indicate a @ListenURL@ to determine the protocol, address/interface, and port where the service process will listen, in case the desired listening address differs from the @InternalURLs@ key itself -- for example, when passing internal traffic through a reverse proxy.
 
 If the Arvados service lives behind a reverse proxy (e.g. Nginx), configuring the reverse proxy and the @InternalURLs@ and @ExternalURL@ values must be done in concert.
 
@@ -228,11 +228,12 @@ Consider this section for the @Controller@ service:
 {% codeblock as yaml %}
   Controller:
     InternalURLs:
-      "http://localhost:8003": {}
+      "https://ctrl-0.internal":
+        ListenURL: "http://localhost:8003"
     ExternalURL: "https://ClusterID.example.com"
 {% endcodeblock %}
 
-The @ExternalURL@ advertised is @https://ClusterID.example.com@. The @Controller@ service will start up on @localhost@ port 8003. Nginx is configured to sit in front of the @Controller@ service and terminates SSL:
+The @ExternalURL@ advertised to clients is @https://ClusterID.example.com@. The @arvados-controller@ process will listen on @localhost@ port 8003. Other Arvados service processes in the cluster can connect to this specific controller instance, using the URL @https://ctrl-0.internal@. Nginx is configured to sit in front of the @Controller@ service and terminate TLS:

 # This is the port where nginx expects to contact arvados-controller.
@@ -245,7 +246,7 @@ server {
   # the request is reverse proxied to the upstream 'controller'
 
   listen       443 ssl;
-  server_name  ClusterID.example.com;
+  server_name  ClusterID.example.com ctrl-0.internal;
 
   ssl_certificate     /YOUR/PATH/TO/cert.pem;
   ssl_certificate_key /YOUR/PATH/TO/cert.key;
@@ -275,4 +276,13 @@ server {
 }
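As a minimal sketch of the listen-address selection described above (a hypothetical helper for illustration only, not the actual arvados-controller code; the URLs are the example values from this page), the ListenURL/InternalURLs relationship can be expressed as:

package main

import (
	"fmt"
	"net/url"
)

// resolveListen picks the address a service process would bind for one
// InternalURLs entry: use ListenURL when it is set, otherwise fall back
// to the InternalURLs key itself.
func resolveListen(internalURL, listenURL string) (string, error) {
	raw := listenURL
	if raw == "" {
		raw = internalURL
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Host, nil
}

func main() {
	// Example values from this page: InternalURLs key "https://ctrl-0.internal",
	// ListenURL "http://localhost:8003".
	addr, err := resolveListen("https://ctrl-0.internal", "http://localhost:8003")
	fmt.Println(addr, err) // localhost:8003 <nil>
}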
 
+If the host part of @ListenURL@ is ambiguous, in the sense that more than one system host is able to listen on that address (e.g., @localhost@), configure each host's startup scripts to set the environment variable @ARVADOS_SERVICE_INTERNAL_URL@ to the @InternalURLs@ key that will reach that host. In the example above, this would be @ARVADOS_SERVICE_INTERNAL_URL=https://ctrl-0.internal@.
+
+If the cluster has just a single node running all of the Arvados server processes, configuration can be simplified:
+{% codeblock as yaml %}
+  Controller:
+    InternalURLs:
+      "http://localhost:8003": {}
+    ExternalURL: "https://ClusterID.example.com"
+{% endcodeblock %}
diff --git a/doc/api/methods/containers.html.textile.liquid b/doc/api/methods/containers.html.textile.liquid
index 76e5730c9f..43163c5550 100644
--- a/doc/api/methods/containers.html.textile.liquid
+++ b/doc/api/methods/containers.html.textile.liquid
@@ -52,8 +52,8 @@ Generally this will contain additional keys that are not present in any correspo
 |output|string|Portable data hash of the output collection.|Null if the container is not yet finished.|
 |container_image|string|Portable data hash of a collection containing the docker image used to run the container.||
 |progress|number|A number between 0.0 and 1.0 describing the fraction of work done.||
-|priority|integer|Range 0-1000. Indicate scheduling order preference.|Currently assigned by the system as the max() of the priorities of all associated ContainerRequests. See "container request priority":container_requests.html#priority .|
-|exit_code|integer|Process exit code.|Null if state!="Complete"|
+|priority|integer|Range 0-1000. Indicate scheduling order preference.|Currently assigned by the system as the max() of the priorities of all associated ContainerRequests. See "container request priority":container_requests.html#priority.|
+|exit_code|integer|Process exit code.|Null if container process has not exited yet.|
 |auth_uuid|string|UUID of a token to be passed into the container itself, used to access Keep-backed mounts, etc. Automatically assigned.|Null if state∉{"Locked","Running"} or if @runtime_token@ was provided.|
 |locked_by_uuid|string|UUID of a token, indicating which dispatch process changed state to Locked. If null, any token can be used to lock. If not null, only the indicated token can modify this container.|Null if state∉{"Locked","Running"}|
 |runtime_token|string|A v2 token to be passed into the container itself, used to access Keep-backed mounts, etc.|Not returned in API responses.
Reset to null when state is "Complete" or "Cancelled".| diff --git a/lib/config/cmd_test.go b/lib/config/cmd_test.go index 7167982ccd..9503a54d2d 100644 --- a/lib/config/cmd_test.go +++ b/lib/config/cmd_test.go @@ -217,7 +217,7 @@ Clusters: code := DumpCommand.RunCommand("arvados config-dump", []string{"-config", "-"}, bytes.NewBufferString(in), &stdout, &stderr) c.Check(code, check.Equals, 0) c.Check(stdout.String(), check.Matches, `(?ms).*TimeoutBooting: 10m\n.*`) - c.Check(stdout.String(), check.Matches, `(?ms).*http://localhost:12345/: {}\n.*`) + c.Check(stdout.String(), check.Matches, `(?ms).*http://localhost:12345/:\n +ListenURL: ""\n.*`) } func (s *CommandSuite) TestDump_UnknownKey(c *check.C) { diff --git a/lib/config/config.default.yml b/lib/config/config.default.yml index a9bbf4eee9..472a22c6b2 100644 --- a/lib/config/config.default.yml +++ b/lib/config/config.default.yml @@ -22,47 +22,78 @@ Clusters: Services: - # In each of the service sections below, the keys under - # InternalURLs are the endpoints where the service should be - # listening, and reachable from other hosts in the - # cluster. Example: + # Each of the service sections below specifies InternalURLs + # (each with optional ListenURL) and ExternalURL. + # + # InternalURLs specify how other Arvados service processes will + # connect to the service. Typically these use internal hostnames + # and high port numbers. Example: + # + # InternalURLs: + # "http://host1.internal.example:12345": {} + # "http://host2.internal.example:12345": {} + # + # ListenURL specifies the address and port the service process's + # HTTP server should listen on, if different from the + # InternalURL itself. Example, using an intermediate TLS proxy: # # InternalURLs: - # "http://host1.example:12345": {} - # "http://host2.example:12345": {} + # "https://host1.internal.example": + # ListenURL: "http://10.0.0.7:12345" + # + # When there are multiple InternalURLs configured, the service + # process will try listening on each InternalURLs (using + # ListenURL if provided) until one works. If you use a ListenURL + # like "0.0.0.0" which can be bound on any machine, use an + # environment variable + # ARVADOS_SERVICE_INTERNAL_URL=http://host1.internal.example to + # control which entry to use. + # + # ExternalURL specifies how applications/clients will connect to + # the service, regardless of whether they are inside or outside + # the cluster. Example: + # + # ExternalURL: "https://keep.zzzzz.example.com/" + # + # To avoid routing internal traffic through external networks, + # use split-horizon DNS for ExternalURL host names: inside the + # cluster's private network "host.zzzzz.example.com" resolves to + # the host's private IP address, while outside the cluster + # "host.zzzzz.example.com" resolves to the host's public IP + # address (or its external gateway or load balancer). 
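      # For example (hypothetical addresses, not part of this default
      # config): with split-horizon DNS, "keep.zzzzz.example.com" might
      # resolve to 10.0.0.7 on the cluster's private network and to
      # 203.0.113.7 (or the load balancer in front of it) from the
      # public internet, so the same ExternalURL works for clients on
      # either side.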
RailsAPI: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" Controller: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" Websocket: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" Keepbalance: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" GitHTTP: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" GitSSH: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" DispatchCloud: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" DispatchLSF: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" DispatchSLURM: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" Keepproxy: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" WebDAV: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} # Base URL for Workbench inline preview. If blank, use # WebDAVDownload instead, and disable inline preview. # If both are empty, downloading collections from workbench @@ -101,7 +132,7 @@ Clusters: ExternalURL: "" WebDAVDownload: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} # Base URL for download links. If blank, serve links to WebDAV # with disposition=attachment query param. Unlike preview links, # browsers do not render attachments, so there is no risk of XSS. @@ -117,6 +148,7 @@ Clusters: Keepstore: InternalURLs: SAMPLE: + ListenURL: "" # Rendezvous is normally empty/omitted. When changing the # URL of a Keepstore service, Rendezvous should be set to # the old URL (with trailing slash omitted) to preserve @@ -124,10 +156,10 @@ Clusters: Rendezvous: "" ExternalURL: "" Composer: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" WebShell: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} # ShellInABox service endpoint URL for a given VM. If empty, do not # offer web shell logins. 
# @@ -138,13 +170,13 @@ Clusters: # https://*.webshell.uuid_prefix.arvadosapi.com ExternalURL: "" Workbench1: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" Workbench2: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" Health: - InternalURLs: {SAMPLE: {}} + InternalURLs: {SAMPLE: {ListenURL: ""}} ExternalURL: "" PostgreSQL: diff --git a/lib/controller/handler.go b/lib/controller/handler.go index f5840b34ce..665fd5c636 100644 --- a/lib/controller/handler.go +++ b/lib/controller/handler.go @@ -13,7 +13,6 @@ import ( "net/url" "strings" "sync" - "time" "git.arvados.org/arvados.git/lib/controller/api" "git.arvados.org/arvados.git/lib/controller/federation" @@ -61,12 +60,6 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, req *http.Request) { req.URL.Path = strings.Replace(req.URL.Path, "//", "/", -1) } } - if h.Cluster.API.RequestTimeout > 0 { - ctx, cancel := context.WithDeadline(req.Context(), time.Now().Add(time.Duration(h.Cluster.API.RequestTimeout))) - req = req.WithContext(ctx) - defer cancel() - } - h.handlerStack.ServeHTTP(w, req) } diff --git a/lib/controller/handler_test.go b/lib/controller/handler_test.go index 5e467cb058..39c2b1c68e 100644 --- a/lib/controller/handler_test.go +++ b/lib/controller/handler_test.go @@ -204,17 +204,21 @@ func (s *HandlerSuite) TestProxyDiscoveryDoc(c *check.C) { c.Check(len(dd.Schemas), check.Not(check.Equals), 0) } -func (s *HandlerSuite) TestRequestTimeout(c *check.C) { - s.cluster.API.RequestTimeout = arvados.Duration(time.Nanosecond) - req := httptest.NewRequest("GET", "/discovery/v1/apis/arvados/v1/rest", nil) +// Handler should give up and exit early if request context is +// cancelled due to client hangup, httpserver.HandlerWithDeadline, +// etc. 
+func (s *HandlerSuite) TestRequestCancel(c *check.C) { + ctx, cancel := context.WithCancel(context.Background()) + req := httptest.NewRequest("GET", "/discovery/v1/apis/arvados/v1/rest", nil).WithContext(ctx) resp := httptest.NewRecorder() + cancel() s.handler.ServeHTTP(resp, req) c.Check(resp.Code, check.Equals, http.StatusBadGateway) var jresp httpserver.ErrorResponse err := json.Unmarshal(resp.Body.Bytes(), &jresp) c.Check(err, check.IsNil) c.Assert(len(jresp.Errors), check.Equals, 1) - c.Check(jresp.Errors[0], check.Matches, `.*context deadline exceeded.*`) + c.Check(jresp.Errors[0], check.Matches, `.*context canceled`) } func (s *HandlerSuite) TestProxyWithoutToken(c *check.C) { diff --git a/lib/crunchrun/crunchrun.go b/lib/crunchrun/crunchrun.go index 30871e7349..0e86f604a7 100644 --- a/lib/crunchrun/crunchrun.go +++ b/lib/crunchrun/crunchrun.go @@ -1095,6 +1095,12 @@ func (runner *ContainerRunner) WaitFinish() error { } } runner.CrunchLog.Printf("Container exited with status code %d%s", exitcode, extra) + err = runner.DispatcherArvClient.Update("containers", runner.Container.UUID, arvadosclient.Dict{ + "container": arvadosclient.Dict{"exit_code": exitcode}, + }, nil) + if err != nil { + runner.CrunchLog.Printf("ignoring error updating exit_code: %s", err) + } var returnErr error if err = runner.executorStdin.Close(); err != nil { @@ -1162,10 +1168,9 @@ func (runner *ContainerRunner) updateLogs() { continue } - var updated arvados.Container err = runner.DispatcherArvClient.Update("containers", runner.Container.UUID, arvadosclient.Dict{ "container": arvadosclient.Dict{"log": saved.PortableDataHash}, - }, &updated) + }, nil) if err != nil { runner.CrunchLog.Printf("error updating container log to %s: %s", saved.PortableDataHash, err) continue @@ -1443,13 +1448,13 @@ func (runner *ContainerRunner) UpdateContainerFinal() error { if runner.LogsPDH != nil { update["log"] = *runner.LogsPDH } - if runner.finalState == "Complete" { - if runner.ExitCode != nil { - update["exit_code"] = *runner.ExitCode - } - if runner.OutputPDH != nil { - update["output"] = *runner.OutputPDH - } + if runner.ExitCode != nil { + update["exit_code"] = *runner.ExitCode + } else { + update["exit_code"] = nil + } + if runner.finalState == "Complete" && runner.OutputPDH != nil { + update["output"] = *runner.OutputPDH } return runner.DispatcherArvClient.Update("containers", runner.Container.UUID, arvadosclient.Dict{"container": update}, nil) } diff --git a/lib/crunchrun/crunchrun_test.go b/lib/crunchrun/crunchrun_test.go index 9971757893..76289b951d 100644 --- a/lib/crunchrun/crunchrun_test.go +++ b/lib/crunchrun/crunchrun_test.go @@ -276,7 +276,7 @@ func (client *ArvTestClient) Update(resourceType string, uuid string, parameters if parameters["container"].(arvadosclient.Dict)["state"] == "Running" { client.WasSetRunning = true } - } else if resourceType == "collections" { + } else if resourceType == "collections" && output != nil { mt := parameters["collection"].(arvadosclient.Dict)["manifest_text"].(string) output.(*arvados.Collection).UUID = uuid output.(*arvados.Collection).PortableDataHash = fmt.Sprintf("%x", md5.Sum([]byte(mt))) diff --git a/lib/service/cmd.go b/lib/service/cmd.go index 679cbede13..4b640c4e47 100644 --- a/lib/service/cmd.go +++ b/lib/service/cmd.go @@ -121,11 +121,11 @@ func (c *command) RunCommand(prog string, args []string, stdin io.Reader, stdout }) ctx := ctxlog.Context(c.ctx, logger) - listenURL, err := getListenAddr(cluster.Services, c.svcName, log) + listenURL, internalURL, err := 
getListenAddr(cluster.Services, c.svcName, log) if err != nil { return 1 } - ctx = context.WithValue(ctx, contextKeyURL{}, listenURL) + ctx = context.WithValue(ctx, contextKeyURL{}, internalURL) reg := prometheus.NewRegistry() loader.RegisterMetrics(reg) @@ -147,9 +147,10 @@ func (c *command) RunCommand(prog string, args []string, stdin io.Reader, stdout instrumented := httpserver.Instrument(reg, log, httpserver.HandlerWithDeadline(cluster.API.RequestTimeout.Duration(), httpserver.AddRequestIDs( - httpserver.LogRequests( - interceptHealthReqs(cluster.ManagementToken, handler.CheckHealth, - httpserver.NewRequestLimiter(cluster.API.MaxConcurrentRequests, handler, reg)))))) + httpserver.Inspect(reg, cluster.ManagementToken, + httpserver.LogRequests( + interceptHealthReqs(cluster.ManagementToken, handler.CheckHealth, + httpserver.NewRequestLimiter(cluster.API.MaxConcurrentRequests, handler, reg))))))) srv := &httpserver.Server{ Server: http.Server{ Handler: ifCollectionInHost(instrumented, instrumented.ServeAPI(cluster.ManagementToken, instrumented)), @@ -157,7 +158,7 @@ func (c *command) RunCommand(prog string, args []string, stdin io.Reader, stdout }, Addr: listenURL.Host, } - if listenURL.Scheme == "https" { + if listenURL.Scheme == "https" || listenURL.Scheme == "wss" { tlsconfig, err := tlsConfigWithCertUpdater(cluster, logger) if err != nil { logger.WithError(err).Errorf("cannot start %s service on %s", c.svcName, listenURL.String()) @@ -223,28 +224,72 @@ func interceptHealthReqs(mgtToken string, checkHealth func() error, next http.Ha return ifCollectionInHost(next, mux) } -func getListenAddr(svcs arvados.Services, prog arvados.ServiceName, log logrus.FieldLogger) (arvados.URL, error) { +// Determine listenURL (addr:port where server should bind) and +// internalURL (target url that client should connect to) for a +// service. +// +// If the config does not specify ListenURL, we check all of the +// configured InternalURLs. If there is exactly one that matches our +// hostname, or exactly one that matches a local interface address, +// then we use that as listenURL. +// +// Note that listenURL and internalURL may use different protocols +// (e.g., listenURL is http, but the service sits behind a proxy, so +// clients connect using https). 
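// For example (hypothetical values, matching the documentation example
// in this change): an InternalURLs entry "https://ctrl-0.internal"
// with ListenURL "http://localhost:8003" yields
// listenURL = http://localhost:8003 (the address/port to bind) and
// internalURL = https://ctrl-0.internal (the address other services
// use to reach this process).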
+func getListenAddr(svcs arvados.Services, prog arvados.ServiceName, log logrus.FieldLogger) (arvados.URL, arvados.URL, error) { svc, ok := svcs.Map()[prog] if !ok { - return arvados.URL{}, fmt.Errorf("unknown service name %q", prog) + return arvados.URL{}, arvados.URL{}, fmt.Errorf("unknown service name %q", prog) } - if want := os.Getenv("ARVADOS_SERVICE_INTERNAL_URL"); want == "" { - } else if url, err := url.Parse(want); err != nil { - return arvados.URL{}, fmt.Errorf("$ARVADOS_SERVICE_INTERNAL_URL (%q): %s", want, err) - } else { + if want := os.Getenv("ARVADOS_SERVICE_INTERNAL_URL"); want != "" { + url, err := url.Parse(want) + if err != nil { + return arvados.URL{}, arvados.URL{}, fmt.Errorf("$ARVADOS_SERVICE_INTERNAL_URL (%q): %s", want, err) + } if url.Path == "" { url.Path = "/" } - return arvados.URL(*url), nil + for internalURL, conf := range svc.InternalURLs { + if internalURL.String() == url.String() { + listenURL := conf.ListenURL + if listenURL.Host == "" { + listenURL = internalURL + } + return listenURL, internalURL, nil + } + } + log.Warnf("possible configuration error: listening on %s (from $ARVADOS_SERVICE_INTERNAL_URL) even though configuration does not have a matching InternalURLs entry", url) + internalURL := arvados.URL(*url) + return internalURL, internalURL, nil } errors := []string{} - for url := range svc.InternalURLs { - listener, err := net.Listen("tcp", url.Host) + for internalURL, conf := range svc.InternalURLs { + listenURL := conf.ListenURL + if listenURL.Host == "" { + // If ListenURL is not specified, assume + // InternalURL is also usable as the listening + // proto/addr/port (i.e., simple case with no + // intermediate proxy/routing) + listenURL = internalURL + } + listenAddr := listenURL.Host + if _, _, err := net.SplitHostPort(listenAddr); err != nil { + // url "https://foo.example/" (with no + // explicit port name/number) means listen on + // the well-known port for the specified + // protocol, "foo.example:https". + port := listenURL.Scheme + if port == "ws" || port == "wss" { + port = "http" + port[2:] + } + listenAddr = net.JoinHostPort(listenAddr, port) + } + listener, err := net.Listen("tcp", listenAddr) if err == nil { listener.Close() - return url, nil + return listenURL, internalURL, nil } else if strings.Contains(err.Error(), "cannot assign requested address") { // If 'Host' specifies a different server than // the current one, it'll resolve the hostname @@ -252,13 +297,13 @@ func getListenAddr(svcs arvados.Services, prog arvados.ServiceName, log logrus.F // can't bind an IP address it doesn't own. 
continue } else { - errors = append(errors, fmt.Sprintf("tried %v, got %v", url, err)) + errors = append(errors, fmt.Sprintf("%s: %s", listenURL, err)) } } if len(errors) > 0 { - return arvados.URL{}, fmt.Errorf("could not enable the %q service on this host: %s", prog, strings.Join(errors, "; ")) + return arvados.URL{}, arvados.URL{}, fmt.Errorf("could not enable the %q service on this host: %s", prog, strings.Join(errors, "; ")) } - return arvados.URL{}, fmt.Errorf("configuration does not enable the %q service on this host", prog) + return arvados.URL{}, arvados.URL{}, fmt.Errorf("configuration does not enable the %q service on this host", prog) } type contextKeyURL struct{} diff --git a/lib/service/cmd_test.go b/lib/service/cmd_test.go index 10591d9b55..7db9109274 100644 --- a/lib/service/cmd_test.go +++ b/lib/service/cmd_test.go @@ -11,7 +11,9 @@ import ( "crypto/tls" "fmt" "io/ioutil" + "net" "net/http" + "net/url" "os" "testing" "time" @@ -35,6 +37,126 @@ const ( contextKey key = iota ) +func (*Suite) TestGetListenAddress(c *check.C) { + // Find an available port on the testing host, so the test + // cases don't get confused by "already in use" errors. + listener, err := net.Listen("tcp", ":") + c.Assert(err, check.IsNil) + _, unusedPort, err := net.SplitHostPort(listener.Addr().String()) + c.Assert(err, check.IsNil) + listener.Close() + + defer os.Unsetenv("ARVADOS_SERVICE_INTERNAL_URL") + for idx, trial := range []struct { + // internalURL => listenURL, both with trailing "/" + // because config loader always adds it + internalURLs map[string]string + envVar string + expectErrorMatch string + expectLogsMatch string + expectListen string + expectInternal string + }{ + { + internalURLs: map[string]string{"http://localhost:" + unusedPort + "/": ""}, + expectListen: "http://localhost:" + unusedPort + "/", + expectInternal: "http://localhost:" + unusedPort + "/", + }, + { // implicit port 80 in InternalURLs + internalURLs: map[string]string{"http://localhost/": ""}, + expectErrorMatch: `.*:80: bind: permission denied`, + }, + { // implicit port 443 in InternalURLs + internalURLs: map[string]string{"https://host.example/": "http://localhost:" + unusedPort + "/"}, + expectListen: "http://localhost:" + unusedPort + "/", + expectInternal: "https://host.example/", + }, + { // implicit port 443 in ListenURL + internalURLs: map[string]string{"wss://host.example/": "wss://localhost/"}, + expectErrorMatch: `.*:443: bind: permission denied`, + }, + { + internalURLs: map[string]string{"https://hostname.example/": "http://localhost:8000/"}, + expectListen: "http://localhost:8000/", + expectInternal: "https://hostname.example/", + }, + { + internalURLs: map[string]string{ + "https://hostname1.example/": "http://localhost:12435/", + "https://hostname2.example/": "http://localhost:" + unusedPort + "/", + }, + envVar: "https://hostname2.example", // note this works despite missing trailing "/" + expectListen: "http://localhost:" + unusedPort + "/", + expectInternal: "https://hostname2.example/", + }, + { // cannot listen on any of the ListenURLs + internalURLs: map[string]string{ + "https://hostname1.example/": "http://1.2.3.4:" + unusedPort + "/", + "https://hostname2.example/": "http://1.2.3.4:" + unusedPort + "/", + }, + expectErrorMatch: "configuration does not enable the \"arvados-controller\" service on this host", + }, + { // cannot listen on any of the (implied) ListenURLs + internalURLs: map[string]string{ + "https://1.2.3.4/": "", + "https://1.2.3.5/": "", + }, + expectErrorMatch: 
"configuration does not enable the \"arvados-controller\" service on this host", + }, + { // impossible port number + internalURLs: map[string]string{ + "https://host.example/": "http://0.0.0.0:1234567", + }, + expectErrorMatch: `.*:1234567: listen tcp: address 1234567: invalid port`, + }, + { + // env var URL not mentioned in config = obey env var, with warning + internalURLs: map[string]string{"https://hostname1.example/": "http://localhost:8000/"}, + envVar: "https://hostname2.example", + expectListen: "https://hostname2.example/", + expectInternal: "https://hostname2.example/", + expectLogsMatch: `.*\Qpossible configuration error: listening on https://hostname2.example/ (from $ARVADOS_SERVICE_INTERNAL_URL) even though configuration does not have a matching InternalURLs entry\E.*\n`, + }, + { + // env var + empty config = obey env var, with warning + envVar: "https://hostname.example", + expectListen: "https://hostname.example/", + expectInternal: "https://hostname.example/", + expectLogsMatch: `.*\Qpossible configuration error: listening on https://hostname.example/ (from $ARVADOS_SERVICE_INTERNAL_URL) even though configuration does not have a matching InternalURLs entry\E.*\n`, + }, + } { + c.Logf("trial %d %+v", idx, trial) + os.Setenv("ARVADOS_SERVICE_INTERNAL_URL", trial.envVar) + var logbuf bytes.Buffer + log := ctxlog.New(&logbuf, "text", "info") + services := arvados.Services{Controller: arvados.Service{InternalURLs: map[arvados.URL]arvados.ServiceInstance{}}} + for k, v := range trial.internalURLs { + u, err := url.Parse(k) + c.Assert(err, check.IsNil) + si := arvados.ServiceInstance{} + if v != "" { + u, err := url.Parse(v) + c.Assert(err, check.IsNil) + si.ListenURL = arvados.URL(*u) + } + services.Controller.InternalURLs[arvados.URL(*u)] = si + } + listenURL, internalURL, err := getListenAddr(services, "arvados-controller", log) + if trial.expectLogsMatch != "" { + c.Check(logbuf.String(), check.Matches, trial.expectLogsMatch) + } + if trial.expectErrorMatch != "" { + c.Check(err, check.ErrorMatches, trial.expectErrorMatch) + continue + } + if !c.Check(err, check.IsNil) { + continue + } + c.Check(listenURL.String(), check.Equals, trial.expectListen) + c.Check(internalURL.String(), check.Equals, trial.expectInternal) + } +} + func (*Suite) TestCommand(c *check.C) { cf, err := ioutil.TempFile("", "cmd_test.") c.Assert(err, check.IsNil) diff --git a/sdk/go/arvados/config.go b/sdk/go/arvados/config.go index 0d8f293124..c90551a610 100644 --- a/sdk/go/arvados/config.go +++ b/sdk/go/arvados/config.go @@ -401,6 +401,7 @@ func (su URL) String() string { } type ServiceInstance struct { + ListenURL URL Rendezvous string `json:",omitempty"` } diff --git a/sdk/go/httpserver/inspect.go b/sdk/go/httpserver/inspect.go new file mode 100644 index 0000000000..cb08acf962 --- /dev/null +++ b/sdk/go/httpserver/inspect.go @@ -0,0 +1,133 @@ +// Copyright (C) The Arvados Authors. All rights reserved. +// +// SPDX-License-Identifier: Apache-2.0 + +package httpserver + +import ( + "encoding/json" + "net/http" + "sort" + "sync" + "sync/atomic" + "time" + + "github.com/prometheus/client_golang/prometheus" +) + +// Inspect serves a report of current requests at "GET +// /_inspect/requests", and passes other requests through to the next +// handler. +// +// If registry is not nil, Inspect registers metrics about current +// requests. 
+func Inspect(registry *prometheus.Registry, authToken string, next http.Handler) http.Handler { + type ent struct { + startTime time.Time + hangupTime atomic.Value + } + current := map[*http.Request]*ent{} + mtx := sync.Mutex{} + if registry != nil { + registry.MustRegister(prometheus.NewGaugeFunc( + prometheus.GaugeOpts{ + Namespace: "arvados", + Name: "max_active_request_age_seconds", + Help: "Age of oldest active request", + }, + func() float64 { + mtx.Lock() + defer mtx.Unlock() + earliest := time.Time{} + any := false + for _, e := range current { + if _, ok := e.hangupTime.Load().(time.Time); ok { + // Don't count abandoned requests here + continue + } + if !any || e.startTime.Before(earliest) { + any = true + earliest = e.startTime + } + } + if !any { + return 0 + } + return float64(time.Since(earliest).Seconds()) + }, + )) + registry.MustRegister(prometheus.NewGaugeFunc( + prometheus.GaugeOpts{ + Namespace: "arvados", + Name: "max_abandoned_request_age_seconds", + Help: "Maximum time since client hung up on a request whose processing thread is still running", + }, + func() float64 { + mtx.Lock() + defer mtx.Unlock() + earliest := time.Time{} + any := false + for _, e := range current { + if hangupTime, ok := e.hangupTime.Load().(time.Time); ok { + if !any || hangupTime.Before(earliest) { + any = true + earliest = hangupTime + } + } + } + if !any { + return 0 + } + return float64(time.Since(earliest).Seconds()) + }, + )) + } + return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) { + if req.Method == "GET" && req.URL.Path == "/_inspect/requests" { + if authToken == "" || req.Header.Get("Authorization") != "Bearer "+authToken { + Error(w, "unauthorized", http.StatusUnauthorized) + return + } + mtx.Lock() + defer mtx.Unlock() + type outrec struct { + RequestID string + Method string + Host string + URL string + RemoteAddr string + Elapsed float64 + } + now := time.Now() + outrecs := []outrec{} + for req, e := range current { + outrecs = append(outrecs, outrec{ + RequestID: req.Header.Get(HeaderRequestID), + Method: req.Method, + Host: req.Host, + URL: req.URL.String(), + RemoteAddr: req.RemoteAddr, + Elapsed: now.Sub(e.startTime).Seconds(), + }) + } + sort.Slice(outrecs, func(i, j int) bool { return outrecs[i].Elapsed < outrecs[j].Elapsed }) + w.Header().Set("Content-Type", "application/json") + json.NewEncoder(w).Encode(outrecs) + } else { + e := ent{startTime: time.Now()} + mtx.Lock() + current[req] = &e + mtx.Unlock() + go func() { + <-req.Context().Done() + e.hangupTime.Store(time.Now()) + }() + defer func() { + mtx.Lock() + defer mtx.Unlock() + delete(current, req) + }() + next.ServeHTTP(w, req) + } + }) +} diff --git a/sdk/go/httpserver/inspect_test.go b/sdk/go/httpserver/inspect_test.go new file mode 100644 index 0000000000..624cedb3b7 --- /dev/null +++ b/sdk/go/httpserver/inspect_test.go @@ -0,0 +1,98 @@ +// Copyright (C) The Arvados Authors. All rights reserved. 
+// +// SPDX-License-Identifier: Apache-2.0 + +package httpserver + +import ( + "context" + "encoding/json" + "net/http" + "net/http/httptest" + "strings" + "time" + + "github.com/prometheus/client_golang/prometheus" + "github.com/prometheus/client_golang/prometheus/promhttp" + check "gopkg.in/check.v1" +) + +func (s *Suite) TestInspect(c *check.C) { + reg := prometheus.NewRegistry() + h := newTestHandler() + mh := Inspect(reg, "abcd", h) + handlerReturned := make(chan struct{}) + reqctx, reqcancel := context.WithCancel(context.Background()) + longreq := httptest.NewRequest("GET", "/test", nil).WithContext(reqctx) + go func() { + mh.ServeHTTP(httptest.NewRecorder(), longreq) + close(handlerReturned) + }() + <-h.inHandler + + resp := httptest.NewRecorder() + req := httptest.NewRequest("GET", "/_inspect/requests", nil) + mh.ServeHTTP(resp, req) + c.Check(resp.Code, check.Equals, http.StatusUnauthorized) + c.Check(resp.Body.String(), check.Equals, `{"errors":["unauthorized"]}`+"\n") + + resp = httptest.NewRecorder() + req.Header.Set("Authorization", "Bearer abcde") + mh.ServeHTTP(resp, req) + c.Check(resp.Code, check.Equals, http.StatusUnauthorized) + + resp = httptest.NewRecorder() + req.Header.Set("Authorization", "Bearer abcd") + mh.ServeHTTP(resp, req) + c.Check(resp.Code, check.Equals, http.StatusOK) + reqs := []map[string]interface{}{} + err := json.NewDecoder(resp.Body).Decode(&reqs) + c.Check(err, check.IsNil) + c.Check(reqs, check.HasLen, 1) + c.Check(reqs[0]["URL"], check.Equals, "/test") + + // Request is active, so we should see active request age > 0 + resp = httptest.NewRecorder() + mreq := httptest.NewRequest("GET", "/metrics", nil) + promhttp.HandlerFor(reg, promhttp.HandlerOpts{}).ServeHTTP(resp, mreq) + c.Check(resp.Code, check.Equals, http.StatusOK) + c.Check(resp.Body.String(), check.Matches, `(?ms).*\narvados_max_active_request_age_seconds [0\.]*[1-9][-\d\.e]*\n.*`) + c.Check(resp.Body.String(), check.Matches, `(?ms).*\narvados_max_abandoned_request_age_seconds 0\n.*`) + + reqcancel() + + // Request context is canceled but handler hasn't returned, so + // we should see max abandoned request age > 0 and active == + // 0. We might need to wait a short time for the cancel to + // propagate. 
+ for deadline := time.Now().Add(time.Second); time.Now().Before(deadline); time.Sleep(time.Second / 100) { + resp = httptest.NewRecorder() + promhttp.HandlerFor(reg, promhttp.HandlerOpts{}).ServeHTTP(resp, mreq) + c.Assert(resp.Code, check.Equals, http.StatusOK) + if strings.Contains(resp.Body.String(), "\narvados_max_active_request_age_seconds 0\n") { + break + } + } + c.Check(resp.Body.String(), check.Matches, `(?ms).*\narvados_max_active_request_age_seconds 0\n.*`) + c.Check(resp.Body.String(), check.Matches, `(?ms).*\narvados_max_abandoned_request_age_seconds [0\.]*[1-9][-\d\.e]*\n.*`) + + h.okToProceed <- struct{}{} + <-handlerReturned + + // Handler has returned, so we should see max abandoned + // request age == max active request age == 0 + resp = httptest.NewRecorder() + promhttp.HandlerFor(reg, promhttp.HandlerOpts{}).ServeHTTP(resp, mreq) + c.Check(resp.Code, check.Equals, http.StatusOK) + c.Check(resp.Body.String(), check.Matches, `(?ms).*\narvados_max_active_request_age_seconds 0\n.*`) + c.Check(resp.Body.String(), check.Matches, `(?ms).*\narvados_max_abandoned_request_age_seconds 0\n.*`) + + // ...and no active requests at the /_monitor endpoint + resp = httptest.NewRecorder() + mh.ServeHTTP(resp, req) + c.Check(resp.Code, check.Equals, http.StatusOK) + reqs = nil + err = json.NewDecoder(resp.Body).Decode(&reqs) + c.Check(err, check.IsNil) + c.Assert(reqs, check.HasLen, 0) +} diff --git a/sdk/go/httpserver/logger.go b/sdk/go/httpserver/logger.go index 5a46635e91..b71adf7118 100644 --- a/sdk/go/httpserver/logger.go +++ b/sdk/go/httpserver/logger.go @@ -47,7 +47,13 @@ func (hn hijackNotifier) Hijack() (net.Conn, *bufio.ReadWriter, error) { // HandlerWithDeadline cancels the request context if the request // takes longer than the specified timeout without having its // connection hijacked. +// +// If timeout is 0, there is no deadline: HandlerWithDeadline is a +// no-op. 
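// Example usage (as wired up in lib/service/cmd.go above):
//
//	httpserver.HandlerWithDeadline(cluster.API.RequestTimeout.Duration(), next)
//
// cancels next's request context once the configured timeout elapses,
// unless the connection has been hijacked in the meantime.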
func HandlerWithDeadline(timeout time.Duration, next http.Handler) http.Handler { + if timeout == 0 { + return next + } return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { ctx, cancel := context.WithCancel(r.Context()) defer cancel() diff --git a/sdk/go/httpserver/request_limiter_test.go b/sdk/go/httpserver/request_limiter_test.go index 64d1f3d4cf..9258fbfa58 100644 --- a/sdk/go/httpserver/request_limiter_test.go +++ b/sdk/go/httpserver/request_limiter_test.go @@ -22,7 +22,7 @@ func (h *testHandler) ServeHTTP(resp http.ResponseWriter, req *http.Request) { <-h.okToProceed } -func newTestHandler(maxReqs int) *testHandler { +func newTestHandler() *testHandler { return &testHandler{ inHandler: make(chan struct{}), okToProceed: make(chan struct{}), @@ -30,7 +30,7 @@ func newTestHandler(maxReqs int) *testHandler { } func TestRequestLimiter1(t *testing.T) { - h := newTestHandler(10) + h := newTestHandler() l := NewRequestLimiter(1, h, nil) var wg sync.WaitGroup resps := make([]*httptest.ResponseRecorder, 10) @@ -90,7 +90,7 @@ func TestRequestLimiter1(t *testing.T) { } func TestRequestLimiter10(t *testing.T) { - h := newTestHandler(10) + h := newTestHandler() l := NewRequestLimiter(10, h, nil) var wg sync.WaitGroup for i := 0; i < 10; i++ { diff --git a/services/api/app/controllers/database_controller.rb b/services/api/app/controllers/database_controller.rb index 5c4cf7bc16..fa1e1ca43c 100644 --- a/services/api/app/controllers/database_controller.rb +++ b/services/api/app/controllers/database_controller.rb @@ -6,6 +6,8 @@ class DatabaseController < ApplicationController skip_before_action :find_object_by_uuid skip_before_action :render_404_if_no_object before_action :admin_required + around_action :silence_logs, only: [:reset] + def reset raise ArvadosModel::PermissionDeniedError unless Rails.env == 'test' @@ -83,4 +85,17 @@ class DatabaseController < ApplicationController # Done. send_json success: true end + + protected + + def silence_logs + Rails.logger.info("(logging level temporarily raised to :error, see #{__FILE__})") + orig = ActiveRecord::Base.logger.level + ActiveRecord::Base.logger.level = :error + begin + yield + ensure + ActiveRecord::Base.logger.level = orig + end + end end diff --git a/services/api/app/models/container.rb b/services/api/app/models/container.rb index 3a04c56046..08f87bbdb1 100644 --- a/services/api/app/models/container.rb +++ b/services/api/app/models/container.rb @@ -478,8 +478,8 @@ class Container < ArvadosModel def validate_change permitted = [:state] - progress_attrs = [:progress, :runtime_status, :log, :output, :output_properties] - final_attrs = [:exit_code, :finished_at] + progress_attrs = [:progress, :runtime_status, :log, :output, :output_properties, :exit_code] + final_attrs = [:finished_at] if self.new_record? permitted.push(:owner_uuid, :command, :container_image, :cwd, diff --git a/services/api/test/unit/container_test.rb b/services/api/test/unit/container_test.rb index ac3e6bea42..bcf99da2e3 100644 --- a/services/api/test/unit/container_test.rb +++ b/services/api/test/unit/container_test.rb @@ -870,16 +870,14 @@ class ContainerTest < ActiveSupport::TestCase end end - test "Container only set exit code on complete" do + test "can only change exit code while running and at completion" do set_user_from_auth :active c, _ = minimal_new set_user_from_auth :dispatch1 c.lock + check_illegal_updates c, [{exit_code: 1}] c.update_attributes! 
state: Container::Running - - check_illegal_updates c, [{exit_code: 1}, - {exit_code: 1, state: Container::Cancelled}] - + assert c.update_attributes(exit_code: 1) assert c.update_attributes(exit_code: 1, state: Container::Complete) end @@ -933,7 +931,7 @@ class ContainerTest < ActiveSupport::TestCase end ["auth_uuid", "runtime_token"].each do |tok| - test "#{tok} can set output, progress, runtime_status, state on running container -- but not log" do + test "#{tok} can set output, progress, runtime_status, state, exit_code on running container -- but not log" do if tok == "runtime_token" set_user_from_auth :spectator c, _ = minimal_new(container_image: "9ae44d5792468c58bcf85ce7353c7027+124", @@ -963,6 +961,7 @@ class ContainerTest < ActiveSupport::TestCase assert c.update_attributes(output: collections(:collection_owned_by_active).portable_data_hash) assert c.update_attributes(runtime_status: {'warning' => 'something happened'}) assert c.update_attributes(progress: 0.5) + assert c.update_attributes(exit_code: 0) refute c.update_attributes(log: collections(:real_log_collection).portable_data_hash) c.reload assert c.update_attributes(state: Container::Complete, exit_code: 0) diff --git a/services/login-sync/bin/arvados-login-sync b/services/login-sync/bin/arvados-login-sync index da8a21efa3..5c6691ab95 100755 --- a/services/login-sync/bin/arvados-login-sync +++ b/services/login-sync/bin/arvados-login-sync @@ -10,6 +10,7 @@ require 'etc' require 'fileutils' require 'yaml' require 'optparse' +require 'open3' req_envs = %w(ARVADOS_API_HOST ARVADOS_API_TOKEN ARVADOS_VIRTUAL_MACHINE_UUID) req_envs.each do |k| @@ -124,11 +125,12 @@ begin unless pwnam[l[:username]] STDERR.puts "Creating account #{l[:username]}" # Create new user - unless system("useradd", "-m", + out, st = Open3.capture2e("useradd", "-m", "-c", username, "-s", "/bin/bash", username) - STDERR.puts "Account creation failed for #{l[:username]}: #{$?}" + if st.exitstatus != 0 + STDERR.puts "Account creation failed for #{l[:username]}:\n#{out}" next end begin @@ -150,7 +152,10 @@ begin if existing_groups.index(addgroup).nil? # User should be in group, but isn't, so add them. STDERR.puts "Add user #{username} to #{addgroup} group" - system("usermod", "-aG", addgroup, username) + out, st = Open3.capture2e("usermod", "-aG", addgroup, username) + if st.exitstatus != 0 + STDERR.puts "Failed to add #{username} to #{addgroup} group:\n#{out}" + end end end @@ -158,7 +163,10 @@ begin if groups.index(removegroup).nil? # User is in a group, but shouldn't be, so remove them. 
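# (Open3.capture2e, used in these changes in place of system(), merges
# the command's stdout and stderr into a single string and returns it
# along with the Process::Status, so failures can be reported together
# with their output.)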
STDERR.puts "Remove user #{username} from #{removegroup} group" - system("gpasswd", "-d", username, removegroup) + out, st = Open3.capture2e("gpasswd", "-d", username, removegroup) + if st.exitstatus != 0 + STDERR.puts "Failed to remove user #{username} from #{removegroup} group:\n#{out}" + end end end diff --git a/tools/salt-install/config_examples/multi_host/aws/pillars/arvados.sls b/tools/salt-install/config_examples/multi_host/aws/pillars/arvados.sls index f41b6ac5b3..02653082f3 100644 --- a/tools/salt-install/config_examples/multi_host/aws/pillars/arvados.sls +++ b/tools/salt-install/config_examples/multi_host/aws/pillars/arvados.sls @@ -93,7 +93,7 @@ arvados: resources: virtual_machines: shell: - name: shell + name: shell.__CLUSTER__.__DOMAIN__ backend: __SHELL_INT_IP__ port: 4200 diff --git a/tools/salt-install/config_examples/multi_host/aws/pillars/postgresql.sls b/tools/salt-install/config_examples/multi_host/aws/pillars/postgresql.sls index e06ddd041c..d6320da246 100644 --- a/tools/salt-install/config_examples/multi_host/aws/pillars/postgresql.sls +++ b/tools/salt-install/config_examples/multi_host/aws/pillars/postgresql.sls @@ -19,7 +19,7 @@ postgres: users: __CLUSTER___arvados: ensure: present - password: __DATABASE_PASSWORD__ + password: "__DATABASE_PASSWORD__" # tablespaces: # arvados_tablespace: diff --git a/tools/salt-install/config_examples/multi_host/aws/states/shell_cron_add_login_sync.sls b/tools/salt-install/config_examples/multi_host/aws/states/shell_cron_add_login_sync.sls index 86c591e97e..9028b9b100 100644 --- a/tools/salt-install/config_examples/multi_host/aws/states/shell_cron_add_login_sync.sls +++ b/tools/salt-install/config_examples/multi_host/aws/states/shell_cron_add_login_sync.sls @@ -75,6 +75,13 @@ extra_shell_cron_add_login_sync_add_{{ vm }}_arvados_virtual_machine_uuid_cron_e - onlyif: - /bin/grep -qE "[a-z0-9]{5}-2x53u-[a-z0-9]{15}" /tmp/vm_uuid_{{ vm }} +extra_shell_cron_add_login_sync_add_{{ vm }}_sbin_to_path_cron_env_present: + cron.env_present: + - name: PATH + - value: "/bin:/usr/bin:/usr/sbin" + - onlyif: + - /bin/grep -qE "[a-z0-9]{5}-2x53u-[a-z0-9]{15}" /tmp/vm_uuid_{{ vm }} + extra_shell_cron_add_login_sync_add_{{ vm }}_arvados_login_sync_cron_present: cron.present: - name: /usr/local/bin/arvados-login-sync diff --git a/tools/salt-install/config_examples/single_host/multiple_hostnames/pillars/nginx_passenger.sls b/tools/salt-install/config_examples/single_host/multiple_hostnames/pillars/nginx_passenger.sls index dfddf3b623..cf08779715 100644 --- a/tools/salt-install/config_examples/single_host/multiple_hostnames/pillars/nginx_passenger.sls +++ b/tools/salt-install/config_examples/single_host/multiple_hostnames/pillars/nginx_passenger.sls @@ -55,7 +55,7 @@ nginx: - add_header: 'Strict-Transport-Security "max-age=63072000" always' # OCSP stapling - # FIXME! Stapling does not work with self-signed certificates, so disabling for tests + # NOTE! 
Stapling does not work with self-signed certificates, so disabling for tests # - ssl_stapling: 'on' # - ssl_stapling_verify: 'on' diff --git a/tools/salt-install/config_examples/single_host/multiple_hostnames/pillars/postgresql.sls b/tools/salt-install/config_examples/single_host/multiple_hostnames/pillars/postgresql.sls index f3bc09f650..edb961ebaa 100644 --- a/tools/salt-install/config_examples/single_host/multiple_hostnames/pillars/postgresql.sls +++ b/tools/salt-install/config_examples/single_host/multiple_hostnames/pillars/postgresql.sls @@ -38,7 +38,7 @@ postgres: users: __CLUSTER___arvados: ensure: present - password: __DATABASE_PASSWORD__ + password: "__DATABASE_PASSWORD__" # tablespaces: # arvados_tablespace: diff --git a/tools/salt-install/config_examples/single_host/multiple_hostnames/states/host_entries.sls b/tools/salt-install/config_examples/single_host/multiple_hostnames/states/host_entries.sls index 379f4765cb..c2d34ea28c 100644 --- a/tools/salt-install/config_examples/single_host/multiple_hostnames/states/host_entries.sls +++ b/tools/salt-install/config_examples/single_host/multiple_hostnames/states/host_entries.sls @@ -12,7 +12,7 @@ arvados_test_salt_states_examples_single_host_etc_hosts_host_present: - ip: 127.0.1.1 - names: - {{ arvados.cluster.name }}.{{ arvados.cluster.domain }} - # FIXME! This just works for our testings. + # NOTE! This just works for our testings. # Won't work if the cluster name != host name {%- for entry in [ 'api', diff --git a/tools/salt-install/config_examples/single_host/single_hostname/pillars/nginx_passenger.sls b/tools/salt-install/config_examples/single_host/single_hostname/pillars/nginx_passenger.sls index 21c1510de8..26e2baf044 100644 --- a/tools/salt-install/config_examples/single_host/single_hostname/pillars/nginx_passenger.sls +++ b/tools/salt-install/config_examples/single_host/single_hostname/pillars/nginx_passenger.sls @@ -55,7 +55,7 @@ nginx: - add_header: 'Strict-Transport-Security "max-age=63072000" always' # OCSP stapling - # FIXME! Stapling does not work with self-signed certificates, so disabling for tests + # NOTE! Stapling does not work with self-signed certificates, so disabling for tests # - ssl_stapling: 'on' # - ssl_stapling_verify: 'on' diff --git a/tools/salt-install/config_examples/single_host/single_hostname/pillars/postgresql.sls b/tools/salt-install/config_examples/single_host/single_hostname/pillars/postgresql.sls index a69b88cb17..14452a9905 100644 --- a/tools/salt-install/config_examples/single_host/single_hostname/pillars/postgresql.sls +++ b/tools/salt-install/config_examples/single_host/single_hostname/pillars/postgresql.sls @@ -40,7 +40,7 @@ postgres: users: __CLUSTER___arvados: ensure: present - password: __DATABASE_PASSWORD__ + password: "__DATABASE_PASSWORD__" # tablespaces: # arvados_tablespace: diff --git a/tools/salt-install/config_examples/single_host/single_hostname/states/host_entries.sls b/tools/salt-install/config_examples/single_host/single_hostname/states/host_entries.sls index a688f4f8c1..51308fffa2 100644 --- a/tools/salt-install/config_examples/single_host/single_hostname/states/host_entries.sls +++ b/tools/salt-install/config_examples/single_host/single_hostname/states/host_entries.sls @@ -21,7 +21,7 @@ arvados_test_salt_states_examples_single_host_etc_hosts_host_present: - ip: 127.0.1.1 - names: - {{ arvados.cluster.name }}.{{ arvados.cluster.domain }} - # FIXME! This just works for our testing. + # NOTE! This just works for our testing. 
# Won't work if the cluster name != host name {%- for entry in [ 'api', diff --git a/tools/salt-install/installer.sh b/tools/salt-install/installer.sh new file mode 100755 index 0000000000..e5ff7be4e7 --- /dev/null +++ b/tools/salt-install/installer.sh @@ -0,0 +1,257 @@ +#!/bin/bash + +# Copyright (C) The Arvados Authors. All rights reserved. +# +# SPDX-License-Identifier: CC-BY-SA-3.0 + +# +# installer.sh +# +# Helps manage the configuration in a git repository, and then deploy +# nodes by pushing a copy of the git repository to each node and +# running the provision script to do the actual installation and +# configuration. +# + +set -eu + +# The parameter file +declare CONFIG_FILE=local.params + +# The salt template directory +declare CONFIG_DIR=local_config_dir + +# The 5-character Arvados cluster id +# This will be populated by loadconfig() +declare CLUSTER + +# The parent domain (not including the cluster id) +# This will be populated by loadconfig() +declare DOMAIN + +# A bash associative array listing each node and mapping to the roles +# that should be provisioned on those nodes. +# This will be populated by loadconfig() +declare -A NODES + +# The ssh user we'll use +# This will be populated by loadconfig() +declare DEPLOY_USER + +# The git repository that we'll push to on all the nodes +# This will be populated by loadconfig() +declare GITTARGET + +sync() { + local NODE=$1 + local BRANCH=$2 + + # Synchronizes the configuration by creating a git repository on + # each node, pushing our branch, and updating the checkout. + + if [[ "$NODE" != localhost ]] ; then + if ! ssh $NODE test -d ${GITTARGET}.git ; then + + # Initialize the git repository (1st time case). We're + # actually going to make two repositories here because git + # will complain if you try to push to a repository with a + # checkout. So we're going to create a "bare" repository + # and then clone a regular repository (with a checkout) + # from that. + + ssh $NODE git init --bare ${GITTARGET}.git + if ! git remote add $NODE $DEPLOY_USER@$NODE:${GITTARGET}.git ; then + git remote set-url $NODE $DEPLOY_USER@$NODE:${GITTARGET}.git + fi + git push $NODE $BRANCH + ssh $NODE git clone ${GITTARGET}.git ${GITTARGET} + fi + + # The update case. + # + # Push to the bare repository on the remote node, then in the + # remote node repository with the checkout, pull the branch + # from the bare repository. + + git push $NODE $BRANCH + ssh $NODE "git -C ${GITTARGET} checkout ${BRANCH} && git -C ${GITTARGET} pull" + fi +} + +deploynode() { + local NODE=$1 + local ROLES=$2 + + # Deploy a node. This runs the provision script on the node, with + # the appropriate roles. + + if [[ -z "$ROLES" ]] ; then + echo "No roles declared for '$NODE' in ${CONFIG_FILE}" + exit 1 + fi + + if [[ "$NODE" = localhost ]] ; then + sudo ./provision.sh --config ${CONFIG_FILE} --roles ${ROLES} + else + ssh $DEPLOY_USER@$NODE "cd ${GITTARGET} && sudo ./provision.sh --config ${CONFIG_FILE} --roles ${ROLES}" + fi +} + +loadconfig() { + if [[ ! -s $CONFIG_FILE ]] ; then + echo "Must be run from initialized setup dir, maybe you need to 'initialize' first?" + fi + source ${CONFIG_FILE} + GITTARGET=arvados-deploy-config-${CLUSTER} +} + +subcmd="$1" +if [[ -n "$subcmd" ]] ; then + shift +fi +case "$subcmd" in + initialize) + if [[ ! -f provision.sh ]] ; then + echo "Must be run from arvados/tools/salt-install" + exit + fi + + set +u + SETUPDIR=$1 + PARAMS=$2 + SLS=$3 + set -u + + err= + if [[ -z "$PARAMS" || ! 
-f local.params.example.$PARAMS ]] ; then + echo "Not found: local.params.example.$PARAMS" + echo "Expected one of multiple_hosts, single_host_multiple_hostnames, single_host_single_hostname" + err=1 + fi + + if [[ -z "$SLS" || ! -d config_examples/$SLS ]] ; then + echo "Not found: config_examples/$SLS" + echo "Expected one of multi_host/aws, single_host/multiple_hostnames, single_host/single_hostname" + err=1 + fi + + if [[ -z "$SETUPDIR" || -z "$PARAMS" || -z "$SLS" ]]; then + echo "installer.sh " + err=1 + fi + + if [[ -n "$err" ]] ; then + exit 1 + fi + + echo "Initializing $SETUPDIR" + git init $SETUPDIR + cp -r *.sh tests $SETUPDIR + + cp local.params.example.$PARAMS $SETUPDIR/${CONFIG_FILE} + cp -r config_examples/$SLS $SETUPDIR/${CONFIG_DIR} + + cd $SETUPDIR + git add *.sh ${CONFIG_FILE} ${CONFIG_DIR} tests + git commit -m"initial commit" + + echo "setup directory initialized, now go to $SETUPDIR, edit '${CONFIG_FILE}' and '${CONFIG_DIR}' as needed, then run 'installer.sh deploy'" + ;; + deploy) + set +u + NODE=$1 + set -u + + loadconfig + + if grep -rni 'fixme' ${CONFIG_FILE} ${CONFIG_DIR} ; then + echo + echo "Some parameters still need to be updated. Please fix them and then re-run deploy." + exit 1 + fi + + BRANCH=$(git branch --show-current) + + set -x + + git add -A + if ! git diff --cached --exit-code ; then + git commit -m"prepare for deploy" + fi + + if [[ -z "$NODE" ]]; then + for NODE in "${!NODES[@]}" + do + # First, push the git repo to each node. This also + # confirms that we have git and can log into each + # node. + sync $NODE $BRANCH + done + + for NODE in "${!NODES[@]}" + do + # Do 'database' role first, + if [[ "${NODES[$NODE]}" =~ database ]] ; then + deploynode $NODE ${NODES[$NODE]} + unset NODES[$NODE] + fi + done + + for NODE in "${!NODES[@]}" + do + # then 'api' or 'controller' roles + if [[ "${NODES[$NODE]}" =~ (api|controller) ]] ; then + deploynode $NODE ${NODES[$NODE]} + unset NODES[$NODE] + fi + done + + for NODE in "${!NODES[@]}" + do + # Everything else (we removed the nodes that we + # already deployed from the list) + deploynode $NODE ${NODES[$NODE]} + done + else + # Just deploy the node that was supplied on the command line. + sync $NODE $BRANCH + deploynode $NODE + fi + + echo + echo "Completed deploy, run 'installer.sh diagnostics' to verify the install" + + ;; + diagnostics) + loadconfig + + set +u + declare LOCATION=$1 + set -u + + if ! which arvados-client ; then + echo "arvados-client not found, install 'arvados-client' package with 'apt-get' or 'yum'" + exit 1 + fi + + if [[ -z "$LOCATION" ]] ; then + echo "Need to provide '-internal-client' or '-external-client'" + echo + echo "-internal-client You are running this on the same private network as the Arvados cluster (e.g. on one of the Arvados nodes)" + echo "-external-client You are running this outside the private network of the Arvados cluster (e.g. 
your workstation)" + exit 1 + fi + + export ARVADOS_API_HOST="${CLUSTER}.${DOMAIN}" + export ARVADOS_API_TOKEN="$SYSTEM_ROOT_TOKEN" + + arvados-client diagnostics $LOCATION + ;; + *) + echo "Arvados installer" + echo "" + echo "initialize initialize the setup directory for configuration" + echo "deploy deploy the configuration from the setup directory" + echo "diagnostics check your install using diagnostics" + ;; +esac diff --git a/tools/salt-install/local.params.example.multiple_hosts b/tools/salt-install/local.params.example.multiple_hosts index 31a69e9840..ade1ad4671 100644 --- a/tools/salt-install/local.params.example.multiple_hosts +++ b/tools/salt-install/local.params.example.multiple_hosts @@ -8,9 +8,26 @@ # The Arvados cluster ID, needs to be 5 lowercase alphanumeric characters. CLUSTER="cluster_fixme_or_this_wont_work" -# The domainname you want tou give to your cluster's hosts +# The domain name you want to give to your cluster's hosts +# the end result hostnames will be $SERVICE.$CLUSTER.$DOMAIN DOMAIN="domain_fixme_or_this_wont_work" +# For multi-node installs, the ssh log in for each node +# must be root or able to sudo +DEPLOY_USER=root + +# The mapping of nodes to roles +# installer.sh will log in to each of these nodes and then provision +# it for the specified roles. +NODES=( + [controller.${CLUSTER}.${DOMAIN}]=api,controller,websocket,dispatcher,keepbalance + [keep0.${CLUSTER}.${DOMAIN}]=keepstore + [keep1.${CLUSTER}.${DOMAIN}]=keepstore + [keep.${CLUSTER}.${DOMAIN}]=keepproxy,keepweb + [workbench.${CLUSTER}.${DOMAIN}]=workbench,workbench2,webshell + [shell.${CLUSTER}.${DOMAIN}]=shell +) + # Host SSL port where you want to point your browser to access Arvados # Defaults to 443 for regular runs, and to 8443 when called in Vagrant. # You can point it to another port if desired diff --git a/tools/salt-install/local.params.example.single_host_multiple_hostnames b/tools/salt-install/local.params.example.single_host_multiple_hostnames index 2ce1556511..20f334166e 100644 --- a/tools/salt-install/local.params.example.single_host_multiple_hostnames +++ b/tools/salt-install/local.params.example.single_host_multiple_hostnames @@ -11,6 +11,17 @@ CLUSTER="cluster_fixme_or_this_wont_work" # The domainname you want tou give to your cluster's hosts DOMAIN="domain_fixme_or_this_wont_work" +# For multi-node installs, the ssh log in for each node +# must be root or able to sudo +DEPLOY_USER=root + +# The mapping of nodes to roles +# installer.sh will log in to each of these nodes and then provision +# it for the specified roles. +NODES=( + [localhost]=api,controller,websocket,dispatcher,keepbalance,keepstore,keepproxy,keepweb,workbench,workbench2,webshell +) + # External ports used by the Arvados services CONTROLLER_EXT_SSL_PORT=443 KEEP_EXT_SSL_PORT=25101 diff --git a/tools/salt-install/local.params.example.single_host_single_hostname b/tools/salt-install/local.params.example.single_host_single_hostname index 7add9868d9..a684500941 100644 --- a/tools/salt-install/local.params.example.single_host_single_hostname +++ b/tools/salt-install/local.params.example.single_host_single_hostname @@ -11,6 +11,17 @@ CLUSTER="cluster_fixme_or_this_wont_work" # The domainname for your cluster's hosts DOMAIN="domain_fixme_or_this_wont_work" +# For multi-node installs, the ssh log in for each node +# must be root or able to sudo +DEPLOY_USER=root + +# The mapping of nodes to roles +# installer.sh will log in to each of these nodes and then provision +# it for the specified roles. 
+NODES=( + [localhost]=api,controller,websocket,dispatcher,keepbalance,keepstore,keepproxy,keepweb,workbench,workbench2,webshell +) + # Set this value when installing a cluster in a single host with a single # hostname to access all the instances. HOSTNAME_EXT should be set to the # external hostname for the instance. diff --git a/tools/salt-install/provision.sh b/tools/salt-install/provision.sh index 3c5fb41e0f..f4660be370 100755 --- a/tools/salt-install/provision.sh +++ b/tools/salt-install/provision.sh @@ -237,6 +237,8 @@ T_DIR="/tmp/cluster_tests" arguments ${@} +declare -A NODES + if [ -s ${CONFIG_FILE} ]; then source ${CONFIG_FILE} else @@ -255,7 +257,7 @@ if [ ! -d ${CONFIG_DIR} ]; then exit 1 fi -if grep -q 'fixme_or_this_wont_work' ${CONFIG_FILE} ; then +if grep -rni 'fixme' ${CONFIG_FILE} ${CONFIG_DIR} ; then echo >&2 "The config file ${CONFIG_FILE} has some parameters that need to be modified." echo >&2 "Please, fix them and re-run the provision script." exit 1