X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/0a3d7a02236cbec448203a1b2218b5e0630d1c00..0f0562320c97412d12a5c7d7f1c89b807c9d0325:/doc/admin/health-checks.html.textile.liquid?ds=sidebyside
diff --git a/doc/admin/health-checks.html.textile.liquid b/doc/admin/health-checks.html.textile.liquid
index 9370c6ce68..fa273cd204 100644
--- a/doc/admin/health-checks.html.textile.liquid
+++ b/doc/admin/health-checks.html.textile.liquid
@@ -10,11 +10,11 @@ Copyright (C) The Arvados Authors. All rights reserved.
 SPDX-License-Identifier: CC-BY-SA-3.0
 {% endcomment %}
 
-Health check endpoints are found at @/_health/ping@ on many Arvados services. The purpose of the health check is to be a simple method of determining if a service can be contacted and if it believes it is functioning properly, suitable for integrating into operational alert systems.
+Health check endpoints are found at @/_health/ping@ on many Arvados services. The purpose of the health check is to offer a simple method of determining whether a service can be reached and to allow the service to self-report any problems, suitable for integrating into operational alert systems.
 
-Health check endpoints must be configured with a "management token":management-token.html .
+To access health check endpoints, services must be configured with a "management token":management-token.html .
 
-This endpoint returns a JSON object with the field @health@. This has a value of either @OK@ or @ERROR@. On error, it may also include a field @error@ with additional information. Examples:
+Health check endpoints return a JSON object with the field @health@. This has a value of either @OK@ or @ERROR@. On error, it may also include a field @error@ with additional information. Examples:
 
 {
@@ -25,46 +25,47 @@ This endpoint returns a JSON object with the field @health@.  This has a value o
 
 {
   "health": "ERROR"
-  "error": "Inverted polarity of the warp core"
+  "error": "Inverted polarity in the warp core"
 }
 
-h2. Healthcheck aggregator
+h2. Health check aggregator
 
 The service @arvados-health@ performs health checks on all configured services and returns a single value of @OK@ or @ERROR@ for the entire cluster. It exposes the endpoint @/_health/all@ .
 
-The healthcheck aggregator uses the "NodeProfile" section of the cluster-wide configuration file. Here is an example.
+The health check aggregator uses the @Services@ section of the cluster-wide @config.yml@ configuration file.
 
-
-Cluster:
-  # The cluster uuid prefix
-  zzzzz:
-    NodeProfile:
-      # For each node, the profile name corresponds to a
-      # locally-resolvable hostname, and describes which Arvados
-      # services are available on that machine.
-      api:
-        arvados-controller:
-          Listen: 8000
-        arvados-api-server:
-          Listen: 8001
-      manage:
-	arvados-node-manager:
-	  Listen: 8002
-      workbench:
-	arvados-workbench:
-	  Listen: 8003
-	arvados-ws:
-	  Listen: 8004
-      keep:
-	keep-web:
-	  Listen: 8005
-	keepproxy:
-	  Listen: 8006
-      keep0:
-        keepstore:
-	  Listen: 25701
-      keep1:
-        keepstore:
-	  Listen: 25701
-
+h2. Health check command
+
+The @arvados-server check@ command is another way to perform the same health checks as the health check aggregator service. It does not depend on the aggregator service.
+
+If all checks pass, it writes @health check OK@ to stderr (unless the @-quiet@ flag is used) and exits 0. Otherwise, it writes error messages to stderr and exits with a non-zero status.
+
+@arvados-server check -yaml@ outputs a YAML document on stdout with additional details about each service endpoint that was checked.
+
+{% codeblock as yaml %}
+Checks:
+  "arvados-api-server+http://localhost:8004/_health/ping":
+    ClockTime: "2022-11-16T16:08:57Z"
+    ConfigSourceSHA256: e2c086ae3dd290cf029cb3fe79146529622279b6280cf6cd17dc8d8c30daa57f
+    ConfigSourceTimestamp: "2022-11-07T18:08:24.539545Z"
+    HTTPStatusCode: 200
+    Health: OK
+    Response:
+      health: OK
+    ResponseTime: 0.017159
+    Server: nginx/1.14.0 + Phusion Passenger(R) 6.0.15
+    Version: 2.5.0~dev20221116141533
+  "arvados-controller+http://localhost:8003/_health/ping":
+    ClockTime: "2022-11-16T16:08:57Z"
+    ConfigSourceSHA256: e2c086ae3dd290cf029cb3fe79146529622279b6280cf6cd17dc8d8c30daa57f
+    ConfigSourceTimestamp: "2022-11-07T18:08:24.539545Z"
+    HTTPStatusCode: 200
+    Health: OK
+    Response:
+      health: OK
+    ResponseTime: 0.004748
+    Server: ""
+    Version: 2.5.0~dev20221116141533 (go1.18.8)
+# ...
+{% endcodeblock %}
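A monitoring script that polls one service can check the response contract described in this patch with a few lines of Python. That contract is a JSON object whose @health@ field is @OK@ or @ERROR@, optionally with an @error@ field, behind the management token. This is a minimal sketch, not part of Arvados. The names parse_health and check_endpoint are hypothetical. It assumes the management token is sent as a Bearer token in the @Authorization@ header, as described on the management token page.

```python
import json
import urllib.error
import urllib.request


def parse_health(body):
    """Interpret a /_health/ping response body.

    Returns (ok, detail): ok is True only when the "health" field is "OK";
    detail is the optional "error" field (empty string when absent).
    """
    doc = json.loads(body)
    if not isinstance(doc, dict):
        raise ValueError("health check response is not a JSON object")
    if doc.get("health") == "OK":
        return True, ""
    return False, str(doc.get("error") or "")


def check_endpoint(url, token, timeout=10.0):
    """GET a health check URL, sending the management token.

    Assumption: the token travels as "Authorization: Bearer <token>".
    """
    req = urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_health(resp.read())
    except urllib.error.HTTPError as err:
        # A non-2xx reply (for example, a rejected token) counts as unhealthy;
        # report the JSON "error" field if the body has one, else the status.
        try:
            return parse_health(err.read())
        except ValueError:
            return False, f"HTTP {err.code}"
    except OSError as err:
        # URLError (DNS failure, connection refused) and timeouts land here.
        return False, f"unreachable: {err}"
    except ValueError:
        # A 2xx reply whose body is not a JSON object.
        return False, "response is not valid health check JSON"
```

For a healthy service, check_endpoint returns (True, ""). For a failing one, it returns False together with the service's @error@ text. The patch does not say whether the aggregator's @/_health/all@ response uses the same top-level @health@ field, so confirm that before pointing this function at it.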
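@arvados-server check@ reports its overall result through its exit status, so a cron job or alerting hook can wrap it without parsing any output. Below is a minimal Python sketch. run_cluster_check is a hypothetical helper, and only the @check@ subcommand, the @-quiet@ flag, and the exit-status behavior come from the text of this patch.

```python
import subprocess


def run_cluster_check(cmd=("arvados-server", "check", "-quiet"), timeout=300.0):
    """Run a health check command and return (ok, stderr_text).

    Exit status 0 means every check passed. Any other status means at least
    one check failed, with the error messages written to stderr.
    """
    try:
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        # The binary is not installed or not on PATH.
        return False, f"{cmd[0]}: command not found"
    except subprocess.TimeoutExpired:
        return False, f"{cmd[0]}: no result after {timeout} seconds"
    return proc.returncode == 0, proc.stderr.strip()
```

Without @-quiet@, the text returned on success would be the @health check OK@ message. Adding @-yaml@ would also put per-endpoint details on stdout, which this sketch ignores.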