SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-Arvados services support endpoints for monitoring the status of a cluster.
+Many Arvados services expose a health check endpoint at @/_health/ping@. The health check offers a simple way to determine whether a service can be reached and lets the service self-report any problems, which makes it suitable for integration into operational alert systems.
-Health check endpoints are found at @/_health/ping@ for many Arvados services.
+To access health check endpoints, services must be configured with a "management token":management-token.html .
-Services must have ManagementToken configured. This is used to authorize access to the health check endpoint. If ManagementToken is not configured, health checks will return the error @404 disabled@.
-
-The requester must provide the HTTP header @Authorization: Bearer (ManagementToken)@.
-
-This endpoint returns a JSON object with the field @health@. This has a value of either @OK@ or @ERROR@. On error, it may also include a field @error@ with additional information.
-
-h2. How to enable health checks on each service.
-
-h3. API server
-
-Set @MangementToken@ in @application.yml@
+Health check endpoints return a JSON object with the field @health@. This has a value of either @OK@ or @ERROR@. On error, it may also include a field @error@ with additional information. Examples:
<pre>
- # Token to be included in all healthcheck requests. Disabled by default.
- # Server expects request header of the format "Authorization: Bearer xxx"
- ManagementToken: ...
+{
+ "health": "OK"
+}
</pre>
-h3. Node Manager
-
-Set @port@ (the listen port) and @MangementToken@ in the @Manage@ section of @node-manager.ini@ .
-
<pre>
-[Manage]
-port=8888
-ManagementToken=...
+{
+ "health": "ERROR",
+ "error": "Inverted polarity in the warp core"
+}
</pre>
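As a rough illustration of how a monitoring script might consume this endpoint, the following Python sketch builds a request for @/_health/ping@ and interprets the JSON reply. It assumes the management token is sent as an @Authorization: Bearer@ header, as the earlier wording of this page described. The function names and the example URL are hypothetical and are not part of Arvados.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_ping_request(base_url: str, management_token: str) -> urllib.request.Request:
    """Build a GET request for a service's /_health/ping endpoint.

    Assumption: the token is passed as "Authorization: Bearer <token>".
    """
    url = base_url.rstrip("/") + "/_health/ping"
    headers = {"Authorization": "Bearer " + management_token}
    return urllib.request.Request(url, headers=headers)


def parse_health(body: str) -> Tuple[bool, Optional[str]]:
    """Return (is_healthy, error_message) for a health check response body."""
    doc = json.loads(body)
    if not isinstance(doc, dict):
        return False, "unexpected response: not a JSON object"
    healthy = doc.get("health") == "OK"
    # The "error" field is optional, so this may be None even on failure.
    return healthy, (None if healthy else doc.get("error"))


def ping(base_url: str, management_token: str, timeout: float = 5.0) -> Tuple[bool, Optional[str]]:
    """Query one service; treat network and decoding failures as unhealthy."""
    request = build_ping_request(base_url, management_token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_health(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # URLError and HTTPError derive from OSError; JSON errors from ValueError.
        return False, str(exc)
```

A call such as @ping("http://keep0:25107", "xyzzy")@ would return a pair like @(True, None)@ for a healthy service. Treating an unreachable service as unhealthy is a design choice of this sketch, made so that an alerting system sees a single kind of failure.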
-
-*
-* keepstore
-* keep-web
-* keepproxy
-* arv-git-httpd
-* websockets
-
h2. Healthcheck aggregator
The service @arvados-health@ performs health checks on all configured services and returns a single value of @OK@ or @ERROR@ for the entire cluster. It exposes the endpoint @/_health/all@ .
-The healthcheck aggregator uses the "NodeProfile" section of the cluster-wide configuration file. Here is an example.
+The healthcheck aggregator uses the @NodeProfile@ section of the cluster-wide @arvados.yml@ configuration file. Here is an example.
<pre>
Cluster:
# The cluster uuid prefix
zzzzz:
+ ManagementToken: xyzzy
NodeProfile:
# For each node, the profile name corresponds to a
# locally-resolvable hostname, and describes which Arvados
# services are available on that machine.
api:
arvados-controller:
- Listen: 8000
+ Listen: :8000
arvados-api-server:
- Listen: 8001
+ Listen: :8001
manage:
arvados-node-manager:
- Listen: 8002
+ Listen: :8002
workbench:
arvados-workbench:
- Listen: 8003
+ Listen: :8003
arvados-ws:
- Listen: 8004
+ Listen: :8004
keep:
keep-web:
- Listen: 8005
+ Listen: :8005
keepproxy:
- Listen: 8006
+ Listen: :8006
+ keep-balance:
+ Listen: :9005
keep0:
keepstore:
- Listen: 25701
+ Listen: :25107
keep1:
keepstore:
- Listen: 25701
+ Listen: :25107
</pre>
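To make the relationship between @NodeProfile@ and the individual health checks more concrete, here is a small Python sketch of the idea. It is not how @arvados-health@ is implemented. It assumes each profile name is used directly as a hostname, that every @Listen@ value has the @:port@ form shown above, and that services are reached over plain HTTP. The dictionary is a hand-written subset of the example configuration, and the function names are hypothetical.

```python
from typing import Dict, List

# Hand-copied subset of the NodeProfile example: host -> service -> Listen value.
NODE_PROFILE: Dict[str, Dict[str, str]] = {
    "api": {"arvados-controller": ":8000", "arvados-api-server": ":8001"},
    "keep0": {"keepstore": ":25107"},
}


def ping_urls(node_profile: Dict[str, Dict[str, str]]) -> List[str]:
    """List one /_health/ping URL per configured service.

    Assumption: the scheme is http and Listen is always ":port".
    """
    urls = []
    for host, services in node_profile.items():
        for listen in services.values():
            urls.append("http://" + host + listen + "/_health/ping")
    return urls


def cluster_health(results: Dict[str, str]) -> str:
    """Collapse per-service "OK"/"ERROR" values into one cluster-wide value.

    An empty result set is reported as "ERROR" (a choice made for this sketch).
    """
    if results and all(value == "OK" for value in results.values()):
        return "OK"
    return "ERROR"
```

In this sketch a single failing service is enough to mark the whole cluster as @ERROR@, which matches the single-value behaviour described for @/_health/all@.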