From: Peter Amstutz
Date: Mon, 23 Jul 2018 15:29:28 +0000 (-0400)
Subject: 13791: Admin documentation for monitoring wip
X-Git-Tag: 1.2.0~55^2~2
X-Git-Url: https://git.arvados.org/arvados.git/commitdiff_plain/0a3d7a02236cbec448203a1b2218b5e0630d1c00?ds=sidebyside

13791: Admin documentation for monitoring wip

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz
---

diff --git a/doc/_config.yml b/doc/_config.yml
index 3cf6fb377a..56da6fa9a2 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -157,6 +157,7 @@ navbar:
       - admin/migrating-providers.html.textile.liquid
       - user/topics/arvados-sync-groups.html.textile.liquid
     - Monitoring:
+      - admin/management-token.html.textile.liquid
       - admin/health-checks.html.textile.liquid
       - admin/metrics.html.textile.liquid
     - Cloud:
diff --git a/doc/admin/health-checks.html.textile.liquid b/doc/admin/health-checks.html.textile.liquid
index 64ce5ee493..9370c6ce68 100644
--- a/doc/admin/health-checks.html.textile.liquid
+++ b/doc/admin/health-checks.html.textile.liquid
@@ -10,46 +10,25 @@ Copyright (C) The Arvados Authors. All rights reserved.
 SPDX-License-Identifier: CC-BY-SA-3.0
 {% endcomment %}

-Arvados services support endpoints for monitoring the status of a cluster.
+Health check endpoints are found at @/_health/ping@ on many Arvados services. The purpose of the health check is to be a simple method of determining if a service can be contacted and if it believes it is functioning properly, suitable for integrating into operational alert systems.
-Health check endpoints are found at @/_health/ping@ for many Arvados services.
+Health check endpoints must be configured with a "management token":management-token.html .
-Services must have ManagementToken configured. This is used to authorize access to the health check endpoint. If ManagementToken is not configured, health checks will return the error @404 disabled@.
-
-The requester must provide the HTTP header @Authorization: Bearer (ManagementToken)@.
-
-This endpoint returns a JSON object with the field @health@. This has a value of either @OK@ or @ERROR@. On error, it may also include a field @error@ with additional information.
-
-h2. How to enable health checks on each service.
-
-h3. API server
-
-Set @MangementToken@ in @application.yml@
+This endpoint returns a JSON object with the field @health@. This has a value of either @OK@ or @ERROR@. On error, it may also include a field @error@ with additional information. Examples:
-  # Token to be included in all healthcheck requests. Disabled by default.
-  # Server expects request header of the format "Authorization: Bearer xxx"
-  ManagementToken: ...
+{
+  "health": "OK"
+}
 
-h3. Node Manager
-
-Set @port@ (the listen port) and @MangementToken@ in the @Manage@ section of @node-manager.ini@ .
-
-[Manage]
-port=8888
-ManagementToken=...
+{
+  "health": "ERROR",
+  "error": "Inverted polarity of the warp core"
+}
 
-
-*
-* keepstore
-* keep-web
-* keepproxy
-* arv-git-httpd
-* websockets
-
 h2. Healthcheck aggregator

 The service @arvados-health@ performs health checks on all configured services and returns a single value of @OK@ or @ERROR@ for the entire cluster. It exposes the endpoint @/_health/all@ .
diff --git a/doc/admin/management-token.html.textile.liquid b/doc/admin/management-token.html.textile.liquid
new file mode 100644
index 0000000000..33027ad887
--- /dev/null
+++ b/doc/admin/management-token.html.textile.liquid
@@ -0,0 +1,47 @@
+---
+layout: default
+navsection: admin
+title: Management token
+...
+
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+To enable and collect health checks and metrics, services must be configured with a "management token".
+
+Services must have ManagementToken configured. This is used to authorize access to monitoring endpoints. If ManagementToken is not configured, monitoring endpoints will return the error @404 disabled@.
+
+To access a monitoring endpoint, the requester must provide the HTTP header @Authorization: Bearer (ManagementToken)@.
+
+h2. API server
+
+Set @ManagementToken@ in @application.yml@
+
+
+  # Token to be included in all healthcheck requests. Disabled by default.
+  # Server expects request header of the format "Authorization: Bearer xxx"
+  ManagementToken: ...
+
+
+h2. Node Manager
+
+Set @port@ (the listen port) and @ManagementToken@ in the @Manage@ section of @node-manager.ini@ .
+
+
+[Manage]
+port=8888
+ManagementToken=...
+
+
+h2. Other services
+
+The following services also support health checks. Set @ManagementToken@ in the respective YAML config file for each service.
+
+* keepstore
+* keep-web
+* keepproxy
+* arv-git-httpd
+* websockets
diff --git a/doc/admin/metrics.html.textile.liquid b/doc/admin/metrics.html.textile.liquid
index fb33ccbd9e..107431267e 100644
--- a/doc/admin/metrics.html.textile.liquid
+++ b/doc/admin/metrics.html.textile.liquid
@@ -10,4 +10,82 @@ Copyright (C) The Arvados Authors. All rights reserved.
 SPDX-License-Identifier: CC-BY-SA-3.0
 {% endcomment %}

-Arvados services support endpoints for monitoring the performance of a cluster.
+Metrics endpoints are found at @/status.json@ on many Arvados services. The purpose of metrics is to provide statistics about the operation of a service, suitable for diagnosing how well a service is performing under load.
+
+Metrics endpoints must be configured with a "management token":management-token.html .
+
+h2. Keepstore
+
+h3. Root
+
+table(table table-bordered table-condensed).
+|_. Attribute|_. Type|_. Description|
+|Volumes| array of "volumeStatusEnt":#volumeStatusEnt ||
+|BufferPool| "PoolStatus":#PoolStatus ||
+|PullQueue| "WorkQueueStatus":#WorkQueueStatus ||
+|TrashQueue| "WorkQueueStatus":#WorkQueueStatus ||
+|RequestsCurrent| int ||
+|RequestsMax| int ||
+|Version| string ||
+
+h3(#volumeStatusEnt). volumeStatusEnt
+
+table(table table-bordered table-condensed).
+|_. Attribute|_. Type|_. Description|
+|Label| string||
+|Status| "VolumeStatus":#VolumeStatus ||
+|VolumeStats| "ioStats":#ioStats ||
+
+h3(#VolumeStatus). VolumeStatus
+
+table(table table-bordered table-condensed).
+|_. Attribute|_. Type|_. Description|
+|MountPoint| string||
+|DeviceNum| uint64||
+|BytesFree| uint64||
+|BytesUsed| uint64||
+
+h3(#ioStats). ioStats
+
+table(table table-bordered table-condensed).
+|_. Attribute|_. Type|_. Description|
+|Errors| uint64||
+|Ops| uint64||
+|CompareOps| uint64||
+|GetOps| uint64||
+|PutOps| uint64||
+|TouchOps| uint64||
+|InBytes| uint64||
+|OutBytes| uint64||
+
+h3(#PoolStatus). PoolStatus
+
+table(table table-bordered table-condensed).
+|_. Attribute|_. Type|_. Description|
+|BytesAllocatedCumulative| uint64||
+|BuffersMax| int||
+|BuffersInUse| int||
+
+h3(#WorkQueueStatus). WorkQueueStatus
+
+table(table table-bordered table-condensed).
+|_. Attribute|_. Type|_. Description|
+|InProgress| int||
+|Queued| int||
+
+h2. Node manager
+
+The node manager status endpoint provides a snapshot of internal status at the time of the most recent wishlist update.
+
+table(table table-bordered table-condensed).
+|_. Attribute|_. Type|_. Description|
+|nodes_booting|int|Number of nodes in booting state|
+|nodes_unpaired|int|Number of nodes in unpaired state|
+|nodes_busy|int|Number of nodes in busy state|
+|nodes_idle|int|Number of nodes in idle state|
+|nodes_fail|int|Number of nodes in fail state|
+|nodes_down|int|Number of nodes in down state|
+|nodes_shutdown|int|Number of nodes in shutdown state|
+|nodes_wish|int|Number of nodes in the current wishlist|
+|node_quota|int|Current node count ceiling due to cloud quota limits|
+|config_max_nodes|int|Configured max node count|
diff --git a/doc/install/cheat_sheet.html.textile.liquid b/doc/install/cheat_sheet.html.textile.liquid
index afff1f4542..562b76ddf0 100644
--- a/doc/install/cheat_sheet.html.textile.liquid
+++ b/doc/install/cheat_sheet.html.textile.liquid
@@ -1,7 +1,7 @@
 ---
 layout: default
 navsection: admin
-title: User management
+title: User management at the CLI
 ...
 {% comment %}
 Copyright (C) The Arvados Authors. All rights reserved.
diff --git a/services/api/db/structure.sql b/services/api/db/structure.sql
index a201a05aaf..f1f57f51d9 100644
--- a/services/api/db/structure.sql
+++ b/services/api/db/structure.sql
@@ -3116,6 +3116,7 @@ INSERT INTO schema_migrations (version) VALUES ('20180501182859');
 INSERT INTO schema_migrations (version) VALUES ('20180514135529');
+INSERT INTO schema_migrations (version) VALUES ('20180607175050');
+
 INSERT INTO schema_migrations (version) VALUES ('20180608123145');
-INSERT INTO schema_migrations (version) VALUES ('20180607175050');
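
Note on the documentation added by this commit: the management token page says a requester must send the header "Authorization: Bearer (ManagementToken)", and the health check page says /_health/ping answers with a JSON object whose "health" field is OK or ERROR. A minimal Python sketch of a client for that contract might look like the following. It is not part of the commit; the function names, the base URL, and the token value are hypothetical, and only the endpoint path, the header format, and the JSON fields come from the documentation above.

```python
import json
import urllib.request


def build_ping_request(base_url, management_token):
    """Build a GET request for the /_health/ping endpoint of one service.

    base_url and management_token are supplied by the caller; nothing
    here is read from Arvados configuration files.
    """
    url = base_url.rstrip("/") + "/_health/ping"
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + management_token}
    )


def parse_health(body):
    """Return (is_ok, error_message) for a health check JSON body."""
    doc = json.loads(body)
    is_ok = doc.get("health") == "OK"
    # The "error" field is optional, so this may be None even on ERROR.
    return is_ok, (None if is_ok else doc.get("error"))


def check_service(base_url, management_token, timeout=5):
    """Ping one service; any network, HTTP, or parse failure counts as unhealthy."""
    request = build_ping_request(base_url, management_token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_health(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # URLError and HTTPError are OSError subclasses (this covers the
        # "404 disabled" case); invalid JSON raises a ValueError subclass.
        return False, str(exc)
```

Calling check_service("http://keep0.example:25107", "xyz") with a real host and token would return a pair such as (True, None); the host, port, and token shown here are placeholders. Treating every failure as unhealthy is a design choice suited to alerting, where an unreachable service and a service reporting ERROR usually call for the same response.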