---
layout: default
navsection: installguide
title: InternalURLs and ExternalURL
...

{% comment %}
Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}

The Arvados configuration is stored at @/etc/arvados/config.yml@. See the "Configuration reference":config.html for more detail.

The @Services@ section lists a number of Arvados services, each with an @InternalURLs@ and/or @ExternalURL@ configuration key. This document explains the precise meaning of these configuration keys and how the Arvados services use them.

The @ExternalURL@ is the address where the service should be reachable by clients, both from inside and from outside the Arvados cluster. Some services do not expose an Arvados API, only Prometheus metrics. For those services, @ExternalURL@ is not used.

The keys under @InternalURLs@ are the addresses used by the reverse proxy (e.g. Nginx) that fronts the Arvados services. The exception is the @Keepstore@ service, where clients connect directly to the addresses listed under @InternalURLs@. If a service is not fronted by a reverse proxy, e.g. because its endpoint only exposes Prometheus metrics, the metrics are meant to be collected directly from the endpoints listed in @InternalURLs@.

@InternalURLs@ are also used by the service itself to determine which address and port to listen on.

If an Arvados service sits behind a reverse proxy (e.g. Nginx), the reverse proxy configuration and the service's @InternalURLs@ and @ExternalURL@ values must be set up in concert.

h2. Overview

<div class="offset1">
table(table table-bordered table-condensed).
|_.Service     |_.ExternalURL required? |_.InternalURLs required?|_.InternalURLs must be reachable from other cluster nodes?|_.Note|
|railsapi       |no                     |yes|no ^1^|InternalURLs only used by Controller|
|controller     |yes                    |yes|no ^2^|InternalURLs only used by reverse proxy (e.g. Nginx)|
|arvados-dispatch-cloud|no              |yes|no ^3^|InternalURLs only used to expose Prometheus metrics|
|arvados-dispatch-lsf|no                |yes|no ^3^|InternalURLs only used to expose Prometheus metrics|
|git-http       |yes                    |yes|no ^2^|InternalURLs only used by reverse proxy (e.g. Nginx)|
|git-ssh        |yes                    |no |no    ||
|keepproxy      |yes                    |yes|no ^2^|InternalURLs only used by reverse proxy (e.g. Nginx)|
|keepstore      |no                     |yes|yes   |All clients connect to InternalURLs|
|keep-balance   |no                     |yes|no ^3^|InternalURLs only used to expose Prometheus metrics|
|keep-web       |yes                    |yes|no ^2^|InternalURLs only used by reverse proxy (e.g. Nginx)|
|websocket      |yes                    |yes|no ^2^|InternalURLs only used by reverse proxy (e.g. Nginx)|
|workbench1     |yes                    |no|no     ||
|workbench2     |yes                    |no|no     ||
</div>

^1^ If @Controller@ runs on a different host than @RailsAPI@, the @InternalURLs@ will need to be reachable from the host that runs @Controller@.
^2^ If the reverse proxy (e.g. Nginx) does not run on the same host as the Arvados service it fronts, the @InternalURLs@ will need to be reachable from the host that runs the reverse proxy.
^3^ If the Prometheus metrics are not collected from the same machine that runs the service, the @InternalURLs@ will need to be reachable from the host that collects the metrics.

When @InternalURLs@ do not need to be reachable from other nodes, it is most secure to use loopback addresses as @InternalURLs@, e.g. @http://127.0.0.1:9005@.
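
Whether an @InternalURLs@ entry is loopback-only can be checked mechanically. The Go sketch below is illustrative only: @isLoopbackURL@ is a hypothetical helper, it treats the literal name @localhost@ as loopback, and it does not resolve other hostnames, so a name that happens to resolve to @127.0.0.1@ is reported as non-loopback. A check like this could, for example, flag a @Keepstore@ entry that other nodes would be unable to reach.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// isLoopbackURL reports whether the host part of rawURL is a loopback IP
// address (127.0.0.0/8 or ::1) or the name "localhost". Other hostnames
// are not resolved and are reported as non-loopback.
func isLoopbackURL(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	host := u.Hostname() // strips the port and any IPv6 brackets
	if host == "localhost" {
		return true, nil
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback(), nil
}

func main() {
	for _, raw := range []string{
		"http://127.0.0.1:9005",
		"http://keep0.ClusterID.example.com:25107",
	} {
		loopback, err := isLoopbackURL(raw)
		fmt.Println(raw, loopback, err)
	}
}
```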

It is recommended to use a split-horizon DNS setup, where the hostnames specified in @ExternalURL@ resolve to an internal IP address from inside the Arvados cluster, and to a publicly routed external IP address from outside the cluster. This simplifies firewalling and keeps traffic between cluster nodes on the internal network. In a cloud environment where traffic that flows via public IP addresses is charged, split-horizon DNS can also avoid unnecessary expense.

h2. Examples

The remainder of this document walks through a number of examples to provide more detail.

h3. Keep-balance

Consider this section for the @Keep-balance@ service:

{% codeblock as yaml %}
      Keepbalance:
        InternalURLs:
          "http://ip-10-0-1-233.internal:9005/": {}
{% endcodeblock %}

@Keep-balance@ has an API endpoint, but it is only used to expose "Prometheus":https://prometheus.io metrics.

There is no @ExternalURL@ key because @Keep-balance@ does not have an Arvados API: no Arvados service needs to connect to @Keep-balance@.

The value for @InternalURLs@ tells the @Keep-balance@ service to listen on port 9005 when it is started on a host where @ip-10-0-1-233.internal@ resolves to a local IP address. If @Keep-balance@ is started on a machine where the @ip-10-0-1-233.internal@ hostname does not resolve to a local IP address, it refuses to start, because it cannot find a local IP address to listen on.

It is also possible to use IP addresses in @InternalURLs@, for example:

{% codeblock as yaml %}
      Keepbalance:
        InternalURLs:
          "http://127.0.0.1:9005/": {}
{% endcodeblock %}

In this example, @Keep-balance@ would start up and listen on port 9005 at the @127.0.0.1@ IP address. Prometheus would only be able to access the @Keep-balance@ metrics if it could reach that IP and port, e.g. if it runs on the same machine.

Finally, it is also possible to listen on all interfaces, for example:

{% codeblock as yaml %}
      Keepbalance:
        InternalURLs:
          "http://0.0.0.0:9005/": {}
{% endcodeblock %}

In this case, @Keep-balance@ will listen on port 9005 on all IP addresses local to the machine.
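
Each @InternalURLs@ key ultimately has to be turned into a host and port that the service can bind. The Go sketch below shows one way to do that with the standard library, producing the @host:port@ string that Go's @net.Listen@ accepts. @listenAddr@ is a hypothetical helper, and falling back to ports 80 and 443 when a URL omits the port is an assumption based on the usual HTTP/HTTPS defaults, not documented Arvados behavior.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// listenAddr converts an InternalURLs key into the "host:port" form
// accepted by net.Listen. If the URL has no explicit port, the scheme's
// conventional default is assumed (80 for http, 443 for https).
func listenAddr(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	port := u.Port()
	if port == "" {
		switch u.Scheme {
		case "http":
			port = "80"
		case "https":
			port = "443"
		default:
			return "", fmt.Errorf("%q has no port and an unrecognized scheme", rawURL)
		}
	}
	// JoinHostPort re-adds brackets around IPv6 literals such as ::1.
	return net.JoinHostPort(u.Hostname(), port), nil
}

func main() {
	for _, raw := range []string{
		"http://ip-10-0-1-233.internal:9005/",
		"http://127.0.0.1:9005/",
		"http://0.0.0.0:9005/",
	} {
		addr, err := listenAddr(raw)
		fmt.Println(addr, err)
	}
}
```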

h3. Keepstore

Consider this section for the @Keepstore@ service:

{% codeblock as yaml %}
      Keepstore:
        InternalURLs:
          "http://keep0.ClusterID.example.com:25107": {}
          "http://keep1.ClusterID.example.com:25107": {}
{% endcodeblock %}

There is no @ExternalURL@ key because @Keepstore@ is only accessed from inside the Arvados cluster. For access from outside, all traffic goes via @Keepproxy@.

When @Keepstore@ is installed on the host where @keep0.ClusterID.example.com@ resolves to a local IP address, it listens on port 25107 on that IP address. The same applies on the @keep1.ClusterID.example.com@ host. On all other systems, @Keepstore@ refuses to start.

h3. Keepproxy

Consider this section for the @Keepproxy@ service:

{% codeblock as yaml %}
      Keepproxy:
        ExternalURL: https://keep.ClusterID.example.com
        InternalURLs:
          "http://localhost:25107": {}
{% endcodeblock %}

The @ExternalURL@ advertised is @https://keep.ClusterID.example.com@. The @Keepproxy@ service itself, however, listens on @localhost@ port 25107. This works because Nginx is also configured to terminate SSL and sit in front of the @Keepproxy@ service:

<notextile><pre><code>upstream keepproxy {
  server                127.0.0.1:<span class="userinput">25107</span>;
}

server {
  listen                  443 ssl;
  server_name             <span class="userinput">keep.ClusterID.example.com</span>;

  proxy_connect_timeout   90s;
  proxy_read_timeout      300s;
  proxy_set_header        X-Real-IP $remote_addr;
  proxy_http_version      1.1;
  proxy_request_buffering off;
  proxy_max_temp_file_size 0;

  ssl_certificate     <span class="userinput">/YOUR/PATH/TO/cert.pem</span>;
  ssl_certificate_key <span class="userinput">/YOUR/PATH/TO/cert.key</span>;

  # Clients need to be able to upload blocks of data up to 64MiB in size.
  client_max_body_size    64m;

  location / {
    proxy_pass            http://keepproxy;
  }
}
</code></pre></notextile>

A client that connects to the @Keepproxy@ @ExternalURL@ talks to Nginx, which reverse proxies the traffic to the @Keepproxy@ service at its @InternalURLs@ address.

h3. Workbench

Consider this section for the @Workbench1@ service:

{% codeblock as yaml %}
      Workbench1:
        ExternalURL: "https://workbench.ClusterID.example.com"
{% endcodeblock %}

The @ExternalURL@ advertised is @https://workbench.ClusterID.example.com@. There is no value for @InternalURLs@ because Workbench1 is a Rails application served by Passenger. The only client connecting to the Passenger process is the reverse proxy (e.g. Nginx), and the listening host/port is set in the reverse proxy's own configuration:

<notextile><pre><code>
server {
  listen       443 ssl;
  server_name  workbench.ClusterID.example.com;

  ssl_certificate     /YOUR/PATH/TO/cert.pem;
  ssl_certificate_key /YOUR/PATH/TO/cert.key;

  root /var/www/arvados-workbench/current/public;
  index  index.html;

  passenger_enabled on;
  # If you're using RVM, uncomment the line below.
  #passenger_ruby /usr/local/rvm/wrappers/default/ruby;

  # `client_max_body_size` should match API.MaxRequestSize in config.yml
  # and the corresponding setting in the Controller's Nginx configuration.
  client_max_body_size 128m;
}
</code></pre></notextile>

h3. API server

Consider this section for the @RailsAPI@ service:

{% codeblock as yaml %}
      RailsAPI:
        InternalURLs:
          "http://localhost:8004": {}
{% endcodeblock %}

There is no @ExternalURL@ defined because @RailsAPI@ is not directly accessible and does not need to advertise a URL: all traffic to it flows via @Controller@, which is the only client that talks to it.

The @RailsAPI@ service is also a Rails application, and its listening host/port is defined in the Nginx configuration:

<notextile><pre><code>
server {
  # This configures the Arvados API server.  It is written using Ruby
  # on Rails and uses the Passenger application server.

  listen localhost:8004;
  server_name localhost-api;

  root /var/www/arvados-api/current/public;
  index  index.html index.htm index.php;

  passenger_enabled on;

  # If you are using RVM, uncomment the line below.
  # If you're using system ruby, leave it commented out.
  #passenger_ruby /usr/local/rvm/wrappers/default/ruby;

  # This value effectively limits the size of API objects users can
  # create, especially collections.  If you change this, you should
  # also ensure the following settings match it:
  # * `client_max_body_size` in the Controller's server section
  # * `API.MaxRequestSize` in config.yml
  client_max_body_size 128m;
}
</code></pre></notextile>

Why, then, is there a need to specify @InternalURLs@ for the @RailsAPI@ service? Because this is how the @Controller@ service locates the @RailsAPI@ service it should talk to. Since this connection is internal to the Arvados cluster, @Controller@ uses @InternalURLs@ to find the @RailsAPI@ endpoint.

h3. Controller

Consider this section for the @Controller@ service:

{% codeblock as yaml %}
      Controller:
        InternalURLs:
          "http://localhost:8003": {}
        ExternalURL: "https://ClusterID.example.com"
{% endcodeblock %}

The @ExternalURL@ advertised is @https://ClusterID.example.com@. The @Controller@ service itself listens on @localhost@ port 8003. Nginx is configured to sit in front of the @Controller@ service and terminate SSL:

<notextile><pre><code>
# This is the port where nginx expects to contact arvados-controller.
upstream controller {
  server     localhost:8003  fail_timeout=10s;
}

server {
  # This configures the public https port that clients will actually connect to;
  # requests are reverse proxied to the upstream 'controller'.

  listen       443 ssl;
  server_name  ClusterID.example.com;

  ssl_certificate     /YOUR/PATH/TO/cert.pem;
  ssl_certificate_key /YOUR/PATH/TO/cert.key;

  # Refer to the comment about this setting in the passenger (arvados
  # api server) section of your Nginx configuration.
  client_max_body_size 128m;

  location / {
    proxy_pass               http://controller;
    proxy_redirect           off;
    proxy_connect_timeout    90s;
    proxy_read_timeout       300s;
    proxy_max_temp_file_size 0;
    proxy_request_buffering  off;
    proxy_buffering          off;
    proxy_http_version       1.1;

    proxy_set_header      Host              $http_host;
    proxy_set_header      Upgrade           $http_upgrade;
    proxy_set_header      Connection        "upgrade";
    # $external_client is not defined in this snippet; it must be defined
    # elsewhere in your Nginx configuration, e.g. with a geo block.
    proxy_set_header      X-External-Client $external_client;
    proxy_set_header      X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header      X-Forwarded-Proto https;
    proxy_set_header      X-Real-IP         $remote_addr;
  }
}
</code></pre></notextile>