h2. Arvados /etc/arvados/config.yml
-The configuration file is normally found at @/etc/arvados/config.yml@ and will be referred to as just @config.yml@ in this guide. This configuration file should be kept in sync across every node in the cluster, except compute nodes (which usually do not require config.yml). We recommend using a devops configuration management tool such as "Puppet":https://puppet.com/open-source/ to synchronize the config file.
+The configuration file is normally found at @/etc/arvados/config.yml@ and will be referred to as just @config.yml@ in this guide. This configuration file must be kept in sync across every service node in the cluster, but not shell and compute nodes (which do not require config.yml).
h3. Syntax
h3(#empty). Create empty configuration file
+Change @webserver-user@ to the user that runs your web server process. This is @www-data@ on Debian-based systems, and @nginx@ on Red Hat-based systems.
+
<notextile>
<pre><code># <span class="userinput">export ClusterID=xxxxx</span>
+# <span class="userinput">umask 027</span>
# <span class="userinput">mkdir -p /etc/arvados</span>
# <span class="userinput">cat > /etc/arvados/config.yml <<EOF
Clusters:
  ${ClusterID}:
-EOF</span></code></pre>
+EOF</span>
+# <span class="userinput">chgrp webserver-user /etc/arvados /etc/arvados/config.yml</span>
+</code></pre>
</notextile>
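The effect of @umask 027@ in the steps above can be sketched without touching the real system: running the same here-document commands against a scratch directory (a stand-in for @/etc/arvados@, so no root access is needed) produces a file that the web server group can read but other users cannot.

```shell
# Sketch of the creation steps above, using a scratch directory as a
# stand-in for /etc/arvados so it runs without root.
tmp=$(mktemp -d)
ClusterID=xxxxx
umask 027
mkdir -p "$tmp/arvados"
cat > "$tmp/arvados/config.yml" <<EOF
Clusters:
  ${ClusterID}:
EOF
# umask 027 clears group-write and all world bits (666 & ~027 = 640),
# so the file is group-readable but not world-readable.
stat -c '%a' "$tmp/arvados/config.yml"   # prints 640 on Linux
```

On the real system, the @chgrp@ step then hands group ownership to the web server user so its processes can read the file.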
h2. Nginx configuration
This guide will also cover setting up "Nginx":https://www.nginx.com/ as a reverse proxy for Arvados services. Nginx performs two main functions: TLS termination and virtual host routing. The virtual host configuration for each component will go in its own file in @/etc/nginx/conf.d/@.
+
+h2. Synchronizing config file
+
+The Arvados configuration file must be kept in sync across every service node in the cluster. We strongly recommend using a devops configuration management tool such as "Puppet":https://puppet.com/open-source/ to synchronize the config file. Alternatively, a script like the following can securely copy the configuration file to each node. Replace the @ssh@ targets with your nodes.
+
+<notextile>
+<pre><code>#!/bin/sh
+sudo cat /etc/arvados/config.yml | ssh <span class="userinput">10.0.0.2</span> sudo sh -c "'cat > /etc/arvados/config.yml'"
+sudo cat /etc/arvados/config.yml | ssh <span class="userinput">10.0.0.3</span> sudo sh -c "'cat > /etc/arvados/config.yml'"
+</code></pre>
+</notextile>
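After copying, it is worth verifying that the copies actually match. A checksum comparison does this without transferring the whole file again; the sketch below runs on local temporary files for illustration, whereas in practice you would run @sha256sum /etc/arvados/config.yml@ on each node and compare the output.

```shell
# Illustrative checksum comparison between two local copies of a config
# file; on real nodes, compare `sha256sum /etc/arvados/config.yml`
# output from each node instead.
src=$(mktemp); copy=$(mktemp)
printf 'Clusters:\n  xxxxx:\n' > "$src"
cp "$src" "$copy"
if [ "$(sha256sum < "$src")" = "$(sha256sum < "$copy")" ]; then
  echo "in sync"
else
  echo "OUT OF SYNC"
fi
```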
<notextile>
-<pre><code>~$ <span class="userinput">azure config mode arm</span>
-~$ <span class="userinput">azure login</span>
-~$ <span class="userinput">azure group create exampleGroupName eastus</span>
-~$ <span class="userinput">azure storage account create --type LRS --location eastus --resource-group exampleGroupName exampleStorageAccountName</span>
-~$ <span class="userinput">azure storage account keys list --resource-group exampleGroupName exampleStorageAccountName</span>
-info: Executing command storage account keys list
-+ Getting storage account keys
-data: Primary: zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz==
-data: Secondary: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy==
-info: storage account keys list command OK
+<pre><code>~$ <span class="userinput">az login</span>
+~$ <span class="userinput">az group create --name exampleGroupName --location eastus2</span>
+~$ <span class="userinput">az storage account create --sku Standard_LRS --kind BlobStorage --encryption-services blob --access-tier Hot --https-only true --location eastus2 --resource-group exampleGroupName --name exampleStorageAccountName</span>
+~$ <span class="userinput">az storage account keys list --resource-group exampleGroupName --account-name exampleStorageAccountName</span>
+[
+  {
+    "keyName": "key1",
+    "permissions": "Full",
+    "value": "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz=="
+  },
+  {
+    "keyName": "key2",
+    "permissions": "Full",
+    "value": "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy=="
+  }
+]
-~$ <span class="userinput">AZURE_STORAGE_ACCOUNT="exampleStorageAccountName" \
-AZURE_STORAGE_ACCESS_KEY="zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz==" \
-azure storage container create exampleContainerName</span>
+~$ <span class="userinput">AZURE_STORAGE_ACCOUNT="exampleStorageAccountName" \
+AZURE_STORAGE_KEY="zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz==" \
+az storage container create --name exampleContainerName</span>
</code></pre>
</notextile>
In order to use Google for authentication, you must use the <a href="https://console.developers.google.com" target="_blank">Google Developers Console</a> to create a set of client credentials.
# Go to the <a href="https://console.developers.google.com" target="_blank">Google Developers Console</a> and select or create a project; this will take you to the project page.
-# On the sidebar, click on *APIs & auth* then select *APIs*.
-## Search for *Contacts API* and click on *Enable API*.
-## Search for *Google+ API* and click on *Enable API*.
-# On the sidebar, click on *Credentials*; under *OAuth* click on *Create new Client ID* to bring up the *Create Client ID* dialog box.
+# Click on *+ Enable APIs and Services*.
+## Search for *People API* and click on *Enable*.
+# Navigate back to the main *APIs & Services* page.
+# On the sidebar, click on *OAuth consent screen*.
+## On the consent screen settings, enter your identifying details.
+## Under *Authorized domains*, add @yourdomain.com@.
+## Click on *Save*.
+# On the sidebar, click on *Credentials*; then click on *Create credentials*→*OAuth Client ID*.
# Under *Application type* select *Web application*.
-# If the authorization origins are not displayed, clicking on *Create Client ID* will take you to *Consent screen* settings.
-## On consent screen settings, enter the appropriate details and click on *Save*.
-## This will return you to the *Create Client ID* dialog box.
-# You must set the authorization origins. Edit @auth.your.domain@ to the appropriate hostname that you will use to access the SSO service:
+# You must set the authorized origins. Edit the examples below to use the hostname that you will use to access the login service:
-## JavaScript origin should be @https://auth.example.com/@
-## Redirect URI should be @https://auth.example.com/users/auth/google_oauth2/callback@
+## JavaScript origin should be @https://ClusterID.yourdomain.com/@ (using Arvados-controller based login) or @https://auth.yourdomain.com/@ (for the SSO server)
+## Redirect URI should be @https://ClusterID.yourdomain.com/login@ (using Arvados-controller based login) or @https://auth.yourdomain.com/users/auth/google_oauth2/callback@ (for the SSO server)
# Copy the values of *Client ID* and *Client secret* from the Google Developers Console and add them to the appropriate configuration.
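Those two values end up in the cluster configuration. As a sketch (the key names below follow the Arvados 2.0 @config.yml@ layout and are an assumption; verify them against the configuration reference for your version):

```yaml
Clusters:
  ClusterID:
    Login:
      # Values copied from the Google Developers Console. The key names
      # are an assumption based on the Arvados 2.0 config layout;
      # check your version's configuration reference before relying on them.
      GoogleClientID: "xxxxxxxxxxxx.apps.googleusercontent.com"
      GoogleClientSecret: "xxxxxxxxxxxxxxxxxxxx"
```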
    Controller:
      ExternalURL: <span class="userinput">"https://xxxxx.example.com"</span>
      InternalURLs:
-        <span class="userinput">"http://xxxxx.example.com:8003": {}</span>
+        <span class="userinput">"http://localhost:8003": {}</span>
    RailsAPI:
      # Does not have an ExternalURL
      InternalURLs:
-        <span class="userinput">"http://xxxxx.example.com:8004": {}</span>
+        <span class="userinput">"http://localhost:8004": {}</span>
</code></pre>
</notextile>
# "available keep services" request with either a list of internal keep
# servers (0) or with the keepproxy (1).
#
-# TODO: Following the example here, update the netmask to the
-# your internal subnet.
+# TODO: Following the example here, update the 10.20.30.0/24 netmask
+# to match your private subnet.
+# TODO: Update 1.2.3.4 (and add lines as necessary) with the public
+# IP addresses of any servers that can also access the private
+# network, to ensure they are not considered 'external'.
geo $external_client {
default 1;
+ 127.0.0.0/24 0;
<span class="userinput">10.20.30.0/24</span> 0;
+ <span class="userinput">1.2.3.4/32</span> 0;
}
# This is the port where nginx expects to contact arvados-controller.
upstream controller {
- server xxxxx.example.com:8003 fail_timeout=10s;
+ server localhost:8003 fail_timeout=10s;
}
server {
# This configures the public https port that clients will actually connect to,
# the request is reverse proxied to the upstream 'controller'
- listen <span class="userinput">xxxxx.example.com</span>:443 ssl;
+ listen *:443 ssl;
server_name <span class="userinput">xxxxx.example.com</span>;
ssl on;
# This configures the Arvados API server. It is written using Ruby
# on Rails and uses the Passenger application server.
- listen <span class="userinput">xxxxx.example.com:8004</span>;
+ listen <span class="userinput">localhost:8004</span>;
server_name localhost-api;
root /var/www/arvados-api/current/public;
{% include 'install_packages' %}
+{% assign arvados_component = 'arvados-controller' %}
+
{% include 'start_service' %}
h2(#confirm-working). Confirm working installation
h3. Troubleshooting
-See the admin page on "Logging":{{site.baseurl}}/admin/logging.html .
+If you are getting TLS errors, make sure the @ssl_certificate@ directive in your nginx configuration contains the "full certificate chain":http://nginx.org/en/docs/http/configuring_https_servers.html#chains .
+
+Logs can be found in @/var/www/arvados-api/current/log/production.log@ and with @journalctl -u arvados-controller@.
+
+See also the admin page on "Logging":{{site.baseurl}}/admin/logging.html .
If @WebDAVDownload@ is blank, and @WebDAV@ has a single origin (not wildcard, see below), then Workbench will show an error page.
-<pre>
- Services:
+<notextile>
+<pre><code>    Services:
      WebDAVDownload:
- ExternalURL: https://download.ClusterID.example.com
-</pre>
+        ExternalURL: <span class="userinput">https://download.ClusterID.example.com</span>
+</code></pre>
+</notextile>
h3. Collections preview URL
Collections can be served from their own subdomain:
-<pre>
- Services:
+<notextile>
+<pre><code>    Services:
      WebDAV:
- ExternalURL: https://*.collections.ClusterID.example.com
-</pre>
+        ExternalURL: <span class="userinput">https://*.collections.ClusterID.example.com</span>
+</code></pre>
+</notextile>
h4. Under the main domain
Alternately, they can go under the main domain by including @--@:
-<pre>
- Services:
+<notextile>
+<pre><code>    Services:
      WebDAV:
- ExternalURL: https://*--collections.ClusterID.example.com
-</pre>
+        ExternalURL: <span class="userinput">https://*--collections.ClusterID.example.com</span>
+</code></pre>
+</notextile>
h4. From a single domain
Serve preview links from a single domain, setting uuid or pdh in the path (similar to downloads). This configuration only allows previews of public data or collection-sharing links, because these use the anonymous user token or the token is already embedded in the URL. Authenticated requests will always result in file downloads from @Services.WebDAVDownload.ExternalURL@.
-<pre>
- Services:
+<notextile>
+<pre><code>    Services:
      WebDAV:
- ExternalURL: https://collections.ClusterID.example.com
-</pre>
+        ExternalURL: <span class="userinput">https://collections.ClusterID.example.com</span>
+</code></pre>
+</notextile>
+
+h2. Set InternalURLs
+
+<notextile>
+<pre><code>    Services:
+      WebDAV:
+        InternalURLs:
+          <span class="userinput">"http://localhost:9002"</span>: {}
+</code></pre>
+</notextile>
h2(#update-config). Configure anonymous user token
<notextile>
<pre><code>    Users:
- AnonymousUserToken: "{{railsout}}"
+      AnonymousUserToken: <span class="userinput">"{{railsout}}"</span>
</code></pre>
</notextile>
Set @Users.AnonymousUserToken: ""@ (empty string) or leave it out if you do not want to serve public data.
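If you prefer to generate the random token string by hand rather than via the Rails command, something like the following works, assuming any sufficiently long alphanumeric string is acceptable for your Arvados version (check the configuration reference to be sure):

```shell
# Generate a 50-character random alphanumeric string. Whether a
# hand-generated value is acceptable for AnonymousUserToken depends on
# your Arvados version; check its configuration reference first.
tr -dc 0-9a-zA-Z < /dev/urandom | head -c50; echo
```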
-h2. Set InternalURL
-
-<pre>
- Services:
- WebDAV:
- InternalURL:
- "http://collections.ClusterID.example.com:9002": {}
-</pre>
-
h3. Update nginx configuration
Put a reverse proxy with SSL support in front of keep-web. Keep-web itself runs on port 9002 (or whatever is specified in @Services.WebDAV.InternalURLs@); the reverse proxy runs on port 443 and forwards requests to keep-web.
}
server {
- listen <span class="userinput">[TODO: your public IP address]</span>:443 ssl;
+ listen *:443 ssl;
server_name download.<span class="userinput">ClusterID</span>.example.com
collections.<span class="userinput">ClusterID</span>.example.com
*.collections.<span class="userinput">ClusterID</span>.example.com
h2(#confirm-working). Confirm working installation
-Adjust for your configuration.
-
<pre>
$ curl -H "Authorization: Bearer $system_root_token" https://download.ClusterID.example.com/c=59389a8f9ee9d399be35462a0f92541c-53/_/hello.txt
</pre>
+If wildcard collections domains are configured:
+
<pre>
-$ curl -H "Authorization: Bearer $system_root_token" https://collections.ClusterID.example.com/c=59389a8f9ee9d399be35462a0f92541c-53/_/hello.txt
+$ curl -H "Authorization: Bearer $system_root_token" https://59389a8f9ee9d399be35462a0f92541c-53.collections.ClusterID.example.com/hello.txt
</pre>
+If using a single collections preview domain:
+
<pre>
-$ curl -H "Authorization: Bearer $system_root_token" https://59389a8f9ee9d399be35462a0f92541c-53.collections.ClusterID.example.com/hello.txt
+$ curl https://collections.ClusterID.example.com/c=59389a8f9ee9d399be35462a0f92541c-53/t=$system_root_token/_/hello.txt
</pre>
    Keepproxy:
      ExternalURL: <span class="userinput">https://keep.ClusterID.example.com</span>
      InternalURLs:
- <span class="userinput">"http://keep.ClusterID.example.com:25107": {}</span>
+        <span class="userinput">"http://localhost:25107": {}</span>
</span></code></pre>
</notextile>
}
server {
- listen <span class="userinput">[TODO your public IP address]</span>:443 ssl;
+ listen *:443 ssl;
server_name keep.<span class="userinput">ClusterID</span>.example.com;
proxy_connect_timeout 90s;
Log into a host that is on a network external to your private Arvados network. The host should be able to contact your keepproxy server (eg @keep.ClusterID.example.com@), but not your keepstore servers (eg @keep[0-9].ClusterID.example.com@).
+@ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ must be set in the environment.
+
+@ARVADOS_API_HOST@ should be the hostname of the API server.
+
+@ARVADOS_API_TOKEN@ should be the system root token.
+
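+For example, with placeholder values (substitute your cluster's API host and your system root token):

```shell
# Placeholder values -- substitute your cluster's API host and your
# system root token before running any arv commands.
export ARVADOS_API_HOST=ClusterID.example.com
export ARVADOS_API_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```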
+Install the "Command line SDK":{{site.baseurl}}/sdk/cli/install.html
+
+Check that the keepproxy server is in the @keep_service@ "accessible" list:
+
+<notextile>
+<pre><code>$ <span class="userinput">arv keep_service accessible</span>
+[...]
+</code></pre>
+</notextile>
+
+If keepproxy does not show up in the "accessible" list, and you are accessing it from outside the private network, check that you have "properly configured the @geo@ block for the API server":install-api-server.html#update-nginx .
+
Install the "Python SDK":{{site.baseurl}}/sdk/python/sdk-python.html
-@ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ must be set in the environment.
+You should now be able to use @arv-put@ to upload collections and @arv-get@ to fetch collections. Be sure to execute this from _outside_ the cluster's private network.
-You should now be able to use @arv-put@ to upload collections and @arv-get@ to fetch collections, for an example see "Testing keep.":install-keepstore.html#testing on the keepstore install page.
+{% include 'arv_put_example' %}
# "Install keepstore package":#install-packages
# "Restart the API server and controller":#restart-api
# "Confirm working installation":#confirm-working
+# "Note on storage management":#note
h2. Introduction
      InternalURLs:
        "http://<span class="userinput">keep0.ClusterID.example.com</span>:25107/": {}
        "http://<span class="userinput">keep1.ClusterID.example.com</span>:25107/": {}
- # and so forth
+        # and so forth
</code></pre>
</notextile>
h2(#confirm-working). Confirm working installation
-Install the "Python SDK":{{site.baseurl}}/sdk/python/sdk-python.html
+Log into a host that is on your private Arvados network. The host should be able to contact your keepstore servers (eg @keep[0-9].ClusterID.example.com@).
@ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ must be set in the environment.
-You should now be able to use @arv-put@ to upload collections and @arv-get@ to fetch collections:
+@ARVADOS_API_HOST@ should be the hostname of the API server.
-<pre>
-$ echo "hello world!" > hello.txt
+@ARVADOS_API_TOKEN@ should be the system root token.
-$ arv-put --portable-data-hash hello.txt
-2018-07-12 13:35:25 arvados.arv_put[28702] INFO: Creating new cache file at /home/example/.cache/arvados/arv-put/1571ec0adb397c6a18d5c74cc95b3a2a
-0M / 0M 100.0% 2018-07-12 13:35:27 arvados.arv_put[28702] INFO:
+Install the "Command line SDK":{{site.baseurl}}/sdk/cli/install.html
-2018-07-12 13:35:27 arvados.arv_put[28702] INFO: Collection saved as 'Saved at 2018-07-12 17:35:25 UTC by example@example'
-59389a8f9ee9d399be35462a0f92541c+53
+Check that the keepstore server is in the @keep_service@ "accessible" list:
-$ arv-get 59389a8f9ee9d399be35462a0f92541c+53/hello.txt
-hello world!
-</pre>
+<notextile>
+<pre><code>$ <span class="userinput">arv keep_service accessible</span>
+[...]
+</code></pre>
+</notextile>
+
+If keepstore does not show up in the "accessible" list, and you are accessing it from within the private network, check that you have "properly configured the @geo@ block for the API server":install-api-server.html#update-nginx .
+
+Next, install the "Python SDK":{{site.baseurl}}/sdk/python/sdk-python.html
+
+You should now be able to use @arv-put@ to upload collections and @arv-get@ to fetch collections. Be sure to execute this from _inside_ the cluster's private network. You will be able to access keep from _outside_ the private network after setting up "keepproxy":install-keepproxy.html .
+
+{% include 'arv_put_example' %}
-h3. Note on storage management
+h2(#note). Note on storage management
On its own, a keepstore server never deletes data. Instead, the keep-balance service determines which blocks are candidates for deletion and instructs the keepstore to move those blocks to the trash. Please see the "Balancing Keep servers":{{site.baseurl}}/admin/keep-balance.html for more details.
<notextile><pre># <span class="userinput">postgresql-setup initdb</span></pre></notextile>
# Configure the database to accept password connections
<notextile><pre><code># <span class="userinput">sed -ri -e 's/^(host +all +all +(127\.0\.0\.1\/32|::1\/128) +)ident$/\1md5/' /var/lib/pgsql/data/pg_hba.conf</span></code></pre></notextile>
-# Configure the database to launch at boot
- <notextile><pre># <span class="userinput">systemctl enable rh-postgresql95-postgresql</span></pre></notextile>
-# Start the database
- <notextile><pre># <span class="userinput">systemctl start rh-postgresql95-postgresql</span></pre></notextile>
+# Configure the database to launch at boot and start now
+ <notextile><pre># <span class="userinput">systemctl enable --now rh-postgresql95-postgresql</span></pre></notextile>
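The @sed@ expression in the password-connection step can be previewed on a sample @pg_hba.conf@ line (the input line below is illustrative) before running it with @-i@ against the real file:

```shell
# Preview what the pg_hba.conf sed rewrite does: it switches local
# IPv4/IPv6 "host" entries from ident to md5 password authentication.
echo 'host    all             all             127.0.0.1/32            ident' |
  sed -r 's/^(host +all +all +(127\.0\.0\.1\/32|::1\/128) +)ident$/\1md5/'
```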
h3(#debian). Debian or Ubuntu
# Install PostgreSQL
<notextile><pre># <span class="userinput">apt-get --no-install-recommends install postgresql postgresql-contrib</span></pre></notextile>
-# Configure the database to launch at boot
- <notextile><pre># <span class="userinput">systemctl enable postgresql</span></pre></notextile>
-# Start PostgreSQL
- <notextile><pre># <span class="userinput">systemctl start postgresql</span></pre></notextile>
+# Configure the database to launch at boot and start now
+ <notextile><pre># <span class="userinput">systemctl enable --now postgresql</span></pre></notextile>
<notextile>
<pre><code>server {
- listen <span class="userinput">[your public IP address]</span>:443 ssl;
+ listen *:443 ssl;
server_name workbench.<span class="userinput">ClusterID.example.com</span>;
ssl on;
h2(#sso). Option 2: Separate single-sign-on (SSO) server (supports Google, LDAP, local database)
-See "Install the Single Sign On (SSO) server":#install-sso.html
+See "Install the Single Sign On (SSO) server":install-sso.html
# "Install Ruby and Bundler":../../install/ruby.html
# "Install the Python SDK":../python/sdk-python.html
-h2. Option 1: Install distribution package
-
-First, configure the "Arvados package repositories":../../install/packages.html
-
-{% assign arvados_component = 'arvados-cli' %}
-
-{% include 'install_packages' %}
-
-h2. Option 2: Install from RubyGems
+h2. Install from RubyGems
<notextile>
<pre>