From: Ward Vandewege Date: Fri, 22 May 2015 14:03:01 +0000 (-0400) Subject: Add installation instructions for compute nodes; update the installation X-Git-Tag: 1.1.0~1592 X-Git-Url: https://git.arvados.org/arvados.git/commitdiff_plain/3bf22ae161deadd56d20c4165d2ec569de2dcdef Add installation instructions for compute nodes; update the installation instructions for crunch dispatcher. No issue # --- diff --git a/doc/_config.yml b/doc/_config.yml index e4dc782d31..1e68f08126 100644 --- a/doc/_config.yml +++ b/doc/_config.yml @@ -152,6 +152,7 @@ navbar: - install/install-keepproxy.html.textile.liquid - install/install-arv-git-httpd.html.textile.liquid - install/install-crunch-dispatch.html.textile.liquid + - install/install-compute-node.html.textile.liquid - install/cheat_sheet.html.textile.liquid - Software prerequisites: - install/install-manual-prerequisites-ruby.html.textile.liquid diff --git a/doc/install/install-compute-node.html.textile.liquid b/doc/install/install-compute-node.html.textile.liquid new file mode 100644 index 0000000000..dd64cd03a0 --- /dev/null +++ b/doc/install/install-compute-node.html.textile.liquid @@ -0,0 +1,103 @@ +--- +layout: default +navsection: installguide +title: Install a compute node +... + +This installation guide assumes you are on a 64 bit Debian or Ubuntu system. + +h2. Install dependencies + +First add the Arvados apt repository, and then install a number of packages. + + +
~$ echo "deb http://apt.arvados.org/ wheezy main" | sudo tee /etc/apt/sources.list.d/apt.arvados.org.list
+~$ sudo /usr/bin/apt-key adv --keyserver pool.sks-keyservers.net --recv 1078ECD7
+~$ sudo /usr/bin/apt-get update
+~$ sudo /usr/bin/apt-get install python-pip python-pyvcf python-gflags python-google-api-python-client python-virtualenv libattr1-dev libfuse-dev python-dev python-llfuse fuse crunchstat python-arvados-fuse iptables ca-certificates lxc apt-transport-https docker.io
+
+
+ +h2. Install slurm and munge + + +
~$ sudo /usr/bin/apt-get install slurm-llnl munge
+
+
+ h2. Copy configuration files from the dispatcher (api) + +The @/etc/slurm-llnl/slurm.conf@ and @/etc/munge/munge.key@ files need to be identical across the dispatcher and all compute nodes. Copy the files you created in the "Install the Crunch dispatcher":{{site.baseurl}} step to this compute node. + +h2. Crunch user account + +* @adduser crunch@ + +The crunch user should have the same UID, GID, and home directory on all compute nodes and on the dispatcher (api server). + +h2. Configure fuse + +Install this file as @/etc/fuse.conf@: + +
+# Set the maximum number of FUSE mounts allowed to non-root users.
+# The default is 1000.
+#
+#mount_max = 1000
+
+# Allow non-root users to specify the 'allow_other' or 'allow_root'
+# mount options.
+#
+user_allow_other
+
+
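A quick way to confirm the setting took effect is to check that @user_allow_other@ appears uncommented in the file. The following is a minimal sketch; the function name @fuse_allows_other@ is hypothetical and not part of Arvados or FUSE.

```shell
# Hypothetical helper (not part of Arvados or FUSE):
# succeeds if user_allow_other is present and not commented out.
# Takes the config path as an optional argument, defaulting to /etc/fuse.conf.
fuse_allows_other() {
    grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "${1:-/etc/fuse.conf}"
}

# Usage on a compute node:
#   fuse_allows_other && echo "user_allow_other is enabled"
```

The pattern is anchored to the start of the line, so a commented-out @#user_allow_other@ does not count as enabled.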
+ +h2. Tell the API server about this compute node + +Load your API superuser token on the compute node: + + +

+~$ HISTIGNORE=$HISTIGNORE:'export ARVADOS_API_TOKEN=*'
+~$ export ARVADOS_API_TOKEN=@your-superuser-token@
+~$ export ARVADOS_API_HOST=@uuid_prefix.your.domain@
+~$ unset ARVADOS_API_HOST_INSECURE
+
+
+
+ +Then execute this script to create a compute node object, and set up a cron job to have the compute node ping the API server every five minutes: + + +

+#!/bin/bash
+if ! test -f /root/node.json ; then
+    arv node create --node "{\"hostname\": \"$(hostname)\"}" > /root/node.json
+
+    # Make sure /dev/fuse permissions are correct (the device appears after fuse is loaded)
+    chmod 1660 /dev/fuse && chgrp fuse /dev/fuse
+fi
+
+UUID=$(grep \"uuid\" /root/node.json | cut -f4 -d\")
+PING_SECRET=$(grep \"ping_secret\" /root/node.json | cut -f4 -d\")
+
+if ! test -f /etc/cron.d/node_ping ; then
+    echo "*/5 * * * * root /usr/bin/curl -k -d ping_secret=$PING_SECRET https://api/arvados/v1/nodes/$UUID/ping" > /etc/cron.d/node_ping
+fi
+
+/usr/bin/curl -k -d ping_secret=$PING_SECRET https://api/arvados/v1/nodes/$UUID/ping
+
+
+
+ +And remove your token from the environment: + + +

+~$ unset ARVADOS_API_TOKEN
+~$ unset ARVADOS_API_HOST
+
+
+
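The registration script above reads @uuid@ and @ping_secret@ out of @/root/node.json@ with @grep@ and @cut@. If either key is missing (for example, because @arv node create@ failed and wrote an error instead), the variables end up empty and the ping URL is malformed. The sketch below wraps the same pipeline in a function that fails loudly in that case. The name @json_field@ is hypothetical, and it assumes the same layout the original pipeline relies on: one @"key": "value"@ pair per line.

```shell
# Hypothetical helper: json_field FILE KEY
# Prints the string value of a "KEY": "value" pair, assuming one pair per line
# (the layout the grep/cut pipeline in the script relies on).
# Returns non-zero if the key is missing or its value is empty.
json_field() {
    local file=$1 key=$2 value
    value=$(grep "\"$key\"" "$file" | head -n 1 | cut -f4 -d\")
    if [ -z "$value" ]; then
        echo "json_field: no \"$key\" found in $file" >&2
        return 1
    fi
    printf '%s\n' "$value"
}

# Usage inside the registration script:
#   UUID=$(json_field /root/node.json uuid) || exit 1
#   PING_SECRET=$(json_field /root/node.json ping_secret) || exit 1
```

This is not a general JSON parser; it only guards against the empty-value case without adding any dependency beyond the tools the script already uses.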
+ diff --git a/doc/install/install-crunch-dispatch.html.textile.liquid b/doc/install/install-crunch-dispatch.html.textile.liquid index 231d1f45e8..4a695ca199 100644 --- a/doc/install/install-crunch-dispatch.html.textile.liquid +++ b/doc/install/install-crunch-dispatch.html.textile.liquid @@ -21,19 +21,77 @@ Install the Python SDK and CLI tools on controller and all compute nodes. * See "Python SDK":{{site.baseurl}}/sdk/python/sdk-python.html page for details. -h4. Likely crunch job dependencies +h4. Slurm -On compute nodes: +On the API server, install slurm and munge, and generate a munge key: -* @pip install --upgrade pyvcf@ + +
~$ sudo /usr/bin/apt-get install slurm-llnl munge
+~$ sudo /usr/sbin/create-munge-key
+
+
-h4. Crunch user account +Now we need to give slurm a configuration file in @/etc/slurm-llnl/slurm.conf@. Here's an example: + + +
+ControlMachine=uuid_prefix.your.domain
+SlurmctldPort=6817
+SlurmdPort=6818
+AuthType=auth/munge
+StateSaveLocation=/tmp
+SlurmdSpoolDir=/tmp/slurmd
+SwitchType=switch/none
+MpiDefault=none
+SlurmctldPidFile=/var/run/slurmctld.pid
+SlurmdPidFile=/var/run/slurmd.pid
+ProctrackType=proctrack/pgid
+CacheGroups=0
+ReturnToService=2
+TaskPlugin=task/affinity
+#
+# TIMERS
+SlurmctldTimeout=300
+SlurmdTimeout=300
+InactiveLimit=0
+MinJobAge=300
+KillWait=30
+Waittime=0
+#
+# SCHEDULING
+SchedulerType=sched/backfill
+SchedulerPort=7321
+SelectType=select/cons_res
+SelectTypeParameters=CR_CPU_Memory
+FastSchedule=1
+#
+# LOGGING
+SlurmctldDebug=3
+#SlurmctldLogFile=
+SlurmdDebug=3
+#SlurmdLogFile=
+JobCompType=jobcomp/none
+#JobCompLoc=
+JobAcctGatherType=jobacct_gather/none
+#
+# COMPUTE NODES
+NodeName=DEFAULT
+PartitionName=DEFAULT MaxTime=INFINITE State=UP
+PartitionName=compute Default=YES Shared=yes
+
+NodeName=compute[0-255]
+
+PartitionName=compute Nodes=compute[0-255]
+
+
+ +Please make sure to update the value of the @ControlMachine@ parameter to the hostname of your dispatcher (api server). -On compute nodes and controller: +h4. Crunch user account * @adduser crunch@ -The crunch user should have the same UID, GID, and home directory on all compute nodes and on the controller. +The crunch user should have the same UID, GID, and home directory on all compute nodes and on the dispatcher (api server). h4. Repositories