*.pyc
docker/*/generated/*
docker/config.yml
-doc/_site/*
doc/.site/*
-doc/sdk/python/arvados
\ No newline at end of file
+doc/sdk/python/arvados
+sdk/perl/MYMETA.*
+sdk/perl/Makefile
+sdk/perl/blib/*
+sdk/perl/pm_to_blib
def activity
@breadcrumb_page_name = nil
- @users = User.all
+ @users = User.limit(params[:limit] || 1000).all
@user_activity = {}
@activity = {
logins: {},
@@client_mtx = Mutex.new
@@api_client = nil
- @@profiling_enabled = Rails.configuration.profiling_enabled rescue false
+ @@profiling_enabled = Rails.configuration.profiling_enabled
def api(resources_kind, action, data=nil)
profile_checkpoint
+++ /dev/null
-$arvados_api_client = ArvadosApiClient.new
+# This file must be loaded _after_ secret_token.rb if secret_token is
+# defined there instead of in config/application.yml.
+
$application_config = {}
%w(application.default application).each do |cfgfile|
--- /dev/null
+# The client object must be instantiated _after_ zza_load_config.rb
+# runs, because it relies on configuration settings.
+#
+if not $application_config
+ raise "Fatal: Config must be loaded before instantiating ArvadosApiClient."
+end
+
+$arvados_api_client = ArvadosApiClient.new
Additional information is available on the "'Documentation' page on the Arvados wiki":https://arvados.org/projects/arvados/wiki/Documentation.
-h2. 0. Install dependencies
+h2. Install dependencies
<pre>
arvados/doc$ bundle install
</pre>
-h2. 1. Generate HTML pages
+h2. Generate HTML pages
<pre>
arvados/doc$ rake
arvados/doc$ rake generate baseurl=$PWD/.site
</pre>
-h2. 2. Preview HTML pages
+h2. Run linkchecker
+
+If you have "Linkchecker":http://wummel.github.io/linkchecker/ installed on
+your system, you can run it against the documentation:
+
+<pre>
+arvados/doc$ rake linkchecker baseurl=file://$PWD/.site
+</pre>
+
+Please note that this will regenerate your $PWD/.site directory.
+
+h2. Preview HTML pages
<pre>
arvados/doc$ rake run
Preview the rendered pages at "http://localhost:8000":http://localhost:8000.
-h2. 3. Publish HTML pages inside Workbench
+h2. Publish HTML pages inside Workbench
(or some other web site)
arvados/doc$ ln -sn ../../../doc/.site ../apps/workbench/public/doc
</pre>
-h2. 4. Delete generated files
+h2. Delete generated files
<pre>
arvados/doc$ rake realclean
require "rubygems"
require "colorize"
-task :generate do
+task :generate => [ :realclean, 'sdk/python/arvados/index.html' ] do
vars = ['baseurl', 'arvados_api_host', 'arvados_workbench_host']
vars.each do |v|
if ENV[v]
end
end
-require "zenweb/tasks"
-load "zenweb-textile.rb"
-load "zenweb-liquid.rb"
-
file "sdk/python/arvados/index.html" do |t|
`which epydoc`
if $? == 0
- `epydoc --html -o sdk/python/arvados arvados`
- Dir["sdk/python/arvados/*"].each do |f|
- puts f
- $website.pages[f] = Zenweb::Page.new($website, f)
- end
+ `epydoc --html --parse-only -o sdk/python/arvados ../sdk/python/arvados/`
else
puts "Warning: epydoc not found, Python documentation will not be generated".colorize(:light_red)
end
end
+task :linkchecker => [ :generate ] do
+ Dir.chdir(".site") do
+ `which linkchecker`
+ if $? == 0
+ system "linkchecker index.html --ignore-url='!file://'"
+ else
+ puts "Warning: linkchecker not found, skipping run".colorize(:light_red)
+ end
+ end
+end
+
+task :clean do
+ rm_rf "sdk/python/arvados"
+end
+
+require "zenweb/tasks"
+load "zenweb-textile.rb"
+load "zenweb-liquid.rb"
+
task :extra_wirings do
$website.pages["sdk/python/python.html.textile.liquid"].depends_on("sdk/python/arvados/index.html")
end
A user (person) is permitted to act on an object if there is a path (series of permission Links) from the acting user to the object in which
-* Every intervening object is a Group, and
+* Every intervening object is a Group or a User, and
* Every intervening permission Link allows the current action
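
The path rule above can be sketched as a small graph search. This is only an illustration, not the API server's implementation: the @Link@ struct, the @permitted?@ helper, the string identifiers, and the ordering of permission levels (can_read, then can_write, then can_manage) are all assumptions made for the example, and ownership is modeled as a plain can_manage link for simplicity.

```ruby
require "set"

# Assumed ordering of permission levels, weakest to strongest.
LEVELS = { can_read: 1, can_write: 2, can_manage: 3 }.freeze

# Hypothetical in-memory stand-in for a permission Link (tail -> head).
Link = Struct.new(:tail, :permission, :head)

# Hypothetical data: two users linked to an admin group, which manages
# a lab member, who in turn owns a specimen.
LINKS = [
  Link.new("user:alison",        :can_manage, "group:ashton-admin"),
  Link.new("user:george",        :can_read,   "group:ashton-admin"),
  Link.new("group:ashton-admin", :can_manage, "user:member1"),
  Link.new("user:member1",       :can_manage, "specimen:1")
].freeze

# Breadth-first search from the acting user to the target object.
# A link is followed only if it allows the requested action, and only
# Groups and Users may appear as intervening objects on the path.
def permitted?(actor, action, target, links = LINKS)
  needed = LEVELS.fetch(action)
  seen = Set.new([actor])
  queue = [actor]
  until queue.empty?
    node = queue.shift
    links.each do |link|
      next unless link.tail == node
      next if LEVELS.fetch(link.permission) < needed
      return true if link.head == target
      next unless link.head.start_with?("group:", "user:")
      next if seen.include?(link.head)
      seen << link.head
      queue << link.head
    end
  end
  false
end
```

With this data, @permitted?("user:alison", :can_write, "specimen:1")@ succeeds because every link on Alison's path is can_manage, while the same request for @"user:george"@ fails because his first link only allows reading.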
Each object has exactly one _owner_, which can be either a User or a Group.
Three lab members are working together on a project. All Specimens, Links, Jobs, etc. can be modified by any of the three lab members. _Other_ lab members, who are not working on this project, can view but not modify these objects.
-h3. 4. Segregated roles
+h3. 4. Group-level administrator
+
+The Ashton Lab administrator, Alison, manages user accounts within her lab. She can enable and disable accounts, and exercise any permission that her lab members have.
+
+George has read-only access to the same set of accounts. This lets him see things like user activity and resource usage reports, without worrying about accidentally messing up anyone's data.
+
+table(table table-bordered table-condensed).
+|Tail |Permission |Head |Effect|
+|Group: Ashton Lab Admin|can_manage |User: Lab Member 1 |Lab member 1 is in this administrative group|
+|Group: Ashton Lab Admin|can_manage |User: Lab Member 2 |Lab member 2 is in this administrative group|
+|Group: Ashton Lab Admin|can_manage |User: Lab Member 3 |Lab member 3 is in this administrative group|
+|Group: Ashton Lab Admin|can_manage |User: Alison |Alison is in this administrative group|
+|Group: Ashton Lab Admin|can_manage |User: George |George is in this administrative group|
+|Alison |can_manage |Group: Ashton Lab Admin |Alison can do everything the above lab members can do|
+|George |can_read |Group: Ashton Lab Admin |George can read everything the above lab members can read|
+
+h3. 5. Segregated roles
Granwyth, at the Hulatberi Lab, sets up a Factory Robot which uses a hosted Arvados site to do work for the Hulatberi Lab.
"name":"GATK / exome PE fastq to snp",
"components":{
"extract-reference":{
+ "repository":"arvados",
+ "script_version":"e820bd1c6890f93ea1a84ffd5730bbf0e3d8e153",
"script":"file-select",
"script_parameters":{
"names":[
],
"input":"d237a90bae3870b3b033aea1e99de4a9+10820+K@qr1hi"
},
- "script_version":"e820bd1c6890f93ea1a84ffd5730bbf0e3d8e153",
"output_is_persistent":false
},
"bwa-index":{
+ "repository":"arvados",
"script_version":"e820bd1c6890f93ea1a84ffd5730bbf0e3d8e153",
"script":"bwa-index",
"script_parameters":{
"output_is_persistent":false
},
"bwa-aln":{
+ "repository":"arvados",
"script_version":"e820bd1c6890f93ea1a84ffd5730bbf0e3d8e153",
"script":"bwa-aln",
"script_parameters":{
"output_is_persistent":false
},
"picard-gatk2-prep":{
+ "repository":"arvados",
"script_version":"e820bd1c6890f93ea1a84ffd5730bbf0e3d8e153",
"script":"picard-gatk2-prep",
"script_parameters":{
"output_is_persistent":false
},
"GATK2-realign":{
+ "repository":"arvados",
"script_version":"e820bd1c6890f93ea1a84ffd5730bbf0e3d8e153",
"script":"GATK2-realign",
"script_parameters":{
"output_is_persistent":false
},
"GATK2-bqsr":{
+ "repository":"arvados",
"script_version":"e820bd1c6890f93ea1a84ffd5730bbf0e3d8e153",
"script":"GATK2-bqsr",
"script_parameters":{
"output_is_persistent":false
},
"GATK2-merge-call":{
+ "repository":"arvados",
"script_version":"e820bd1c6890f93ea1a84ffd5730bbf0e3d8e153",
"script":"GATK2-merge-call",
"script_parameters":{
{% include 'notebox_end' %}
notextile. <pre><code>$ <span class="userinput">sudo gem install arvados arvados-cli</span></code></pre>
+
+h3. Perl
+
+{% include 'notebox_begin' %}
+The Perl client library includes the @Arvados.pm@ module and submodules.
+{% include 'notebox_end' %}
+
+<notextile>
+<pre><code>$ <span class="userinput">cd arvados/sdk/perl</span>
+$ <span class="userinput">perl Makefile.PL</span>
+$ <span class="userinput">sudo make install</span>
+</code></pre>
+</notextile>
+
title: Accessing an Arvados VM over ssh
...
-Arvados requires a public @ssh@ key in order to securely log in to an Arvados VM instance, or to access an Arvados @git@ repository.
+Arvados requires a public ssh key in order to securely log in to an Arvados VM instance, or to access an Arvados @git@ repository.
This document is divided up into three sections.
notextile. <pre><code>$ <span class="userinput">ls ~/.ssh/id_rsa.pub</span></code></pre>
-If the file @id_rsa.pub@ exists, then you may use your existing key. Copy the contents of @~/.ssh/id_rsa.pub@ onto the clipboard (this is your public key). Proceed to "adding your key to the Arvados Workbench.":#workbench
+If the file @id_rsa.pub@ exists, then you may use your existing key. Copy the contents of @~/.ssh/id_rsa.pub@ onto the clipboard (this is your public key). You can skip key generation and proceed to "adding your key to the Arvados Workbench.":#workbench
If there is no file @~/.ssh/id_rsa.pub@, you must generate a new key. Use @ssh-keygen@ to do this:
</code></pre>
</notextile>
-Now you can set up @ssh-agent@ (next) or proceed to "adding your key to the Arvados Workbench.":#workbench
+Now you can set up @ssh-agent@ (next) or proceed with "adding your key to the Arvados Workbench.":#workbench
h3. Setting up ssh-agent (recommended)
notextile. <pre><code>$ <span class="userinput">eval $(ssh-agent -s)</span></code></pre>
-* @ssh-agent -s@ prints out values for environment variables SSH_AUTH_SOCK and SSH_AGENT_PID and then runs in the background. Using "eval" on the output as shown here causes those variables to be set in the current shell environment so that subsequent calls to @ssh@ can discover how to access the @ssh-agent@ daemon.
+@ssh-agent -s@ prints out values for environment variables SSH_AUTH_SOCK and SSH_AGENT_PID and then runs in the background. Using "eval" on the output as shown here causes those variables to be set in the current shell environment so that subsequent calls to @ssh@ can discover how to access the @ssh-agent@ daemon.
-After running @ssh-agent@, or if @ssh-add -l@ prints "The agent has no identities", then you will need to add your key using the following command. The passphrase to decrypt the key is the same used to protect the key when it was created with @ssh-keygen@:
+After running @ssh-agent@, or if @ssh-add -l@ prints "The agent has no identities", then you will need to add your key using the following command. The passphrase to decrypt the key is the same used to protect the key when it was created with @ssh-keygen@:
<notextile>
<pre><code>$ <span class="userinput">ssh-add</span>
(Note: if you are using the @ssh@ client that comes with "Cygwin":http://cygwin.com you should follow the "Unix":#unix instructions).
-"PuTTY":http://www.putty.org/ is a free (MIT-licensed) Win32 Telnet and SSH client. PuTTy includes all the tools a windows user needs to set up Private Keys and to set up and use SSH connections to your virtual machines in the Arvados Cloud.
+"PuTTY":http://www.chiark.greenend.org.uk/~sgtatham/putty/ is a free (MIT-licensed) Win32 Telnet and SSH client. PuTTY includes all the tools a Windows user needs to create private keys and make ssh connections to your virtual machines in the Arvados Cloud.
-You can use PuTTY to create public/private keys, which are how you’ll ensure that that access to Arvados cloud is secure. You can also use PuTTY as an SSH client to access your virtual machine in an Arvados cloud and work with the Arvados Command Line Interface (CLI) client.
-
-You may download putty from "http://www.putty.org/":http://www.putty.org/ .
-
-Note that you should download the installer or .zip file with all of the PuTTY tools (PuTTYtel is not required).
+You can "download PuTTY from its Web site":http://www.chiark.greenend.org.uk/~sgtatham/putty/. Note that you should download the installer or .zip file with all of the PuTTY tools (PuTTYtel is not required).
h3. Step 1 - Adding PuTTY to the PATH
# Open the Control Panel.
# Select _Advanced System Settings_, and choose _Environment Variables_.
# Under system variables, find and edit @PATH@.
-# Add the following to the end of PATH (make sure to include semi colon and quotation marks):
+# If you installed PuTTY in @C:\Program Files\PuTTY\@, add the following to the end of PATH (make sure to include the semicolon and quotation marks):
+<code>;\"C:\Program Files\PuTTY\"</code>
+If you installed PuTTY in @C:\Program Files (x86)\PuTTY\@, add the following to the end of PATH (make sure to include the semicolon and quotation marks):
<code>;\"C:\Program Files (x86)\PuTTY\"</code>
# Click through the OKs to close all the dialogs you’ve opened.
# At the bottom of the window, make sure the ‘Number of bits in a generated key’ field is set to 4096.
# Click Generate and follow the instructions to generate a key.
# Click to save the Public Key.
-# Click to save the Private Key (we recommend using a strong passphrase) .
+# Click to save the Private Key (we recommend using a strong passphrase).
# Select the text of the Public Key and copy it to the clipboard.
h3. Step 3 - Set up Pageant
-Note: Pageant is a PuTTY utility that manages your private keys so is not necessary to enter your private key passphrase every time you need to make a new ssh connection.
+Pageant is a PuTTY utility that manages your private keys so it is not necessary to enter your private key passphrase every time you make a new ssh connection.
# Start Pageant from the Start Menu or the folder where it was installed.
# Pageant will now be running in the system tray. Click the Pageant icon to configure.
# Choose _Add Key_ and add the private key which you created in the previous step.
-You are now ready to proceed to "adding your key to the Arvados Workbench":#workbench .
-
-_Note: We recommend you do not delete the “Default” Saved Session._
+You are now ready to proceed to "adding your key to the Arvados Workbench.":#workbench
h1(#workbench). Adding your key to Arvados Workbench
-h3. From the workbench dashboard
+h3. From the Workbench dashboard
-If you have no @ssh@ keys registered, there should be a notification asking you to provide your @ssh@ public key. On the Workbench dashboard (in this guide, this is "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/ ), look for the envelope icon <span class="glyphicon glyphicon-envelope"></span> <span class="badge badge-alert">1</span> in upper right corner (the number indicates there are new notifications). Click on this icon and a dropdown menu should appear with a message asking you to add your public key. Paste your public key into the text area provided and click on the check button to submit the key. You are now ready to "log into an Arvados VM":#login.
+If you have no ssh keys registered, there should be a notification asking you to provide your ssh public key. On the Workbench dashboard, look for the envelope icon <span class="glyphicon glyphicon-envelope"></span> <span class="badge badge-alert">1</span> in the upper right corner (the number indicates there are new notifications). Click on this icon and a dropdown menu should appear with a message asking you to add your public key. Paste your public key into the text area provided and click on the check button to submit the key. You are now ready to "log into an Arvados VM":#login.
h3. Alternate way to add ssh keys
-If you want to add additional @ssh@ keys, click on the user icon <span class="glyphicon glyphicon-user"></span> in the upper right corner to access the user settings menu, and click on the menu item _Manage ssh keys_ to go to the Authorized keys page.
+If you want to add additional ssh keys, click on the user icon <span class="glyphicon glyphicon-user"></span> in the upper right corner to access the user settings menu, and click on the menu item *Manage ssh keys* to go to the Authorized keys page.
-On _Authorized keys_ page, the click on the button <span class="btn btn-primary disabled">Add a new authorized key</span> in the upper right corner.
+On the *Authorized keys* page, click on the button <span class="btn btn-primary disabled">Add a new authorized key</span> in the upper right corner.
-The page will reload with a new row of information. Under the *public_key* column heading, click on the cell +none+ . This will open an editing popup as shown in this screenshot:
+The page will reload with a new row of information. Under the *public_key* column heading, click on the cell +none+. This will open an editing popup as shown in this screenshot:
!{{ site.baseurl }}/images/ssh-adding-public-key.png!
-Paste the public key from the previous section into the popup text box and click on the check mark to save it. This should refresh the page with the public key that you just added now listed under the *public_key* column. You are now ready to "log into an Arvados VM":#login.
+Paste the public key that you copied to the clipboard in the previous section into the popup text box, then click on the check mark to save it. This should refresh the page with the public key that you just added now listed under the *public_key* column. You are now ready to "log into an Arvados VM":#login.
h1(#login). Using ssh to log into an Arvados VM
-To see a list of virtual machines that you have access to and determine the name and login information, click on Compute %(rarr)→% Virtual machines. Once on the "virtual machines" page, The *hostname* columns lists the name of each available VM. The *logins* column will have a value in the form of @["you"]@. Ignore the square brackets and quotes to get your login name. In this guide the hostname will be _shell_ and the login will be _you_. Replace these with your hostname and login as appropriate.
+To see a list of virtual machines that you have access to and determine the name and login information, click on Compute %(rarr)→% Virtual machines. Once on the *Virtual machines* page, the *hostname* column lists the name of each available VM. The *logins* column will have a value in the form of @["you"]@. Your login name is the text inside the quotes. In this guide the hostname will be _shell_ and the login will be _you_. Replace these with your hostname and login name as appropriate.
This section consists of two sets of instructions, depending on whether you will be logging in using a "Unix":#unixvm (Linux, OS X, Cygwin) or "Windows":#windowsvm client.
h2(#unixvm). Logging in using command line ssh (Unix)
-h3. Connecting to the VM
+h3. Connecting to the virtual machine
-Use the following command to connect to the "shell" VM instance as "you". Replace *<code>you@shell</code>* at the end of the following command with your *login* and *hostname* from Workbench:
+Use the following command to connect to the _shell_ VM instance as _you_. Replace *<code>you@shell</code>* at the end of the following command with your *login* and *hostname* from Workbench:
-notextile. <pre><code>$ <span class="userinput">ssh -o "ProxyCommand ssh -a -x -p2222 turnout@switchyard.{{ site.arvados_api_host }} shell" -A -x <b>you@shell</b></span></code></pre>
+notextile. <pre><code>$ <span class="userinput">ssh -o "ProxyCommand ssh -a -x -p2222 turnout@switchyard.{{ site.arvados_api_host }} <b>shell</b>" -A -x <b>you@shell</b></span></code></pre>
-There are several things going on here:
+This command does several things at once. You usually cannot log in directly to virtual machines over the public Internet. Instead, you log into a "switchyard" server and then tell the switchyard which virtual machine you want to connect to.
-The VMs typically have addresses that are not globally routable, so you cannot log in directly. Instead, you log into a "switchyard" server and then tell the switchyard which VM you want to connect to.
-
-* @-o "ProxyCommand ..."@ option instructs ssh to run the specified command and then tunnel your ssh connection over the proxy.
-* @-a@ tells ssh not to forward your ssh-agent credentials to the switchyard
-* @-x@ tells ssh not to forward your X session to the switchyard
-* @-p2222@ specifies that the switchyard is running on non-standard port 2222
-* <code>turnout@switchyard.{{ site.arvados_api_host }}</code> specifies the user (@turnout@) and hostname (@switchyard.{{ site.arvados_api_host }}@) of the switchboard server that will proxy our connection to the VM.
-* @shell@ is the name of the VM that we want to connect to. This is sent to the switchyard server as if it were an ssh command, and the switchyard server connects to the VM on our behalf.
-* After the ProxyCommand section, the @-x@ must be repeated because it applies to the connection to VM instead of the switchyard.
+* @-o "ProxyCommand ..."@ configures ssh to run the specified command to create a proxy and route your connection through it.
+* @-a@ tells ssh not to forward your ssh-agent credentials to the switchyard.
+* @-x@ tells ssh not to forward your X session to the switchyard.
+* @-p2222@ specifies that the switchyard is running on non-standard port 2222.
+* <code>turnout@switchyard.{{ site.arvados_api_host }}</code> specifies the user (@turnout@) and hostname (@switchyard.{{ site.arvados_api_host }}@) of the switchyard server that will proxy our connection to the VM.
+* *@shell@* is the name of the VM that we want to connect to. This is sent to the switchyard server as if it were an ssh command, and the switchyard server connects to the VM on our behalf.
+* After the ProxyCommand section, we repeat @-x@ to disable X session forwarding to the virtual machine.
* @-A@ specifies that we want to forward access to @ssh-agent@ to the VM.
-* Finally, *<code>you@shell</code>* specifies your username and repeats the hostname of the VM. The username can be found in the *logins* column in the VMs Workbench page, discussed above.
+* Finally, *<code>you@shell</code>* specifies your login name and repeats the hostname of the VM. The login name can be found in the *logins* column on the Workbench *Virtual machines* page, discussed in the previous section.
You should now be able to log into the Arvados VM and "check your environment.":check-environment.html
h3. Configuration (recommended)
-Since the above command line is cumbersome, it can be greatly simplfied by adding the following section your @~/.ssh/config@ file:
+The command line above is cumbersome, but you can configure ssh to remember many of these settings. Add this text to the file @.ssh/config@ in your home directory (create a new file if @.ssh/config@ doesn't exist):
<notextile>
<pre><code class="userinput">Host *.arvados
# Open PuTTY from the Start Menu.
# On the Session screen set the Host Name (or IP address) to “shell”.
# On the Session screen set the Port to “22”.
-# On the Connection %(rarr)→% Data screen set the Auto-login username to the username listed in the *logins* column on the Arvados Workbench _Access %(rarr)→% VMs_ page.
+# On the Connection %(rarr)→% Data screen set the Auto-login username to the username listed in the *logins* column on the Arvados Workbench page _Compute %(rarr)→% Virtual machines_.
# On the Connection %(rarr)→% Proxy screen set the Proxy Type to “Local”.
# On the Connection %(rarr)→% Proxy screen in the “Telnet command, or local proxy command” box enter:
<code>plink -P 2222 turnout@switchyard.qr1hi.arvadosapi.com %host</code>
Make sure there is no newline at the end of the text entry.
-# Return to the Session screen. In the Saved Sessions box, enter a name for this configuration and hit Save.
+# Return to the Session screen. In the Saved Sessions box, enter a name for this configuration and click Save.
+
+_Note: We recommend you do not delete the “Default” Saved Session._
h3. Connecting to the VM
-# Open PuTTY
+# Open PuTTY from the Start Menu.
# Click on the Saved Session name you created in the previous section.
# Click Load to load those saved session settings.
-# Click Open and that will open the SSH window at the command prompt. You will now be logged in to your virtual machine.
+# Click Open to open the SSH window at the command prompt. You will now be logged into your virtual machine.
You should now be able to log into the Arvados VM and "check your environment.":check-environment.html
This user guide introduces how to use the major components of Arvados. These are:
* Keep: Content-addressable cluster file system designed for robust storage of very large files, such as whole genome sequences running in the hundreds of gigabytes
-* Crunch: Cluster compute engine designed for genomic analysis, e.g. alignment, variant calls
-* Metadata Database: Information about the genomic data stored in Keep, such as genomic traits, human subjects
-* Workbench: Web interface to Arvados components
+* Crunch: Cluster compute engine designed for genomic analysis, such as alignment and variant calls
+* Metadata Database: Information about the genomic data stored in Keep, such as genomic traits and human subjects
+* Workbench: Arvados' Web interface
h2. Prerequisites
To get the most value out of this guide, you should be comfortable with the following:
-# Using a secure shell client such as @ssh@ or @putty@ to log on to a remote server
-# Using the unix command line shell @bash@
+# Using a secure shell client such as @ssh@ or @putty@ to log on to a remote server
+# Using the Unix command line shell @bash@
# Viewing and editing files using a Unix text editor such as @vi@, @emacs@, or @nano@
# Programming in @python@
# Revision control using @git@
We also recommend you read the "Arvados Platform Overview":https://arvados.org/projects/arvados/wiki#Platform-Overview for an introduction and background information about Arvados.
-The examples in this guide uses the Arvados instance located at "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/ . If you are using a different Arvados instance replace @{{ site.arvados_workbench_host }}@ with your private instance in all of the examples in this guide.
+The examples in this guide use the Arvados instance located at "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/. If you are using a different Arvados instance, replace @{{ site.arvados_workbench_host }}@ with your private instance in all of the examples in this guide.
-The Arvados public beta instance is located at "https://workbench.qr1hi.arvadosapi.com/":https://workbench.qr1hi.arvadosapi.com/ . You must have an account in order to use this service. If you would like to request an account, please send an email to "arvados@curoverse.com":mailto:arvados@curoverse.com .
+The Arvados public beta instance is located at "https://workbench.qr1hi.arvadosapi.com/":https://workbench.qr1hi.arvadosapi.com/. You must have an account in order to use this service. If you would like to request an account, please send an email to "arvados@curoverse.com":mailto:arvados@curoverse.com.
h2. Typographic conventions
<notextile>
<ul>
-<li>Code blocks which are set aside from the text indicate user input to the system. Commands that should be entered into a Unix shell are indicated by the directory where you should enter the command ('~' indicates your home directory) followed by '$', followed by the highlighted <span class="userinput">command to enter</span> (do not enter the '$'), and possibly followed by example command output in black. For example, the following block indicates that you should type "ls foo.*" while in your home directory and the expected output will be "foo.input" and "foo.output".
-<pre><code>~$ <span class="userinput">ls foo</span>
-foo
+<li>Code blocks which are set aside from the text indicate user input to the system. Commands that should be entered into a Unix shell are indicated by the directory where you should enter the command ('~' indicates your home directory) followed by '$', followed by the highlighted <span class="userinput">command to enter</span> (do not enter the '$'), and possibly followed by example command output in black. For example, the following block indicates that you should type <code>ls foo.*</code> while in your home directory and the expected output will be "foo.input" and "foo.output".
+<pre><code>~$ <span class="userinput">ls foo.*</span>
+foo.input foo.output
</code></pre>
</li>
<li>Code blocks inline with text emphasize specific <code>programs</code>, <code>files</code>, or <code>options</code> that are being discussed.</li>
-<li>Bold text emphasizes <b>specific items</b> to look when discussing Arvados Workbench pages.</li>
-<li>A sequence of steps separated by right arrows (<span class="rarr">→</span>) indicate a path the user should follow through the Arvados Workbench to access some piece of information under discussion. The steps indicate a menu, hyperlink, column name, field name, or other label on the page that guide the user where to look or click.
+<li>Bold text emphasizes <b>specific items</b> to review on Arvados Workbench pages.</li>
+<li>A sequence of steps separated by right arrows (<span class="rarr">→</span>) indicates a path the user should follow through the Arvados Workbench. The steps indicate a menu, hyperlink, column name, field name, or other label on the page that guides the user where to look or click.
</li>
</ul>
</notextile>
The Arvados API token is a secret key that enables the @arv@ command line client to access Arvados with the proper permissions.
-Access the Arvados workbench using this link: "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/
+Access the Arvados Workbench using this link: "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/ (Replace @{{ site.arvados_api_host }}@ with the hostname of your local Arvados instance if necessary.)
-(Replace @{{ site.arvados_api_host }}@ with the hostname of your local Arvados instance if necessary.)
+Open a shell on the system where you want to use the Arvados client. This may be your local workstation, or "an Arvados virtual machine accessed with ssh":{{site.baseurl}}/user/getting_started/ssh-access.html.
-First, open a shell on the system on which you intend to use the Arvados client (this may be your local workstation, or an Arvados VM, refer to "Accessing Arvados over ssh":{{site.baseurl}}/user/getting_started/ssh-access.html ) .
-
-Click on the user icon <span class="glyphicon glyphicon-user"></span> in the upper right corner to access the user settings menu, and click on the menu item _Manage API token_ to go to the "api client authorizations" page.
+Click on the user icon <span class="glyphicon glyphicon-user"></span> in the upper right corner to access the user settings menu. Click on the menu item *Manage API tokens* to go to the "Api client authorizations" page.
h2. The easy way
-For your convenience, the "api client authorizations" page on Workbench provides a "Help" tab that provides a command you may copy and paste directly into the shell. It will look something like this:
+For your convenience, the "Api client authorizations" page on Workbench provides a *Help* tab that includes a command you may copy and paste directly into the shell. It will look something like this:
bc. ### Pasting the following lines at a shell prompt will allow Arvados SDKs
-### to authenticate to your account, youraddress@example.com
+### to authenticate to your account, you@example.com
read ARVADOS_API_TOKEN <<EOF
2jv9346o396exampledonotuseexampledonotuseexes7j1ld
EOF
export ARVADOS_API_TOKEN ARVADOS_API_HOST={{ site.arvados_api_host }}
-* The @read@ command takes the contents of stdin and puts it into the shell variable named on the command line.
-* The @<<EOF@ notation means read each line on stdin and pipe it to the command, terminating on reading the line @EOF@.
-* The @export@ command puts a local shell variable into the environment that will be inherited by child processes (e.g. the @arv@ client).
+* The @read@ command reads text input until @EOF@ (designated by @<<EOF@) and stores it in the @ARVADOS_API_TOKEN@ shell variable.
+* The @export@ command puts a local shell variable into the environment that will be inherited by child processes such as the @arv@ client.
h2. Setting the environment manually
</code></pre>
</notextile>
-* @ARVADOS_API_HOST@ tells @arv@ which host to connect to
-* @ARVADOS_API_TOKEN@ is the secret key used by the Arvados API server to authenticate access.
+* @ARVADOS_API_HOST@ tells @arv@ which host to connect to.
+* @ARVADOS_API_TOKEN@ is the secret key used by the Arvados API server to authenticate access. Its value is the text you copied from the *api_token* column on the Workbench.
If you are connecting to a development instance with an unverified/self-signed SSL certificate, set this variable to skip SSL validation:
h2. settings.conf
-Arvados tools will also look for the authentication information in @~/.config/arvados/settings.conf@. If you have already put the variables into the environment with instructions above, you can use these commands to create an Arvados configuration file:
+Arvados tools will also look for the authentication information in @~/.config/arvados/settings.conf@. If you have already put the variables into the environment following the instructions above, you can use these commands to create an Arvados configuration file:
<notextile>
<pre><code>$ <span class="userinput">echo "ARVADOS_API_HOST=$ARVADOS_API_HOST" > ~/.config/arvados/settings.conf</span>
h2. .bashrc
-Alternately, you may add the declarations of @ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ to the @~/.bashrc@ file on the system on which you intend to use the Arvados client. If you have already put the variables into the environment with instructions above, you can use these commands to append the environment variables to your @~/.bashrc@:
+Alternately, you may add the declarations of @ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ to the @~/.bashrc@ file on the system on which you intend to use the Arvados client. If you have already put the variables into the environment following the instructions above, you can use these commands to append the environment variables to your @~/.bashrc@:
<notextile>
<pre><code>$ <span class="userinput">echo "export ARVADOS_API_HOST=$ARVADOS_API_HOST" >> ~/.bashrc</span>
# If 'nondeterministic' or 'no_reuse' are true, always create a new job.
# Find a list of acceptable values for 'script_version'. If 'minimum_script_version' is specified, this is the set of all revisions in the git commit graph between 'minimum_script_version' and 'script_version' (inclusive) [2]. If 'minimum_script_version' is not specified, only 'script_version' is added to the list. If 'exclude_script_versions' is specified, the listed versions are excluded from the list.
-# Select jobs have the same 'script' and 'script_parameters' attributes, and where the 'script_version' attribute is in the list of acceptable versions. Exclude failed jobs or where 'nondeterministic' is true.
+# Select jobs that have the same 'script' and 'script_parameters' attributes, and whose 'script_version' attribute is in the list of acceptable versions. Exclude jobs that failed or set 'nondeterministic' to true.
# If there is more than one candidate job, check that all selected past jobs actually did produce the same output.
# If everything passed, re-use one of the selected past jobs (if there is more than one match, which job will be returned is undefined). Otherwise create a new job.
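The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the API server's implementation: jobs are plain dictionaries, the @failed@ and @output@ keys are hypothetical field names, and the git commit-graph lookup from step 2 is replaced by a precomputed list passed in as @revisions_in_range@.

```python
def acceptable_versions(request, revisions_in_range):
    """Step 2: build the list of acceptable 'script_version' values.

    revisions_in_range is a stand-in (assumption) for the set of revisions
    between 'minimum_script_version' and 'script_version', inclusive.
    """
    if request.get("minimum_script_version"):
        versions = list(revisions_in_range)
    else:
        versions = [request["script_version"]]
    excluded = set(request.get("exclude_script_versions", []))
    return [v for v in versions if v not in excluded]


def find_reusable_job(request, past_jobs, revisions_in_range=()):
    """Return a past job to re-use, or None if a new job should be created."""
    # Step 1: these flags always force a new job.
    if request.get("nondeterministic") or request.get("no_reuse"):
        return None

    versions = acceptable_versions(request, revisions_in_range)

    # Step 3: same script and parameters, acceptable version,
    # and neither failed nor nondeterministic.
    candidates = [
        job for job in past_jobs
        if job["script"] == request["script"]
        and job["script_parameters"] == request["script_parameters"]
        and job["script_version"] in versions
        and not job.get("failed")
        and not job.get("nondeterministic")
    ]

    # Step 4: all candidates must agree on their output.
    outputs = {job["output"] for job in candidates}
    if len(outputs) != 1:
        return None

    # Step 5: which matching job is returned is undefined; take the first.
    return candidates[0]
```

The sketch compares version strings directly, so it assumes that branch names and tags have already been resolved to the same form in both the request and the past jobs.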
h3. Examples
-Run the script "crunch_scripts/hash.py" in the repository "you" using the "master" branch head. Arvados is allowed to re-use a previous job if the script_version of the past job is the same as the "master" branch head (i.e. there have not been any subsequent commits to "master").
+Run the script "crunch_scripts/hash.py" in the repository "you" using the "master" branch head. Arvados is allowed to re-use a previous job if the script_version of the past job is the same as the "master" branch head (i.e., there have not been any subsequent commits to "master").
-<pre>
+<notextile><pre>
{
"script": "hash.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": {
"input": "c1bad4b39ca5a924e481008009d94e32+210"
}
}
-</pre>
+</pre></notextile>
-Run using exactly the version "d00220fb38d4b85ca8fc28a8151702a2b9d1dec5". Arvados is allowed to re-use a previous job if the script_version of that job is also "d00220fb38d4b85ca8fc28a8151702a2b9d1dec5".
+Run using exactly the version "d00220fb38d4b85ca8fc28a8151702a2b9d1dec5". Arvados is allowed to re-use a previous job if the "script_version" of that job is also "d00220fb38d4b85ca8fc28a8151702a2b9d1dec5".
-<pre>
+<notextile><pre>
{
"script": "hash.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "d00220fb38d4b85ca8fc28a8151702a2b9d1dec5",
"script_parameters": {
"input": "c1bad4b39ca5a924e481008009d94e32+210"
}
}
-</pre>
+</pre></notextile>
-Arvados is allowed to re-use a previous job if the script_version of the past job is between "earlier_version_tag" and the head of the "master" branch (inclusive), but not "blacklisted_version_tag". If there are no previous jobs, run the job using the head of the "master" branch as specified in "script_version".
+Arvados is allowed to re-use a previous job if the "script_version" of the past job is between "earlier_version_tag" and the head of the "master" branch (inclusive), but not "blacklisted_version_tag". If there are no previous jobs, run the job using the head of the "master" branch as specified in "script_version".
-<pre>
+<notextile><pre>
{
"script": "hash.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"minimum_script_version": "earlier_version_tag",
"script_version": "master",
- "exclude_script_versions", ["blacklisted_version_tag"],
+ "exclude_script_versions": ["blacklisted_version_tag"],
"script_parameters": {
"input": "c1bad4b39ca5a924e481008009d94e32+210"
}
}
-</pre>
+</pre></notextile>
Run the script "crunch_scripts/monte-carlo.py" in the repository "you" using the "master" branch head. Because it is marked as "nondeterministic", never re-use previous jobs, and never re-use this job.
-<pre>
+<notextile><pre>
{
"script": "monte-carlo.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"nondeterministic": true,
"script_parameters": {
"input": "c1bad4b39ca5a924e481008009d94e32+210"
}
}
-</pre>
+</pre></notextile>
h2. Pipelines
h3. Examples
-This a pipeline named "Filter md5 hash values" with two components, "do_hash" and "filter". The "input" script parameter of the "do_hash" component is required to be filled in by the user, and the expected data type is "Collection". This also specifies that the "input" script parameter of the "filter" component is the output of "do_hash", so "filter" will not run until "do_hash" completes successfully. When the pipeline runs, past jobs that meet the criteria described above may be substituted for either or both components to avoid redundant computation.
+This is a pipeline named "Filter md5 hash values" with two components, "do_hash" and "filter". The "input" script parameter of the "do_hash" component is required to be filled in by the user, and the expected data type is "Collection". This also specifies that the "input" script parameter of the "filter" component is the output of "do_hash", so "filter" will not run until "do_hash" completes successfully. When the pipeline runs, past jobs that meet the criteria described above may be substituted for either or both components to avoid redundant computation.
-<pre>
+<notextile><pre>
{
"name": "Filter md5 hash values",
"components": {
"do_hash": {
"script": "hash.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": {
"input": {
},
"filter": {
"script": "0-filter.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": {
"input": {
}
}
}
-</pre>
+</pre></notextile>
This pipeline consists of three components. The components "thing1" and "thing2" both depend on "cat_in_the_hat". Once the "cat_in_the_hat" job is complete, both "thing1" and "thing2" can run in parallel, because they do not depend on each other.
-<pre>
+<notextile><pre>
{
"name": "Wreck the house",
"components": {
"cat_in_the_hat": {
"script": "cat.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": { }
},
"thing1": {
"script": "thing1.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": {
"input": {
},
"thing2": {
"script": "thing2.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": {
"input": {
},
}
}
-</pre>
+</pre></notextile>
This pipeline consists of three components. The component "cleanup" depends on "thing1" and "thing2". Both "thing1" and "thing2" are started immediately and can run in parallel, because they do not depend on each other, but "cleanup" cannot begin until both "thing1" and "thing2" have completed.
-<pre>
+<notextile><pre>
{
"name": "Clean the house",
"components": {
"thing1": {
"script": "thing1.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": { }
},
"thing2": {
"script": "thing2.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": { }
},
"cleanup": {
"script": "cleanup.py",
- "repository": "you",
+ "repository": "<b>you</b>",
"script_version": "master",
"script_parameters": {
"mess1": {
}
}
}
-</pre>
+</pre></notextile>
</code></pre>
</notextile>
-The command @arv keep get@ fetches the contents of the locator @c1bad4b39ca5a924e481008009d94e32+210@. This is a locator for a collection data block, so it fetches the contents of the collection. In this example, this collection consists of a single file @var-GS000016015-ASM.tsv.bz2@ which is 227212247 bytes long, and is stored using four sequential data blocks, <code>204e43b8a1185621ca55a94839582e6f+67108864</code>, <code>b9677abbac956bd3e86b1deb28dfac03+67108864</code>, <code>fc15aff2a762b13f521baf042140acec+67108864</code>, <code>323d2a3ce20370c4ca1d3462a344f8fd+25885655</code>.
+The command @arv keep get@ fetches the contents of the collection @c1bad4b39ca5a924e481008009d94e32+210@. In this example, this collection includes a single file @var-GS000016015-ASM.tsv.bz2@ which is 227212247 bytes long, and is stored using four sequential data blocks, @204e43b8a1185621ca55a94839582e6f+67108864@, @b9677abbac956bd3e86b1deb28dfac03+67108864@, @fc15aff2a762b13f521baf042140acec+67108864@, and @323d2a3ce20370c4ca1d3462a344f8fd+25885655@.
-Let's use @arv keep get@ to download the first datablock:
+Let's use @arv keep get@ to download the first data block:
notextile. <pre><code>~$ <span class="userinput">cd /scratch/<b>you</b></span>
/scratch/<b>you</b>$ <span class="userinput">arv keep get 204e43b8a1185621ca55a94839582e6f+67108864 > block1</span></code></pre>
</notextile>
Notice that the block identifier <code>204e43b8a1185621ca55a94839582e6f+67108864</code> consists of:
-* the md5 hash @204e43b8a1185621ca55a94839582e6f@ which matches the md5 hash of @block1@
-* a size hint @67108864@ which matches the size of @block1@
+* the md5 hash of @block1@, @204e43b8a1185621ca55a94839582e6f@, plus
+* the size of @block1@, @67108864@.
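To check this relationship yourself, you can recompute the identifier from the downloaded bytes. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes that any additional @+@-separated fields after the size can be ignored for the comparison.

```python
import hashlib


def block_identifier(data):
    """Build '<md5 hex digest>+<size in bytes>' for a block of bytes."""
    return "{}+{}".format(hashlib.md5(data).hexdigest(), len(data))


def matches_identifier(data, identifier):
    """Compare data against the first two '+'-separated fields of identifier.

    Any further fields are ignored (assumption made for this sketch).
    """
    expected = "+".join(identifier.split("+")[:2])
    return block_identifier(data) == expected


# Small self-contained demonstration with in-memory data.
# For the tutorial file, read the bytes with open("block1", "rb") instead.
print(block_identifier(b"hello world"))
```

Applied to the contents of @block1@, @block_identifier@ should reproduce the identifier that was passed to @arv keep get@.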
"script_parameters":{
"input": "887cd41e9c613463eab2f0d885c6dd96+83"
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master"
},
"filter":{
"output_of":"do_hash"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master"
}
}
~$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span></code></pre>
</notextile>
+(Your shell should automatically fill in @$USER@ with your login name. The JSON that gets saved should have @"repository"@ pointed at your personal git repository.)
+
You can run this pipeline from the command line using @arv pipeline run@, filling in the UUID that you received from @arv pipeline_template create@:
<notextile>
-<pre><code>~$ <span class="userinput">arv pipeline run --template qr1hi-p5p6p-xxxxxxxxxxxxxxx</span>
+<pre><code>~$ <span class="userinput">arv pipeline run --run-here --template qr1hi-p5p6p-xxxxxxxxxxxxxxx</span>
2013-12-16 14:08:40 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 queued 2013-12-16T14:08:40Z
filter - -
2013-12-16 14:08:51 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
-do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 8e1b6acdd3f2f1da722538127c5c6202+56
+do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 1ed9ed18ef31ad21bcabcfeff7777bae+162
filter qr1hi-8i9sb-w5k40fztqgg9i2x queued 2013-12-16T14:08:50Z
2013-12-16 14:09:01 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
-do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 8e1b6acdd3f2f1da722538127c5c6202+56
-filter qr1hi-8i9sb-w5k40fztqgg9i2x 735ac35adf430126cf836547731f3af6+56
+do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 1ed9ed18ef31ad21bcabcfeff7777bae+162
+filter qr1hi-8i9sb-w5k40fztqgg9i2x d3bcc2ee0f0ea31049000c721c0f3a2a+56
</code></pre>
</notextile>
-This instantiates your pipeline and displays a live feed of its status. The new pipeline instance will also show up on the Workbench %(rarr)→% Compute %(rarr)→% Pipeline instances page.
+This instantiates your pipeline and displays a live feed of its status. The new pipeline instance will also show up on the Workbench *Activity* %(rarr)→% *Recent pipeline instances* page.
Arvados adds each pipeline component to the job queue as its dependencies are satisfied (or immediately if it has no dependencies) and finishes when all components are completed or failed and there is no more work left to do.
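The queueing rule in the previous paragraph can be illustrated with a small simulation. This is a simplified sketch, not the scheduler itself: it assumes components finish in synchronized rounds, and the function name @queue_rounds@ and its @failing@ parameter are hypothetical.

```python
def queue_rounds(dependencies, failing=frozenset()):
    """Simulate which components are queued together.

    dependencies maps each component name to the names it depends on.
    failing names components that are assumed to fail when run.
    Returns a list of rounds, each a sorted list of component names.
    """
    finished = set()   # components that completed successfully
    attempted = set()  # components that have been queued at some point
    rounds = []
    while True:
        ready = sorted(
            name for name, needs in dependencies.items()
            if name not in attempted and set(needs) <= finished
        )
        if not ready:
            # No more work left to do: everything is done, failed, or blocked.
            return rounds
        rounds.append(ready)
        attempted.update(ready)
        finished.update(name for name in ready if name not in failing)


# A two-component pipeline shaped like the one above.
print(queue_rounds({"do_hash": [], "filter": ["do_hash"]}))
```

If a component fails, anything that depends on it is never queued, and the simulation stops once no further component is ready.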
-The Keep locators of the output of each of @"do_hash"@ and @"filter"@ component are available from the output log shown above. The output is also available on the Workbench by navigating to %(rarr)→% Compute %(rarr)→% Pipeline instances %(rarr)→% pipeline uuid under the *id* column %(rarr)→% components.
+The Keep locators of the outputs of the @"do_hash"@ and @"filter"@ components are available from the output log shown above. The output is also available on the Workbench by navigating to *Activity* %(rarr)→% *Recent pipeline instances* %(rarr)→% pipeline UUID under the *Instance* column %(rarr)→% *output* column.
<notextile>
-<pre><code>~$ <span class="userinput">arv keep get 8e1b6acdd3f2f1da722538127c5c6202+56/md5sum.txt</span>
-0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
-504938460ef369cd275e4ef58994cffe bob.txt
-8f3b36aff310e06f3c5b9e95678ff77a carol.txt
-~$ <span class="userinput">arv keep get 735ac35adf430126cf836547731f3af6+56/0-filter.txt</span>
-0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
+<pre><code>~$ <span class="userinput">arv keep get 1ed9ed18ef31ad21bcabcfeff7777bae+162/md5sum.txt</span>
+0f1d6bcf55c34bed7f92a805d2d89bbf 887cd41e9c613463eab2f0d885c6dd96+83/./alice.txt
+504938460ef369cd275e4ef58994cffe 887cd41e9c613463eab2f0d885c6dd96+83/./bob.txt
+8f3b36aff310e06f3c5b9e95678ff77a 887cd41e9c613463eab2f0d885c6dd96+83/./carol.txt
+~$ <span class="userinput">arv keep get d3bcc2ee0f0ea31049000c721c0f3a2a+56/0-filter.txt</span>
+0f1d6bcf55c34bed7f92a805d2d89bbf 887cd41e9c613463eab2f0d885c6dd96+83/./alice.txt
</code></pre>
</notextile>
You can specify values for pipeline component script_parameters like this:
<notextile>
-<pre><code>~$ <span class="userinput">arv pipeline run --template qr1hi-p5p6p-xxxxxxxxxxxxxxx do_hash::input=c1bad4b39ca5a924e481008009d94e32+210</span>
+<pre><code>~$ <span class="userinput">arv pipeline run --run-here --template qr1hi-p5p6p-xxxxxxxxxxxxxxx do_hash::input=c1bad4b39ca5a924e481008009d94e32+210</span>
2013-12-17 20:31:24 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
do_hash qr1hi-8i9sb-rffhuay4jryl2n2 queued 2013-12-17T20:31:24Z
filter - -
filter - -
2013-12-17 20:31:55 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
-do_hash qr1hi-8i9sb-rffhuay4jryl2n2 880b55fb4470b148a447ff38cacdd952+54
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 50cafdb29cc21dd6eaec85ba9e0c6134+56
filter qr1hi-8i9sb-j347g1sqovdh0op queued 2013-12-17T20:31:55Z
2013-12-17 20:32:05 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
-do_hash qr1hi-8i9sb-rffhuay4jryl2n2 880b55fb4470b148a447ff38cacdd952+54
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 50cafdb29cc21dd6eaec85ba9e0c6134+56
filter qr1hi-8i9sb-j347g1sqovdh0op 490cd451c8108824b8a17e3723e1f236+19
</code></pre>
</notextile>
Now check the output:
<notextile>
-<pre><code>~$ <span class="userinput">arv keep get 880b55fb4470b148a447ff38cacdd952+54/md5sum.txt</span>
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
+<pre><code>~$ <span class="userinput">arv keep get 50cafdb29cc21dd6eaec85ba9e0c6134+56/md5sum.txt</span>
+44b8ae3fde7a8a88d2f7ebd237625b4f c1bad4b39ca5a924e481008009d94e32+210/./var-GS000016015-ASM.tsv.bz2
~$ <span class="userinput">arv keep get 490cd451c8108824b8a17e3723e1f236+19/0-filter.txt</span>
</code></pre>
</notextile>
-Since none of the files in the collection have hash code that start with 0, output of the filter component is empty.
+Since none of the files in the collection has a hash code that starts with 0, the output of the filter component is empty.
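The behavior seen in both runs can be reproduced with a few lines of Python. This is a sketch of the filtering idea only, assuming the filter keeps lines whose hash begins with @0@; it is not the source of @0-filter.py@, and the function name is hypothetical.

```python
def filter_hashes(md5sum_text, prefix="0"):
    """Return the lines of md5sum-style text whose hash starts with prefix."""
    return [line for line in md5sum_text.splitlines() if line.startswith(prefix)]


# Output of the first run: one of three hashes starts with 0.
first_run = """\
0f1d6bcf55c34bed7f92a805d2d89bbf 887cd41e9c613463eab2f0d885c6dd96+83/./alice.txt
504938460ef369cd275e4ef58994cffe 887cd41e9c613463eab2f0d885c6dd96+83/./bob.txt
8f3b36aff310e06f3c5b9e95678ff77a 887cd41e9c613463eab2f0d885c6dd96+83/./carol.txt
"""

# Output of the second run: the only hash starts with 4.
second_run = (
    "44b8ae3fde7a8a88d2f7ebd237625b4f "
    "c1bad4b39ca5a924e481008009d94e32+210/./var-GS000016015-ASM.tsv.bz2\n"
)

print(filter_hashes(first_run))
print(filter_hashes(second_run))
```

The first call keeps only the @alice.txt@ line, and the second returns an empty list, matching the empty @0-filter.txt@ shown above.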
h2. Create a new script
-Change to your git directory and create a new script in "crunch_scripts/".
+Change to your git directory and create a new script in @crunch_scripts/@.
<notextile>
<pre><code>~$ <span class="userinput">cd <b>you</b>/crunch_scripts</span>
h2. Using arv-crunch-job to run the job in your VM
-Instead of a git commit hash, we provide the path to the directory in the "script_version" parameter. The script specified in "script" will actually be searched for in the "crunch_scripts/" subdirectory of the directory specified "script_version". Although we are running the script locally, the script still requires access to the Arvados API server and Keep storage service. The job will be recorded in the Arvados job history, and visible in Workbench.
+Instead of a git commit hash, we provide the path to the directory in the "script_version" parameter. The script specified in "script" will actually be searched for in the @crunch_scripts/@ subdirectory of the directory specified by "script_version". Although we are running the script locally, the script still requires access to the Arvados API server and Keep storage service. The job will be recorded in the Arvados job history, and visible in Workbench.
<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
{
+ "repository":"",
"script":"hello-world.py",
- "script_version":"/home/<b>you</b>/<b>you</b>",
+ "script_version":"$HOME/$USER",
"script_parameters":{}
}
EOF</span>
-~/<b>you</b>/crunch_scripts</span>$ <span class="userinput">arv-crunch-job --job "$(cat ~/the_job)"</span>
+</code></pre>
+</notextile>
+
+Your shell should fill in values for @$HOME@ and @$USER@ so that the saved JSON points "script_version" at the directory with your checkout. Now you can run that job:
+
+<notextile>
+<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">arv-crunch-job --job "$(cat ~/the_job)"</span>
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 check slurm allocation
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 node localhost - 1 slots
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 start
2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 stderr hello world
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 child 29834 on localhost.1 exit 0 signal 0 success=
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 failure (#1, permanent) after 0 seconds
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 output
+2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 output
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 Every node has failed -- giving up on this round
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 wait for last 0 children to finish
2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 status: 0 done, 0 running, 0 todo
~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x hello-world-fixed.py</span>
~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
{
+ "repository":"",
"script":"hello-world-fixed.py",
- "script_version":"/home/<b>you</b>/<b>you</b>",
+ "script_version":"$HOME/$USER",
"script_parameters":{}
}
EOF</span>
2013-12-12_21:56:59 qr1hi-8i9sb-79260ykfew5trzl 31578 check slurm allocation
2013-12-12_21:56:59 qr1hi-8i9sb-79260ykfew5trzl 31578 node localhost - 1 slots
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 start
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script hello-world.py
+2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script hello-world-fixed.py
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script_version /home/<b>you</b>/<b>you</b>
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 script_parameters {}
2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 runtime_constraints {"max_tasks_per_node":0}
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 Freeze not implemented
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 collate
2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 output 576c44d762ba241b0a674aa43152b52a+53
+WARNING:root:API lookup failed for collection 576c44d762ba241b0a674aa43152b52a+53 (<class 'apiclient.errors.HttpError'>: <HttpError 404 when requesting https://qr1hi.arvadosapi.com/arvados/v1/collections/576c44d762ba241b0a674aa43152b52a%2B53?alt=json returned "Not Found">)
2013-12-12_21:57:03 qr1hi-8i9sb-79260ykfew5trzl 31578 finish
-2013-12-12_21:57:04 qr1hi-8i9sb-79260ykfew5trzl 31578 meta key is 9f937693334d0c9234ccc1f808ee7117+1761
</code></pre>
</notextile>
+(The WARNING issued near the end of the script may be safely ignored here; it is the Arvados SDK letting you know that it could not find a collection named @576c44d762ba241b0a674aa43152b52a+53@ and that it is going to try looking up a block by that name instead.)
+
The job succeeded, with output in Keep object @576c44d762ba241b0a674aa43152b52a+53@. Let's look at our output:
<notextile>
*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
-You will create a job to run the "hash" crunch script. The "hash" script computes the md5 hash of each file in a collection.
+You will create a job to run the "hash" Crunch script. The "hash" script computes the md5 hash of each file in a collection.
h2. Jobs
-Crunch pipelines consist of one or more jobs. A "job" is a single run of a specific version of a crunch script with a specific input. You an also run jobs individually.
+Crunch pipelines consist of one or more jobs. A "job" is a single run of a specific version of a Crunch script with a specific input. You can also run jobs individually.
-A request to run a crunch job are is described using a JSON object. For example:
+A request to run a Crunch job is described using a JSON object. For example:
<notextile>
-<pre><code>~$ <span class="userinput">cat >the_job <<EOF
+<pre><code>~$ <span class="userinput">cat >~/the_job <<EOF
{
"script": "hash",
"repository": "arvados",
"script_version": "master",
"script_parameters": {
"input": "c1bad4b39ca5a924e481008009d94e32+210"
- }
+ },
+ "no_reuse": "true"
}
EOF
</code></pre>
</notextile>
-* @cat@ is a standard Unix utility that simply copies standard input to standard output
-* @<<EOF@ tells the shell to direct the following lines into the standard input for @cat@ up until it sees the line @EOF@
-* @>the_job@ redirects standard output to a file called @the_job@
-* @"script"@ specifies the name of the script to run. The script is searched for in the "crunch_scripts/" subdirectory of the @git@ checkout specified by @"script_version"@.
-* @"repository"@ is the git repository to search for the script version. You can access a list of available @git@ repositories on the Arvados workbench under "Compute %(rarr)→% Code repositories":https://{{site.arvados_workbench_host}}//repositories .
-* @"script_version"@ specifies the version of the script that you wish to run. This can be in the form of an explicit @git@ revision hash, a tag, or a branch (in which case it will take the HEAD of the specified branch). Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.
-* @"script_parameters"@ are provided to the script. In this case, the input is the locator for the collection that we inspected in the previous section.
+* @cat@ is a standard Unix utility that writes a sequence of input to standard output.
+* @<<EOF@ tells the shell to direct the following lines into the standard input for @cat@ up until it sees the line @EOF@.
+* @>~/the_job@ redirects standard output to a file called @~/the_job@.
+* @"repository"@ is the name of a git repository to search for the script version. You can access a list of available git repositories on the Arvados Workbench under "*Compute* %(rarr)→% *Code repositories*":https://{{site.arvados_workbench_host}}/repositories.
+* @"script_version"@ specifies the version of the script that you wish to run. This can be in the form of an explicit git revision hash, a tag, or a branch (in which case it will use the most recent commit on the specified branch). Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.
+* @"script"@ specifies the name of the script to run. The script is searched for in the @crunch_scripts/@ subdirectory of the git repository.
+* @"script_parameters"@ are provided to the script. In this case, the input is the PGP data Collection that we "put in Keep earlier":{{site.baseurl}}/user/tutorials/tutorial-keep.html.
+* Setting the @"no_reuse"@ flag tells Crunch not to reuse work from past jobs. This helps ensure that you can watch a new Job process for the rest of this tutorial, without reusing output from a past run that you made or that somebody else marked as public. (If you want to experiment, after the first run below finishes, feel free to edit this job to remove the @"no_reuse"@ line and resubmit it. See what happens!)
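If you prefer to generate the job description rather than type it, the same JSON object can be built programmatically. The sketch below uses only the Python standard library; the function name @make_job_request@ is hypothetical, and the @"no_reuse"@ value is written as the string @"true"@ only to mirror the example above.

```python
import json


def make_job_request(script, repository, script_version, script_parameters,
                     no_reuse=False):
    """Return a JSON string describing a job, using the fields listed above."""
    job = {
        "script": script,
        "repository": repository,
        "script_version": script_version,
        "script_parameters": script_parameters,
    }
    if no_reuse:
        # Mirrors the tutorial's example, which uses the string "true".
        job["no_reuse"] = "true"
    return json.dumps(job, indent=2)


print(make_job_request(
    script="hash",
    repository="arvados",
    script_version="master",
    script_parameters={"input": "c1bad4b39ca5a924e481008009d94e32+210"},
    no_reuse=True,
))
```

The printed text can be saved to @~/the_job@ and used in place of the here-document shown earlier.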
Use @arv job create@ to actually submit the job. It should print out a JSON object which describes the newly created job:
<notextile>
-<pre><code>~$ <span class="userinput">arv job create --job "$(cat the_job)"</span>
+<pre><code>~$ <span class="userinput">arv job create --job "$(cat ~/the_job)"</span>
{
"href":"https://qr1hi.arvadosapi.com/arvados/v1/jobs/qr1hi-8i9sb-1pm1t02dezhupss",
"kind":"arvados#job",
The job is now queued and will start running as soon as it reaches the front of the queue. Fields to pay attention to include:
- * @"uuid"@ is the unique identifier for this specific job
+ * @"uuid"@ is the unique identifier for this specific job.
* @"script_version"@ is the actual revision of the script used. This is useful if the version was described using the "repository:branch" format.
h2. Monitor job progress
-Go to the "Workbench dashboard":https://{{site.arvados_workbench_host}}. Your job should be at the top of the "Recent jobs" table. This table refreshes automatically. When the job has completed successfully, it will show <span class="label label-success">finished</span> in the *Status* column.
+Go to the "Workbench dashboard":https://{{site.arvados_workbench_host}} and visit *Activity* %(rarr)→% *Recent jobs*. Your job should be near the top of the table. This table refreshes automatically. When the job has completed successfully, it will show <span class="label label-success">finished</span> in the *Status* column.
On the command line, you can access log messages while the job runs using @arv job log_tail_follow@:
h2. Inspect the job output
-On the "Workbench dashboard":https://{{site.arvados_workbench_host}}, look for the *Output* column of the *Recent jobs* table. Click on the link under *Output* for your job to go to the files page with the job output. The files page lists all the files that were output by the job. Click on the link under the *files* column to view a file, or click on the download icon <span class="glyphicon glyphicon-download-alt"></span> to download the output file.
+On the "Workbench dashboard":https://{{site.arvados_workbench_host}}, look for the *Output* column of the *Recent jobs* table. Click on the link under *Output* for your job to go to the files page with the job output. The files page lists all the files that were output by the job. Click on the link under the *file* column to view a file, or click on the download icon <span class="glyphicon glyphicon-download-alt"></span> to download the output file.
On the command line, you can use @arv job get@ to access a JSON object describing the output:
<notextile>
<pre><code>~$ <span class="userinput">arv keep ls dd755dbc8d49a67f4fe7dc843e4f10a6+54</span>
-md5sum.txt
+./md5sum.txt
</code></pre>
</notextile>
</code></pre>
</notextile>
-This md5 hash matches the md5 hash which we computed earlier.
+This md5 hash matches the md5 hash which we "computed earlier":{{site.baseurl}}/user/tutorials/tutorial-keep.html.
h2. The job log
-When the job completes, you can access the job log. On the workbench dashboard, this is the link under the *Log* column of the *Recent jobs* table.
+When the job completes, you can access the job log. On the Workbench, visit *Activity* %(rarr)→% *Recent jobs* %(rarr)→% your job's UUID under the *uuid* column %(rarr)→% the collection link on the *log* row.
-On the command line, the keep identifier listed in the @"log"@ field from @arv job get@ specifies a collection. You can list the files in the collection:
+On the command line, the Keep identifier listed in the @"log"@ field from @arv job get@ specifies a collection. You can list the files in the collection:
<notextile>
<pre><code>~$ <span class="userinput">arv keep ls xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx+91</span>
-qr1hi-8i9sb-xxxxxxxxxxxxxxx.log.txt
+./qr1hi-8i9sb-xxxxxxxxxxxxxxx.log.txt
</code></pre>
</notextile>
-The log collection consists of one log file named with the job id. You can access it using @arv keep get@:
+The log collection consists of one log file named with the job's UUID. You can access it using @arv keep get@:
<notextile>
<pre><code>~$ <span class="userinput">arv keep get xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx+91/qr1hi-8i9sb-xxxxxxxxxxxxxxx.log.txt</span>
-2013-12-16_20:44:35 qr1hi-8i9sb-1pm1t02dezhupss 7575 check slurm allocation
-2013-12-16_20:44:35 qr1hi-8i9sb-1pm1t02dezhupss 7575 node compute13 - 8 slots
-2013-12-16_20:44:36 qr1hi-8i9sb-1pm1t02dezhupss 7575 start
-2013-12-16_20:44:36 qr1hi-8i9sb-1pm1t02dezhupss 7575 Install revision d9cd657b733d578ac0d2167dd75967aa4f22e0ac
-2013-12-16_20:44:37 qr1hi-8i9sb-1pm1t02dezhupss 7575 Clean-work-dir exited 0
-2013-12-16_20:44:37 qr1hi-8i9sb-1pm1t02dezhupss 7575 Install exited 0
-2013-12-16_20:44:37 qr1hi-8i9sb-1pm1t02dezhupss 7575 script hash
-2013-12-16_20:44:37 qr1hi-8i9sb-1pm1t02dezhupss 7575 script_version d9cd657b733d578ac0d2167dd75967aa4f22e0ac
-2013-12-16_20:44:37 qr1hi-8i9sb-1pm1t02dezhupss 7575 script_parameters {"input":"c1bad4b39ca5a924e481008009d94e32+210"}
-2013-12-16_20:44:37 qr1hi-8i9sb-1pm1t02dezhupss 7575 runtime_constraints {"max_tasks_per_node":0}
-2013-12-16_20:44:37 qr1hi-8i9sb-1pm1t02dezhupss 7575 start level 0
-2013-12-16_20:44:37 qr1hi-8i9sb-1pm1t02dezhupss 7575 status: 0 done, 0 running, 1 todo
-2013-12-16_20:44:38 qr1hi-8i9sb-1pm1t02dezhupss 7575 0 job_task qr1hi-ot0gb-23c1k3kwrf8da62
-2013-12-16_20:44:38 qr1hi-8i9sb-1pm1t02dezhupss 7575 0 child 7681 started on compute13.1
-
-2013-12-16_20:44:38 qr1hi-8i9sb-1pm1t02dezhupss 7575 status: 0 done, 1 running, 0 todo
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 0 child 7681 on compute13.1 exit 0 signal 0 success=true
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 0 success in 1 seconds
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 0 output
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 wait for last 0 children to finish
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 status: 1 done, 0 running, 1 todo
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 start level 1
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 status: 1 done, 0 running, 1 todo
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 1 job_task qr1hi-ot0gb-iwr0o3unqothg28
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 1 child 7716 started on compute13.1
-2013-12-16_20:44:39 qr1hi-8i9sb-1pm1t02dezhupss 7575 status: 1 done, 1 running, 0 todo
-2013-12-16_20:44:52 qr1hi-8i9sb-1pm1t02dezhupss 7575 1 child 7716 on compute13.1 exit 0 signal 0 success=true
-2013-12-16_20:44:52 qr1hi-8i9sb-1pm1t02dezhupss 7575 1 success in 13 seconds
-2013-12-16_20:44:52 qr1hi-8i9sb-1pm1t02dezhupss 7575 1 output dd755dbc8d49a67f4fe7dc843e4f10a6+54
-2013-12-16_20:44:52 qr1hi-8i9sb-1pm1t02dezhupss 7575 wait for last 0 children to finish
-2013-12-16_20:44:52 qr1hi-8i9sb-1pm1t02dezhupss 7575 status: 2 done, 0 running, 0 todo
-2013-12-16_20:44:52 qr1hi-8i9sb-1pm1t02dezhupss 7575 release job allocation
-2013-12-16_20:44:52 qr1hi-8i9sb-1pm1t02dezhupss 7575 Freeze not implemented
-2013-12-16_20:44:52 qr1hi-8i9sb-1pm1t02dezhupss 7575 collate
-2013-12-16_20:44:53 qr1hi-8i9sb-1pm1t02dezhupss 7575 output dd755dbc8d49a67f4fe7dc843e4f10a6+54+K@qr1hi
-2013-12-16_20:44:53 qr1hi-8i9sb-1pm1t02dezhupss 7575 finish
+2013-12-16_20:44:35 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 check slurm allocation
+2013-12-16_20:44:35 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 node compute13 - 8 slots
+2013-12-16_20:44:36 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 start
+2013-12-16_20:44:36 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 Install revision d9cd657b733d578ac0d2167dd75967aa4f22e0ac
+2013-12-16_20:44:37 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 Clean-work-dir exited 0
+2013-12-16_20:44:37 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 Install exited 0
+2013-12-16_20:44:37 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 script hash
+2013-12-16_20:44:37 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 script_version d9cd657b733d578ac0d2167dd75967aa4f22e0ac
+2013-12-16_20:44:37 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 script_parameters {"input":"c1bad4b39ca5a924e481008009d94e32+210"}
+2013-12-16_20:44:37 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 runtime_constraints {"max_tasks_per_node":0}
+2013-12-16_20:44:37 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 start level 0
+2013-12-16_20:44:37 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 status: 0 done, 0 running, 1 todo
+2013-12-16_20:44:38 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 0 job_task qr1hi-ot0gb-23c1k3kwrf8da62
+2013-12-16_20:44:38 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 0 child 7681 started on compute13.1
+2013-12-16_20:44:38 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 status: 0 done, 1 running, 0 todo
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 0 child 7681 on compute13.1 exit 0 signal 0 success=true
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 0 success in 1 seconds
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 0 output
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 wait for last 0 children to finish
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 status: 1 done, 0 running, 1 todo
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 start level 1
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 status: 1 done, 0 running, 1 todo
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 1 job_task qr1hi-ot0gb-iwr0o3unqothg28
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 1 child 7716 started on compute13.1
+2013-12-16_20:44:39 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 status: 1 done, 1 running, 0 todo
+2013-12-16_20:44:52 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 1 child 7716 on compute13.1 exit 0 signal 0 success=true
+2013-12-16_20:44:52 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 1 success in 13 seconds
+2013-12-16_20:44:52 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 1 output dd755dbc8d49a67f4fe7dc843e4f10a6+54
+2013-12-16_20:44:52 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 wait for last 0 children to finish
+2013-12-16_20:44:52 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 status: 2 done, 0 running, 0 todo
+2013-12-16_20:44:52 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 release job allocation
+2013-12-16_20:44:52 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 Freeze not implemented
+2013-12-16_20:44:52 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 collate
+2013-12-16_20:44:53 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 output dd755dbc8d49a67f4fe7dc843e4f10a6+54+K@qr1hi
+2013-12-16_20:44:53 qr1hi-8i9sb-xxxxxxxxxxxxxxx 7575 finish
</code></pre>
</notextile>
notextile. <pre>~/<b>you</b>/crunch_scripts$ <code class="userinput">nano parallel-hash.py</code></pre>
-Add the following code to compute the md5 hash of each file in a
+Add the following code to compute the md5 hash of each file in a collection:
<notextile> {% code 'parallel_hash_script_py' as python %} </notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
{
"script": "parallel-hash.py",
- "repository": "<b>you</b>",
+ "repository": "$USER",
"script_version": "master",
"script_parameters":
{
</code></pre>
</notextile>
+(Your shell should automatically fill in @$USER@ with your login name. The job JSON that gets saved should have @"repository"@ pointed at your personal git repository.)
+
Because the job ran in parallel, each instance of parallel-hash creates a separate @md5sum.txt@ as output. Arvados automatically collates these files into a single collection, which is the output of the job:
<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">arv keep ls e2ccd204bca37c77c0ba59fc470cd0f7+162</span>
-md5sum.txt
-md5sum.txt
-md5sum.txt
+./md5sum.txt
~/<b>you</b>/crunch_scripts$ <span class="userinput">arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162/md5sum.txt</span>
0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
504938460ef369cd275e4ef58994cffe bob.txt
</code></pre>
</notextile>
-Next, using @nano@ or your favorite Unix text editor, create a new file called @run-md5sum.py@ in the @crunch_scripts@ directory.
+Next, using @nano@ or your favorite Unix text editor, create a new file called @run-md5sum.py@ in the @crunch_scripts@ directory.
notextile. <pre>~/<b>you</b>/crunch_scripts$ <code class="userinput">nano run-md5sum.py</code></pre>
notextile. <pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x run-md5sum.py</span></code></pre>
-Next, add the file to @git@ staging, commit and push:
+Next, use @git@ to stage the file, commit, and push:
<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">git add run-md5sum.py</span>
</code></pre>
</notextile>
-You should now be able to run your new script using Crunch, with "script" referring to our new "run-md5sum.py" script.
+You should now be able to run your new script using Crunch, with @"script"@ referring to our new @run-md5sum.py@ script.
<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_pipeline <<EOF
"dataclass": "Collection"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master"
}
}
</code></pre>
</notextile>
-Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using workbench":tutorial-pipeline-workbench.html
+(Your shell should automatically fill in @$USER@ with your login name. The JSON that gets saved should have @"repository"@ pointed at your personal git repository.)
+
+Your new pipeline template will appear on the Workbench "Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using Workbench":tutorial-pipeline-workbench.html.
h2. Setting up Git
-As discussed in the previous tutorial, all Crunch scripts are managed through the @git@ revision control system.
-
-First, you should do some basic configuration for git (you only need to do this the first time):
+All Crunch scripts are managed through the @git@ revision control system. Before you start using git, you should do some basic configuration (you only need to do this the first time):
<notextile>
<pre><code>~$ <span class="userinput">git config --global user.name "Your Name"</span>
~$ <span class="userinput">git config --global user.email <b>you</b>@example.com</span></code></pre>
</notextile>
-On the Arvados Workbench, navigate to "Compute %(rarr)→% Code repositories":https://{{site.arvados_workbench_host}}/repositories . You should see a repository with your user name listed in the *name* column. Next to *name* is the column *push_url*. Copy the *push_url* value associated with your repository. This should look like <notextile><code>git@git.{{ site.arvados_api_host }}:<b>you</b>.git</code></notextile>.
+On the Arvados Workbench, navigate to "Compute %(rarr)→% Code repositories":https://{{site.arvados_workbench_host}}/repositories. You should see a repository with your user name listed in the *name* column. Next to *name* is the column *push_url*. Copy the *push_url* value associated with your repository. This should look like <notextile><code>git@git.{{ site.arvados_api_host }}:<b>you</b>.git</code></notextile>.
Next, on the Arvados virtual machine, clone your git repository:
notextile. <pre><code>$ <span class="userinput">man gittutorial</span></code></pre>
-or <b>"click here to search Google for git tutorials":http://google.com/#q=git+tutorial</b>
+or *"search Google for git tutorials":http://google.com/#q=git+tutorial*.
{% include 'notebox_end' %}
h2. Creating a Crunch script
notextile. <pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x hash.py</span></code></pre>
{% include 'notebox_begin' %}
-The steps below describe how to execute the script after committing changes to git. To run a script locally for testing, please see "debugging a crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html .
+The steps below describe how to execute the script after committing changes to git. To run a script locally for testing, please see "debugging a crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html.
{% include 'notebox_end' %}
-Next, add the file to @git@ staging. This tells @git@ that the file should be included on the next commit.
+Next, add the file to git staging. This tells @git@ that the file should be included on the next commit.
notextile. <pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">git add hash.py</span></code></pre>
-Next, commit your changes to git. All staged changes are recorded into the local @git@ repository:
+Next, commit your changes to git. All staged changes are recorded into the local git repository:
<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">git commit -m"my first script"</span>
"dataclass": "Collection"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master",
"output_is_persistent":true
}
</span></code></pre>
</notextile>
-* @cat@ is a standard Unix utility that simply copies standard input to standard output
-* @<<EOF@ tells the shell to direct the following lines into the standard input for @cat@ up until it sees the line @EOF@
-* @>the_pipeline@ redirects standard output to a file called @the_pipeline@
-* @"name"@ is a human-readable name for the pipeline
-* @"components"@ is a set of scripts that make up the pipeline
-* The component is listed with a human-readable name (@"do_hash"@ in this example)
-* @"script"@ specifies the name of the script to run. The script is searched for in the "crunch_scripts/" subdirectory of the @git@ checkout specified by @"script_version"@.
-* @"repository"@ is the git repository to search for the script version. You can access a list of available @git@ repositories on the Arvados workbench under "Compute %(rarr)→% Code repositories":https://{{site.arvados_workbench_host}}//repositories .
-* @"script_version"@ specifies the version of the script that you wish to run. This can be in the form of an explicit @git@ revision hash, a tag, or a branch (in which case it will take the HEAD of the specified branch). Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.
+* @cat@ is a standard Unix utility that writes a sequence of input to standard output.
+* @<<EOF@ tells the shell to direct the following lines into the standard input for @cat@ up until it sees the line @EOF@.
+* @>the_pipeline@ redirects standard output to a file called @the_pipeline@.
+* @"name"@ is a human-readable name for the pipeline.
+* @"components"@ is a set of scripts that make up the pipeline.
+* The component is listed with a human-readable name (@"do_hash"@ in this example).
+* @"repository"@ is the name of a git repository to search for the script version. You can access a list of available git repositories on the Arvados Workbench under "Compute %(rarr)→% Code repositories":https://{{site.arvados_workbench_host}}/repositories. Your shell should automatically fill in @$USER@ with your login name, so that the final JSON has @"repository"@ pointed at your personal git repository.
+* @"script_version"@ specifies the version of the script that you wish to run. This can be in the form of an explicit git revision hash, a tag, or a branch (in which case it will use the HEAD of the specified branch). Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.
+* @"script"@ specifies the filename of the script to run. Crunch expects to find this in the @crunch_scripts/@ subdirectory of the git repository.
* @"script_parameters"@ describes the parameters for the script. In this example, there is one parameter called @input@ which is @required@ and is a @Collection@.
* @"output_is_persistent"@ indicates whether the output of the job is considered valuable. If this value is false (or not given), the output will be treated as intermediate data and eventually deleted to reclaim disk space.
-Now, use @arv pipeline_template create@ tell Arvados about your pipeline template:
+Now, use @arv pipeline_template create@ to register your pipeline template in Arvados:
<notextile>
<pre><code>~$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span>
</code></pre>
</notextile>
-Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using workbench":tutorial-pipeline-workbench.html
+Your new pipeline template will appear on the Workbench "Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page. You can run the "pipeline using Workbench":tutorial-pipeline-workbench.html.
The Arvados distributed file system is called *Keep*. Keep is a content-addressable file system. This means that files are managed using special unique identifiers derived from the _contents_ of the file (specifically, the md5 hash), rather than human-assigned file names. This has a number of advantages:
* Files can be stored and replicated across a cluster of servers without requiring a central name server.
-* Systematic validation of data integrity by both server and client because the checksum is built into the identifier.
-* Minimizes data duplication (two files with the same contents will result in the same identifier, and will not be stored twice.)
-* Avoids data race conditions (an identifier always points to the same data.)
+* Both the server and client systematically validate data integrity because the checksum is built into the identifier.
+* Data duplication is minimized: two files with the same contents will have the same identifier, and will not be stored twice.
+* It avoids data race conditions, since an identifier always points to the same data.
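The properties listed above can be illustrated with a minimal sketch of content addressing. This is not Keep's implementation: the @ContentStore@ class and the @content_address@ helper are hypothetical names, and the identifier format (md5 hex digest, @+@, size in bytes) is only assumed to resemble the locators shown in this tutorial.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an identifier from the data alone: md5 hex digest plus size in bytes."""
    return "{}+{}".format(hashlib.md5(data).hexdigest(), len(data))


class ContentStore:
    """Toy in-memory content-addressable store (hypothetical, for illustration only)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        locator = content_address(data)
        # Identical data always maps to the same key, so it is stored only once.
        self._blocks[locator] = data
        return locator

    def get(self, locator: str) -> bytes:
        data = self._blocks[locator]
        # The checksum is part of the identifier, so a reader can verify integrity.
        if content_address(data) != locator:
            raise ValueError("checksum mismatch for " + locator)
        return data

    def __len__(self):
        return len(self._blocks)


store = ContentStore()
first = store.put(b"hello alice\n")
second = store.put(b"hello alice\n")
print(first == second, len(store))
```

Storing the same bytes twice yields the same identifier and a single stored copy, and because an identifier can never refer to different data, readers do not race with writers.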
h1. Putting Data into Keep
-We will start with downloading a freely available VCF file from the "Personal Genome Project (PGP)":http://www.personalgenomes.org subject "hu599905":https://my.personalgenomes.org/profile/hu599905 to a staging directory on the VM, and then add it to Keep.
+We will start by downloading a freely available VCF file from "Personal Genome Project (PGP)":http://www.personalgenomes.org subject "hu599905":https://my.personalgenomes.org/profile/hu599905 to a staging directory on the VM, and adding it to Keep. In the following commands, replace *@you@* with your login name.
-In the following tutorials, replace <b><code>you</code></b> with your user id.
-
-First, log into the Arvados VM instance and set up the staging area:
+First, log into your Arvados VM and set up the staging area:
notextile. <pre><code>~$ <span class="userinput">mkdir /scratch/<b>you</b></span></code></pre>
/scratch/<b>you</b>$ <span class="userinput">echo "hello bob" > tmp/bob.txt</span>
/scratch/<b>you</b>$ <span class="userinput">echo "hello carol" > tmp/carol.txt</span>
/scratch/<b>you</b>$ <span class="userinput">arv keep put tmp</span>
-0M / 0M 100.0%
+0M / 0M 100.0%
887cd41e9c613463eab2f0d885c6dd96+83
</code></pre>
</notextile>
h2. Using Workbench
-You may access collections through the "Collections section of Arvados Workbench":https://{{ site.arvados_workbench_host }}/collections located at "https://{{ site.arvados_workbench_host }}/collections":https://{{ site.arvados_workbench_host }}/collections . You can also access individual collections and individual files within a collection. Some examples:
+You may access collections through the "Collections section of Arvados Workbench":https://{{ site.arvados_workbench_host }}/collections at *Data* %(rarr)→% *Collections (data files)*. You can also access individual files within a collection. Some examples:
* "https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210":https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210
* "https://{{ site.arvados_workbench_host }}/collections/887cd41e9c613463eab2f0d885c6dd96+83/alice.txt":https://{{ site.arvados_workbench_host }}/collections/887cd41e9c613463eab2f0d885c6dd96+83/alice.txt
-h2(#arv-get). Using arv-get
+h2(#arv-get). Using the command line
You can view the contents of a collection using @arv keep ls@:
<notextile>
<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep get c1bad4b39ca5a924e481008009d94e32+210/ .</span>
+/scratch/<b>you</b>$ <span class="userinput">ls var-GS000016015-ASM.tsv.bz2</span>
+var-GS000016015-ASM.tsv.bz2
</code></pre>
</notextile>
h2. Using arv-mount
-Use @arv-mount@ to take advantage of the "File System in User Space / FUSE":http://fuse.sourceforge.net/ feature of the Linux kernel to mount a Keep collection as if it were a regular directory tree.
+Use @arv-mount@ to mount a Keep collection and access it using traditional filesystem tools.
<notextile>
-<pre><code>/scratch/<b>you</b>$ <span class="userinput">mkdir mnt</span>
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">mkdir -p mnt</span>
/scratch/<b>you</b>$ <span class="userinput">arv-mount --collection c1bad4b39ca5a924e481008009d94e32+210 mnt &</span>
/scratch/<b>you</b>$ <span class="userinput">cd mnt</span>
/scratch/<b>you</b>/mnt$ <span class="userinput">ls</span>
You can also mount the entire Keep namespace in "magic directory" mode:
<notextile>
-<pre><code>/scratch/<b>you</b>$ <span class="userinput">mkdir mnt</span>
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">mkdir -p mnt</span>
/scratch/<b>you</b>$ <span class="userinput">arv-mount mnt &</span>
/scratch/<b>you</b>$ <span class="userinput">cd mnt/c1bad4b39ca5a924e481008009d94e32+210</span>
/scratch/<b>you</b>/mnt/c1bad4b39ca5a924e481008009d94e32+210$ <span class="userinput">ls</span>
</code></pre>
</notextile>
-Using @arv-mount@ has several significant benefits:
+@arv-mount@ provides several features:
* You can browse, open and read Keep entries as if they are regular files.
* It is easy for existing tools to access files in Keep.
-* Data is downloaded on demand, it is not necessary to download an entire file or collection to start processing
+* Data is downloaded on demand. It is not necessary to download an entire file or collection to start processing.
"dataclass": "Collection"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master",
"output_is_persistent":false
},
- "filter":{
+ "do_filter":{
"script":"0-filter.py",
"script_parameters":{
"input":{
"output_of":"do_hash"
}
},
- "repository":"<b>you</b>",
+ "repository":"$USER",
"script_version":"master",
"output_is_persistent":true
}
</span></code></pre>
</notextile>
-* @"output_of"@ indicates that the @output@ of the @do_hash@ component should be used as the @"input"@ parameter for the @filter@ component. Arvados determines the correct order to run the jobs when such dependencies are present.
+* @"output_of"@ indicates that the @output@ of the @do_hash@ component should be used as the @"input"@ of @do_filter@. Arvados uses these dependencies between jobs to automatically determine the correct order to run them.
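The way @"output_of"@ dependencies imply a run order can be sketched as a simple topological sort. This is an illustration under stated assumptions, not Arvados's scheduler: @run_order@ is a hypothetical helper, and the @components@ dictionary is a trimmed stand-in for the template above (the @hash.py@ script name for @do_hash@ is assumed).

```python
def run_order(components):
    """Return component names ordered so each runs after the components it depends on."""
    # A component depends on every component named in an {"output_of": ...} parameter.
    deps = {}
    for name, spec in components.items():
        deps[name] = {
            param["output_of"]
            for param in spec.get("script_parameters", {}).values()
            if isinstance(param, dict) and "output_of" in param
        }

    ordered = []
    done = set()
    while len(ordered) < len(deps):
        # A component is ready once all of its dependencies have finished.
        ready = sorted(n for n in deps if n not in done and deps[n] <= done)
        if not ready:
            raise ValueError("circular or missing dependency")
        ordered.extend(ready)
        done.update(ready)
    return ordered


# Trimmed stand-in for the pipeline template (assumed values, for illustration).
components = {
    "do_filter": {
        "script": "0-filter.py",
        "script_parameters": {"input": {"output_of": "do_hash"}},
    },
    "do_hash": {
        "script": "hash.py",
        "script_parameters": {"input": {"required": True, "dataclass": "Collection"}},
    },
}

print(run_order(components))  # → ['do_hash', 'do_filter']
```

Even though @do_filter@ is listed first in the dictionary, it is scheduled after @do_hash@ because its @input@ is declared as the output of that component.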
-Now, use @arv pipeline_template create@ tell Arvados about your pipeline template:
+(Your shell should automatically fill in @$USER@ with your login name. The JSON that gets saved should have @"repository"@ pointed at your personal git repository.)
+
+Now, use @arv pipeline_template create@ to register your pipeline template in Arvados:
<notextile>
<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat ~/the_pipeline)"</span>
</code></pre>
</notextile>
-Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page.
+Your new pipeline template will appear on the Workbench "Compute %(rarr)→% Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_instances page.
notextile. <div class="spaced-out">
-# Go to "Collections":https://{{ site.arvados_workbench_host }}/collections .
-# On the collections page, go to the search box <span class="glyphicon glyphicon-search"></span> and search for "tutorial".
-# This should yield a collection with the contents "var-GS000016015-ASM.tsv.bz2"
-# Click on the check box to the left of "var-GS000016015-ASM.tsv.bz2". This puts the collection in your persistent selection list. Click on the paperclip <span class="glyphicon glyphicon-paperclip"></span> in the upper right to get a dropdown menu listing your current selections.
-# Go to "Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates .
-# Look for a pipeline named "Tutorial pipeline".
-# Click on the play button <span class="glyphicon glyphicon-play"></span> to the left of "Tutorial pipeline". This will take you to a new page to configure the pipeline.
-# Under *parameter* look for "input". Set the value of "input" by clicking on on "none" to get a editing popup. At the top of the selection list in the editing popup will be the collection that you selected in step 4.
-# You can now click on "Run pipeline" in the upper right to start the pipeline.
-# This will reload the page with the pipeline queued to run.
+# Go to "Collections":https://{{ site.arvados_workbench_host }}/collections (*Data* %(rarr)→% *Collections (data files)*).
+# On the Collections page, go to the search box <span class="glyphicon glyphicon-search"></span> and search for "tutorial".
+# The results should include a collection with the contents *var-GS000016015-ASM.tsv.bz2*.
+# Click on the check box to the left of *var-GS000016015-ASM.tsv.bz2*. This puts the collection in your persistent selection list. You can click on the paperclip <span class="glyphicon glyphicon-paperclip"></span> in the upper right to review your current selections.
+# Go to "Pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates (*Compute* %(rarr)→% *Pipeline templates*).
+# Look for a pipeline named *Tutorial pipeline*.
+# Click on the play button <span class="glyphicon glyphicon-play"></span> to the left of *Tutorial pipeline*. This will take you to a new page to configure the pipeline.
+# Under the *parameter* column, look for *input*. Set the value of *input* by clicking on *none* to get a selection popup. The collection that you selected in step 4 will be at the top of that pulldown menu. Select that collection in the pulldown menu.
+# You can now click on the *Run pipeline* button in the upper right to start the pipeline. A new page shows the pipeline status, queued to run.
# The page refreshes automatically every 15 seconds. You should see the pipeline running, and then finish successfully.
-# Once it is finished, click on the link under the *output* column. This will take you to the collection page for the output of this pipeline.
-# Click on "md5sum.txt" to see the actual file that is the output of this pipeline.
-# On the collection page, click on the "Provenance graph" tab to see a graphical representation of the data elements and pipelines that were involved in generating this file.
+# Once the pipeline is finished, click on the link under the *output* column. This will take you to the collection page for the output of this pipeline.
+# Click on *md5sum.txt* to see the actual file that is the output of this pipeline.
+# Go back to the collection page for the result. Click on the *Provenance graph* tab to see a graph illustrating the collections and scripts that were used to generate this file.
notextile. </div>
RUN apt-get update && \
apt-get -q -y install procps postgresql postgresql-server-dev-9.1 apache2 \
supervisor && \
- git clone git://github.com/curoverse/arvados.git /var/cache/git/arvados.git
+ git clone --bare git://github.com/curoverse/arvados.git /var/cache/git/arvados.git
RUN /bin/mkdir -p /usr/src/arvados/services
ADD generated/api.tar.gz /usr/src/arvados/services/
ENV RAILS_ENV production
ADD generated/config_databases.sh /tmp/config_databases.sh
ADD generated/superuser_token /tmp/superuser_token
-RUN sh /tmp/config_databases.sh && \
+RUN bundle install --gemfile=/usr/src/arvados/services/api/Gemfile && \
+ sh /tmp/config_databases.sh && \
rm /tmp/config_databases.sh && \
/etc/init.d/postgresql start && \
cd /usr/src/arvados/services/api && \
./script/create_superuser_token.rb $(cat /tmp/superuser_token) && \
chown www-data:www-data config.ru && \
chown www-data:www-data log -R && \
+ mkdir -p tmp && \
chown www-data:www-data tmp -R
# Configure Apache and Passenger.
# config.compute_node_nameservers = ['1.2.3.4', '1.2.3.5']
require 'net/http'
config.compute_node_nameservers = [ '@@ARVADOS_DNS_SERVER@@' ]
-
+ config.compute_node_domain = false
config.uuid_prefix = '@@API_HOSTNAME@@'
# Authentication stub: hard code pre-approved API tokens.
fi
if [[ "$2" != '' ]]; then
local name="$2"
- args="$args -name $name"
+ args="$args --name $name"
fi
if [[ "$3" != '' ]]; then
local volume="$3"
fi
if [[ "$4" != '' ]]; then
local link="$4"
- args="$args -link $link"
+ args="$args --link $link"
fi
local image=$5
# Install prerequisite packages for Arvados
# * git, curl, rvm
-# * Arvados source code in /usr/src/arvados-upstream, for preseeding gem installation
+# * Arvados source code in /usr/src/arvados, for preseeding gem installation
RUN apt-get update && \
- apt-get -q -y install -q -y openssh-server apt-utils git curl locales postgresql-server-dev-9.1 && \
+ apt-get -q -y install -q -y openssh-server apt-utils git curl \
+ libcurl3 libcurl3-gnutls libcurl4-openssl-dev locales \
+ postgresql-server-dev-9.1 && \
/bin/mkdir -p /root/.ssh && \
/bin/sed -ri 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
/usr/sbin/locale-gen && \
- curl -L https://get.rvm.io | bash -s stable --ruby=2.1.0 && \
- git clone https://github.com/curoverse/arvados.git /usr/src/arvados-upstream
+ curl -L https://get.rvm.io | bash -s stable && \
+ /usr/local/rvm/bin/rvm install 2.1.0 && \
+ /bin/mkdir -p /usr/src/arvados
+
+ADD generated/arvados.tar.gz /usr/src/arvados/
# Set up RVM environment. These are just the env variables created by
# /usr/local/rvm/scripts/rvm, which can't be run from a non-login shell.
# https://github.com/rubygems/rubygems.org/issues/613.
RUN gem update --system && \
gem install bundler && \
- bundle install --gemfile=/usr/src/arvados-upstream/apps/workbench/Gemfile && \
- bundle install --gemfile=/usr/src/arvados-upstream/services/api/Gemfile && \
- bundle install --gemfile=/usr/src/arvados-upstream/doc/Gemfile
+ bundle install --gemfile=/usr/src/arvados/apps/workbench/Gemfile && \
+ bundle install --gemfile=/usr/src/arvados/services/api/Gemfile && \
+ bundle install --gemfile=/usr/src/arvados/doc/Gemfile
ADD generated/id_rsa.pub /root/.ssh/authorized_keys
RUN chown root:root /root/.ssh/authorized_keys
#! /bin/bash
-build_ok=true
-
-# Check that:
-# * IP forwarding is enabled in the kernel.
-
-if [ "$(/sbin/sysctl --values net.ipv4.ip_forward)" != "1" ]
-then
- echo >&2 "WARNING: IP forwarding must be enabled in the kernel."
- echo >&2 "Try: sudo sysctl net.ipv4.ip_forward=1"
- build_ok=false
-fi
-
-# * Docker can be found in the user's path
-# * The user is in the docker group
-# * cgroup is mounted
-# * the docker daemon is running
-
-if ! docker images > /dev/null 2>&1
+# make sure Ruby 1.9.3 is installed before proceeding
+if ! ruby -e 'exit RUBY_VERSION >= "1.9.3"' 2>/dev/null
then
- echo >&2 "WARNING: docker could not be run."
- echo >&2 "Please make sure that:"
- echo >&2 " * You have permission to read and write /var/run/docker.sock"
- echo >&2 " * a 'cgroup' volume is mounted on your machine"
- echo >&2 " * the docker daemon is running"
- build_ok=false
-fi
+ echo "Installing Arvados requires at least Ruby 1.9.3."
+ echo "You may need to enter your password."
+ read -p "Press Ctrl-C to abort, or else press ENTER to install ruby1.9.3 and continue. " unused
-# * config.yml exists
-if [ '!' -f config.yml ]
-then
- echo >&2 "WARNING: no config.yml found in the current directory"
- echo >&2 "Copy config.yml.example to config.yml and update it with settings for your site."
- build_ok=false
+ sudo apt-get update
+ sudo apt-get -y install ruby1.9.3
fi
-# If ok to build, then go ahead and run make
-if $build_ok
-then
- make $*
-fi
+build_tools/build.rb "$@"
# `make clean' removes the files generated in the build directory
# but does not remove any docker images generated in previous builds
clean:
+ -rm -rf build
-rm *-image */generated/*
-@rmdir */generated
# Dependencies for */generated files which are prerequisites
# for building docker images.
+CONFIG_RB = build_tools/config.rb
+
+BUILD = build/.buildstamp
+
BASE_DEPS = base/Dockerfile $(BASE_GENERATED)
API_DEPS = api/Dockerfile $(API_GENERATED)
SSO_DEPS = sso/passenger.conf $(SSO_GENERATED)
-BASE_GENERATED = base/generated
+BASE_GENERATED = base/generated/arvados.tar.gz
API_GENERATED = \
api/generated/apache2_vhost \
sso/seeds.rb.in \
sso/secret_token.rb.in
-$(BASE_GENERATED): config.yml
- ./config.rb
+$(BUILD):
+ mkdir -p build
+ rsync -rlp --exclude=docker/ --exclude='**/log/*' --exclude='**/tmp/*' \
+ --chmod=Da+rx,Fa+rX ../ build/
+ touch build/.buildstamp
+
+$(BASE_GENERATED): config.yml $(BUILD)
+ $(CONFIG_RB)
+ mkdir -p base/generated
+ tar -czf base/generated/arvados.tar.gz -C build .
$(API_GENERATED): config.yml $(API_GENERATED_IN)
- ./config.rb
+ $(CONFIG_RB)
$(WORKBENCH_GENERATED): config.yml $(WORKBENCH_GENERATED_IN)
- ./config.rb
+ $(CONFIG_RB)
$(WAREHOUSE_GENERATED): config.yml $(WAREHOUSE_GENERATED_IN)
- ./config.rb
+ $(CONFIG_RB)
$(SSO_GENERATED): config.yml $(SSO_GENERATED_IN)
- ./config.rb
+ $(CONFIG_RB)
# The docker build -q option suppresses verbose build output.
# Necessary to prevent failure on building warehouse; see
# ============================================================
# The main Arvados servers: api, doc, workbench, warehouse
-api-image: passenger-image $(API_DEPS)
+api-image: passenger-image $(BUILD) $(API_DEPS)
mkdir -p api/generated
- tar -c -z -f api/generated/api.tar.gz -C ../services api
+ tar -czf api/generated/api.tar.gz -C build/services api
$(DOCKER_BUILD) -t arvados/api api
- echo -n "Built at $(date)" > api-image
+ date >api-image
-doc-image: base-image $(DOC_DEPS)
+doc-image: base-image $(BUILD) $(DOC_DEPS)
mkdir -p doc/generated
- tar -c -z -f doc/generated/doc.tar.gz -C .. doc
+ tar -czf doc/generated/doc.tar.gz -C build doc
$(DOCKER_BUILD) -t arvados/doc doc
- echo -n "Built at $(date)" > doc-image
+ date >doc-image
-workbench-image: passenger-image $(WORKBENCH_DEPS)
+workbench-image: passenger-image $(BUILD) $(WORKBENCH_DEPS)
mkdir -p workbench/generated
- tar -c -z -f workbench/generated/workbench.tar.gz -C ../apps workbench
+ tar -czf workbench/generated/workbench.tar.gz -C build/apps workbench
$(DOCKER_BUILD) -t arvados/workbench workbench
- echo -n "Built at $(date)" > workbench-image
+ date >workbench-image
warehouse-image: base-image $(WAREHOUSE_DEPS)
$(DOCKER_BUILD) -t arvados/warehouse warehouse
- echo -n "Built at $(date)" > warehouse-image
+ date >warehouse-image
sso-image: passenger-image $(SSO_DEPS)
$(DOCKER_BUILD) -t arvados/sso sso
- echo -n "Built at $(date)" > sso-image
+ date >sso-image
# ============================================================
# The arvados/base image is the base Debian image plus packages
passenger-image: base-image
$(DOCKER_BUILD) -t arvados/passenger passenger
- echo -n "Built at $(date)" > passenger-image
+ date >passenger-image
base-image: debian-image $(BASE_DEPS)
$(DOCKER_BUILD) -t arvados/base base
- echo -n "Built at $(date)" > base-image
+ date >base-image
debian-image:
./mkimage-debootstrap.sh arvados/debian wheezy ftp://ftp.us.debian.org/debian/
- echo -n "Built at $(date)" > debian-image
-
+ date >debian-image
--- /dev/null
+#! /usr/bin/env ruby
+
+require 'optparse'
+require 'tempfile'
+require 'yaml'
+
+def main options
+ if not ip_forwarding_enabled?
+ warn "NOTE: IP forwarding must be enabled in the kernel."
+ warn "Turning IP forwarding on now."
+ sudo %w(/sbin/sysctl net.ipv4.ip_forward=1)
+ end
+
+ # Check that:
+ # * Docker is installed and can be found in the user's path
+ # * Docker can be run as a non-root user
+ # - TODO: put the user in the docker group if necessary
+ # - TODO: mount cgroup automatically
+ # - TODO: start the docker service if not started
+
+ docker_path = %x(which docker).chomp
+ if docker_path.empty?
+ warn "Docker not found."
+ warn ""
+ warn "Please make sure that Docker has been installed and"
+ warn "can be found in your PATH."
+ warn ""
+ warn "Installation instructions for a variety of platforms can be found at"
+ warn "http://docs.docker.io/en/latest/installation/"
+ exit
+ elsif not docker_ok?
+ warn "WARNING: docker could not be run."
+ warn "Please make sure that:"
+ warn " * You have permission to read and write /var/run/docker.sock"
+ warn " * a 'cgroup' volume is mounted on your machine"
+ warn " * the docker daemon is running"
+ exit
+ end
+
+ # Check that debootstrap is installed.
+ if not debootstrap_ok?
+ warn "Installing debootstrap."
+ sudo '/usr/bin/apt-get', 'install', 'debootstrap'
+ end
+
+ # Generate a config.yml if it does not exist or is empty
+ if not File.size? 'config.yml'
+ print "Generating config.yml.\n"
+ print "Arvados needs to know the email address of the administrative user,\n"
+ print "so that when that user logs in they are automatically made an admin.\n"
+ print "This should be the email address you use to log in to Google.\n"
+ print "\n"
+ admin_email_address = ""
+ until is_valid_email? admin_email_address
+ print "Enter your Google ID email address here: "
+ admin_email_address = gets.strip
+ if not is_valid_email? admin_email_address
+ print "That doesn't look like a valid email address. Please try again.\n"
+ end
+ end
+
+ File.open 'config.yml', 'w' do |config_out|
+ config = YAML.load_file 'config.yml.example'
+ config['API_AUTO_ADMIN_USER'] = admin_email_address
+ config['API_HOSTNAME'] = generate_api_hostname
+ config['PUBLIC_KEY_PATH'] = find_or_create_ssh_key(config['API_HOSTNAME'])
+ config.each_key do |var|
+ if var.end_with?('_PW') or var.end_with?('_SECRET')
+ config[var] = rand(2**256).to_s(36)
+ end
+ config_out.write "#{var}: #{config[var]}\n"
+ end
+ end
+ end
+
+ # If all prerequisites are met, go ahead and build.
+ if ip_forwarding_enabled? and
+ docker_ok? and
+ debootstrap_ok? and
+ File.exists? 'config.yml'
+ warn "Building Arvados."
+ system '/usr/bin/make', '-f', options[:makefile], *ARGV
+ end
+end
+
+# sudo
+# Execute the arg list 'cmd' under sudo.
+# cmd can be passed either as a series of arguments or as a
+# single argument consisting of a list, e.g.:
+# sudo 'apt-get', 'update'
+# sudo(['/usr/bin/gpasswd', '-a', ENV['USER'], 'docker'])
+# sudo %w(/usr/bin/apt-get install lxc-docker)
+#
+def sudo(*cmd)
+ # user can pass a single list in as an argument
+ # to allow usage like: sudo %w(apt-get install foo)
+ warn "You may need to enter your password here."
+ if cmd.length == 1 and cmd[0].class == Array
+ cmd = cmd[0]
+ end
+ system '/usr/bin/sudo', *cmd
+end
+
+# is_valid_email?
+# Returns true if its arg looks like a valid email address.
+# This is a very very loose sanity check.
+#
+def is_valid_email? str
+ str.match /^\S+@\S+\.\S+$/
+end
+
+# generate_api_hostname
+# Generates a 5-character randomly chosen API hostname.
+#
+def generate_api_hostname
+ rand(2**256).to_s(36)[0...5]
+end
+
+# ip_forwarding_enabled?
+# Returns 'true' if IP forwarding is enabled in the kernel
+#
+def ip_forwarding_enabled?
+ %x(/sbin/sysctl -n net.ipv4.ip_forward) == "1\n"
+end
+
+# debootstrap_ok?
+# Returns 'true' if debootstrap is installed and working.
+#
+def debootstrap_ok?
+ return system '/usr/sbin/debootstrap --version > /dev/null 2>&1'
+end
+
+# docker_ok?
+# Returns 'true' if docker can be run as the current user.
+#
+def docker_ok?
+ return system 'docker images > /dev/null 2>&1'
+end
+
+# find_or_create_ssh_key arvados_name
+# Returns the SSH public key appropriate for this Arvados instance,
+# generating one if necessary.
+#
+def find_or_create_ssh_key arvados_name
+ ssh_key_file = "#{ENV['HOME']}/.ssh/arvados_#{arvados_name}_id_rsa"
+ unless File.exists? ssh_key_file
+ system 'ssh-keygen',
+ '-f', ssh_key_file,
+ '-C', "arvados@#{arvados_name}",
+ '-P', ''
+ end
+
+ return "#{ssh_key_file}.pub"
+end
+
+# install_docker
+# Determines which Docker package is suitable for this Linux distro
+# and installs it, resolving any dependencies.
+# NOTE: not in use yet.
+
+def install_docker
+ linux_distro = %x(lsb_release --id).split.last
+ linux_release = %x(lsb_release --release).split.last
+ linux_version = linux_distro + " " + linux_release
+ kernel_release = `uname -r`.chomp
+
+ case linux_distro
+ when 'Ubuntu'
+ if not linux_release.match '^1[234]\.'
+ warn "Arvados requires at least Ubuntu 12.04 (Precise Pangolin)."
+ warn "Your system is Ubuntu #{linux_release}."
+ exit
+ end
+ if linux_release.match '^12' and kernel_release.start_with? '3.2'
+ # Ubuntu Precise ships with a 3.2 kernel and must be upgraded.
+ warn "Your kernel #{kernel_release} must be upgraded to run Docker."
+ warn "To do this:"
+ warn " sudo apt-get update"
+ warn " sudo apt-get install linux-image-generic-lts-raring linux-headers-generic-lts-raring"
+ warn " sudo reboot"
+ exit
+ else
+ # install AUFS
+ sudo 'apt-get', 'update'
+ sudo 'apt-get', 'install', "linux-image-extra-#{kernel_release}"
+ end
+
+ # add Docker repository
+ sudo %w(/usr/bin/apt-key adv
+ --keyserver keyserver.ubuntu.com
+ --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9)
+ source_file = Tempfile.new('arv')
+ source_file.write("deb http://get.docker.io/ubuntu docker main\n")
+ source_file.close
+ sudo '/bin/mv', source_file.path, '/etc/apt/sources.list.d/docker.list'
+ sudo %w(/usr/bin/apt-get update)
+ sudo %w(/usr/bin/apt-get install lxc-docker)
+
+ # Set up for non-root access
+ sudo %w(/usr/sbin/groupadd docker)
+ sudo '/usr/bin/gpasswd', '-a', ENV['USER'], 'docker'
+ sudo %w(/usr/sbin/service docker restart)
+ when 'Debian'
+ else
+ warn "Must be running a Debian or Ubuntu release in order to run Docker."
+ exit
+ end
+end
+
+
+if __FILE__ == $PROGRAM_NAME
+ options = { :makefile => File.join(File.dirname(__FILE__), 'Makefile') }
+ OptionParser.new do |opts|
+ opts.on('-m', '--makefile MAKEFILE-PATH',
+ 'Path to the Makefile used to build Arvados Docker images') do |mk|
+ options[:makefile] = mk
+ end
+ end.parse!
+
+ main options
+end
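
The option handling at the bottom of build.rb can be illustrated in isolation. This is a minimal sketch, not the script's actual code: `parse_build_args` and its `default_makefile` parameter are hypothetical names introduced here. It assumes the intent is to consume `-m`/`--makefile` and hand every remaining argument on to make.

```ruby
require 'optparse'

# Hypothetical helper (not part of build.rb): returns the parsed options
# and the arguments that were not recognized as options.
def parse_build_args(argv, default_makefile = 'Makefile')
  options = { :makefile => default_makefile }
  parser = OptionParser.new do |opts|
    opts.on('-m', '--makefile MAKEFILE-PATH',
            'Path to the Makefile used to build Arvados Docker images') do |mk|
      options[:makefile] = mk
    end
  end
  # OptionParser only acts on the arguments once parse (or parse!) is called.
  # parse leaves argv untouched and returns the non-option arguments.
  remaining = parser.parse(argv)
  [options, remaining]
end

options, targets = parse_build_args(['-m', 'alt/Makefile', 'api-image'])
puts "makefile=#{options[:makefile]} targets=#{targets.inspect}"
```

Under these assumptions the remaining arguments (make targets such as `api-image`) can be passed to `system '/usr/bin/make', '-f', options[:makefile], *targets`.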
# For each *.in file in the docker directories, substitute any
# @@variables@@ found in the file with the appropriate config
# variable. Support up to 10 levels of nesting.
-#
+#
# TODO(twp): add the *.in files directory to the source tree, and
# when expanding them, add them to the "generated" directory with
# the same tree structure as in the original source. Then all
File.delete(stale_file)
end
+File.umask(022)
Dir.glob('*/*.in') do |template_file|
generated_dir = File.join(File.dirname(template_file), 'generated')
Dir.mkdir(generated_dir) unless Dir.exists? generated_dir
output_path = File.join(generated_dir, File.basename(template_file, '.in'))
- output = File.open(output_path, "w")
- File.open(template_file) do |input|
- input.each_line do |line|
+ File.open(output_path, "w") do |output|
+ File.open(template_file) do |input|
+ input.each_line do |line|
- @count = 0
- while @count < 10
- @out = line.gsub!(/@@(.*?)@@/) do |var|
- if config.key?(Regexp.last_match[1])
- config[Regexp.last_match[1]]
- else
- var.gsub!(/@@/, '@_NOT_FOUND_@')
+ # This count is used to short-circuit potential
+ # infinite loops of variable substitution.
+ @count = 0
+ while @count < 10
+ @out = line.gsub!(/@@(.*?)@@/) do |var|
+ if config.key?(Regexp.last_match[1])
+ config[Regexp.last_match[1]]
+ else
+ var.gsub!(/@@/, '@_NOT_FOUND_@')
+ end
end
+ break if @out.nil?
+ @count += 1
end
- break if @out.nil?
- @count += 1
- end
- output.write(line)
+ output.write(line)
+ end
end
end
- output.close
end
# Copy the ssh public key file to base/generated (if a path is given)
generated_dir = File.join('base/generated')
Dir.mkdir(generated_dir) unless Dir.exists? generated_dir
-if config.key?('PUBLIC_KEY_PATH') &&
- ! (config['PUBLIC_KEY_PATH'] == '') &&
- File.readable?(config['PUBLIC_KEY_PATH'])
+if (!config['PUBLIC_KEY_PATH'].nil? and
+ File.readable? config['PUBLIC_KEY_PATH'])
FileUtils.cp(config['PUBLIC_KEY_PATH'],
File.join(generated_dir, 'id_rsa.pub'))
end
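
The @@variable@@ substitution performed by config.rb can be sketched as a standalone function. This is an illustrative sketch only: `expand` and `max_passes` are hypothetical names, and the sample config keys are invented for the example. It follows the same rules as the loop above: repeat the substitution until nothing changes or the pass limit is reached, and mark unknown variables with `@_NOT_FOUND_@`.

```ruby
# Hypothetical helper (not part of config.rb): expands @@NAME@@ references
# in a single line, allowing config values to refer to other variables.
def expand(line, config, max_passes = 10)
  result = line.dup
  # The pass limit short-circuits potential infinite substitution loops.
  max_passes.times do
    changed = result.gsub!(/@@(.*?)@@/) do |var|
      key = Regexp.last_match(1)
      if config.key?(key)
        config[key]
      else
        var.gsub(/@@/, '@_NOT_FOUND_@')
      end
    end
    # gsub! returns nil when no substitution was made.
    break if changed.nil?
  end
  result
end

# Invented sample values, for illustration only.
config = { 'HOST' => '@@NAME@@.example.com', 'NAME' => 'api' }
puts expand("ServerName @@HOST@@\n", config)
```

Replacing the delimiters of an unknown variable is what lets the loop stop: the marked-up text no longer matches the `@@...@@` pattern on the next pass.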
ADD generated/doc.tar.gz /usr/src/arvados/
# Build static site
-RUN /bin/sed -ri 's/^baseurl: .*$/baseurl: /' /usr/src/arvados/doc/_config.yml && \
+RUN bundle install --gemfile=/usr/src/arvados/doc/Gemfile && \
+ /bin/sed -ri 's/^baseurl: .*$/baseurl: /' /usr/src/arvados/doc/_config.yml && \
cd /usr/src/arvados/doc && \
LANG="en_US.UTF-8" LC_ALL="en_US.UTF-8" rake
+++ /dev/null
-#! /bin/bash
-
-# Wrapper script for `docker build'.
-# This is a workaround for https://github.com/dotcloud/docker/issues/1875.
-
-tmpfile=$(mktemp)
-trap "rm $tmpfile; exit 1" SIGHUP SIGINT SIGTERM
-
-docker build $* | tee ${tmpfile}
-if $(grep -q 'Error build' ${tmpfile})
-then
- result=1
-else
- result=0
-fi
-
-rm $tmpfile
-exit $result
--- /dev/null
+#! /bin/bash
+
+# Install prerequisites.
+sudo apt-get install curl libcurl3 libcurl3-gnutls libcurl4-openssl-dev python-pip
+
+# Install RVM.
+curl -sSL https://get.rvm.io | bash -s stable
+source ~/.rvm/scripts/rvm
+rvm install 2.1.0
+
+# Install arvados-cli.
+gem install arvados-cli
+sudo pip install --upgrade httplib2
# Update Arvados source
RUN /bin/mkdir -p /usr/src/arvados/apps
ADD generated/workbench.tar.gz /usr/src/arvados/apps/
+ADD generated/secret_token.rb /usr/src/arvados/apps/workbench/config/initializers/secret_token.rb
+ADD generated/production.rb /usr/src/arvados/apps/workbench/config/environments/production.rb
+ADD passenger.conf /etc/apache2/conf.d/passenger
+
-RUN touch /usr/src/arvados/apps/workbench/log/production.log && \
+RUN bundle install --gemfile=/usr/src/arvados/apps/workbench/Gemfile && \
+ touch /usr/src/arvados/apps/workbench/log/production.log && \
chmod 666 /usr/src/arvados/apps/workbench/log/production.log && \
touch /usr/src/arvados/apps/workbench/db/production.sqlite3 && \
bundle install --gemfile=/usr/src/arvados/apps/workbench/Gemfile && \
cd /usr/src/arvados/apps/workbench && \
- rake assets:precompile
+ rake assets:precompile && \
+ chown -R www-data:www-data /usr/src/arvados/apps/workbench
# Configure Apache
ADD generated/apache2_vhost /etc/apache2/sites-available/workbench
a2ensite workbench && \
a2enmod rewrite
-# Set up the production environment
-ADD generated/secret_token.rb /usr/src/arvados/apps/workbench/config/initializers/secret_token.rb
-ADD generated/production.rb /usr/src/arvados/apps/workbench/config/environments/production.rb
-ADD passenger.conf /etc/apache2/conf.d/passenger
-
ADD apache2_foreground.sh /etc/apache2/foreground.sh
# Start Apache
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use Arvados;
use Getopt::Long;
-use Warehouse;
-use Warehouse::Stream;
-use IPC::System::Simple qw(capturex);
+use IPC::Open2;
+use IO::Select;
+use File::Temp;
use Fcntl ':flock';
$ENV{"TMPDIR"} ||= "/tmp";
}
$job_id = $Job->{'uuid'};
-$metastream = Warehouse::Stream->new(whc => new Warehouse);
-$metastream->clear;
-$metastream->name('.');
-$metastream->write_start($job_id . '.log.txt');
-
+my $keep_logfile = $job_id . '.log.txt';
+my $local_logfile = File::Temp->new();
$Job->{'runtime_constraints'} ||= {};
$Job->{'runtime_constraints'}->{'max_tasks_per_node'} ||= 0;
$ENV{"TASK_SLOT_NODE"} = $slot[$childslot]->{node}->{name};
$ENV{"TASK_SLOT_NUMBER"} = $slot[$childslot]->{cpu};
$ENV{"TASK_WORK"} = $ENV{"JOB_WORK"}."/$id.$$";
- $ENV{"TASK_KEEPMOUNT"} = $ENV{"TASK_WORK"}."/keep";
+ $ENV{"TASK_KEEPMOUNT"} = $ENV{"TASK_WORK"}.".keep";
$ENV{"TASK_TMPDIR"} = $ENV{"TASK_WORK"}; # deprecated
$ENV{"CRUNCH_NODE_SLOTS"} = $slot[$childslot]->{node}->{ncpus};
$ENV{"PATH"} = $ENV{"CRUNCH_INSTALL"} . "/bin:" . $ENV{"PATH"};
if ($Job->{'output'})
{
eval {
- my $manifest_text = capturex("whget", $Job->{'output'});
+ my $manifest_text = `arv keep get \Q$Job->{'output'}\E`;
$arv->{'collections'}->{'create'}->execute('collection' => {
'uuid' => $Job->{'output'},
'manifest_text' => $manifest_text,
} split ("\n", $jobstep[$job]->{stderr});
}
+sub fetch_block
+{
+ my $hash = shift;
+ my ($keep, $child_out, $output_block);
+
+ my $cmd = "arv keep get \Q$hash\E";
+ open($keep, '-|', $cmd) or die "fetch_block: $cmd: $!";
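+ $output_block = '';
+ my ($buf, $bytes);
+ # A single sysread may return only part of the block; read until EOF.
+ while ($bytes = sysread($keep, $buf, 1024 * 1024)) {
+ $output_block .= $buf;
+ }
+ $output_block = undef if !defined $bytes; # read error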
+ close $keep;
+ return $output_block;
+}
sub collate_output
{
- my $whc = Warehouse->new;
Log (undef, "collate");
- $whc->write_start (1);
+
+ my ($child_out, $child_in);
+ my $pid = open2($child_out, $child_in, 'arv', 'keep', 'put', '--raw');
my $joboutput;
for (@jobstep)
{
if ($output !~ /^[0-9a-f]{32}(\+\S+)*$/)
{
$output_in_keep ||= $output =~ / [0-9a-f]{32}\S*\+K/;
- $whc->write_data ($output);
+ print $child_in $output;
}
elsif (@jobstep == 1)
{
$joboutput = $output;
- $whc->write_finish;
+ last;
}
- elsif (defined (my $outblock = $whc->fetch_block ($output)))
+ elsif (defined (my $outblock = fetch_block ($output)))
{
$output_in_keep ||= $outblock =~ / [0-9a-f]{32}\S*\+K/;
- $whc->write_data ($outblock);
+ print $child_in $outblock;
}
else
{
- my $errstr = $whc->errstr;
- $whc->write_data ("XXX fetch_block($output) failed: $errstr XXX\n");
+ Log (undef, "XXX fetch_block($output) failed XXX");
$main::success = 0;
}
}
- $joboutput = $whc->write_finish if !defined $joboutput;
+ $child_in->close;
+
+ if (!defined $joboutput) {
+ my $s = IO::Select->new($child_out);
+ if ($s->can_read(120)) {
+ sysread($child_out, $joboutput, 64 * 1024 * 1024);
+ } else {
+ Log (undef, "timed out reading from 'arv keep put'");
+ }
+ }
+ waitpid($pid, 0);
+
if ($joboutput)
{
Log (undef, "output $joboutput");
}
print STDERR ((-t STDERR) ? ($datetime." ".$message) : $message);
- return if !$metastream;
- $metastream->write_data ($datetime . " " . $message);
+ if ($local_logfile) {
+ print $local_logfile $datetime . " " . $message;
+ }
}
sub save_meta
{
my $justcheckpoint = shift; # false if this will be the last meta saved
- my $m = $metastream;
- $m = $m->copy if $justcheckpoint;
- $m->write_finish;
- my $whc = Warehouse->new;
- my $loglocator = $whc->store_block ($m->as_string);
- $arv->{'collections'}->{'create'}->execute('collection' => {
- 'uuid' => $loglocator,
- 'manifest_text' => $m->as_string,
- });
- undef $metastream if !$justcheckpoint; # otherwise Log() will try to use it
+ return if $justcheckpoint; # checkpointing is not relevant post-Warehouse.pm
+
+ $local_logfile->flush;
+ my $cmd = "arv keep put --filename \Q$keep_logfile\E "
+ . quotemeta($local_logfile->filename);
+ my $loglocator = `$cmd`;
+ die "system $cmd failed: $?" if $?;
+
+ $local_logfile = undef; # the temp file is automatically deleted
Log (undef, "log manifest is $loglocator");
$Job->{'log'} = $loglocator;
$Job->update_attributes('log', $loglocator) if $job_has_uuid;
sub thaw
{
croak ("Thaw not implemented");
-
- my $whc;
- my $key = shift;
- Log (undef, "thaw from $key");
-
- @jobstep = ();
- @jobstep_done = ();
- @jobstep_todo = ();
- @jobstep_tomerge = ();
- $jobstep_tomerge_level = 0;
- my $frozenjob = {};
-
- my $stream = new Warehouse::Stream ( whc => $whc,
- hash => [split (",", $key)] );
- $stream->rewind;
- while (my $dataref = $stream->read_until (undef, "\n\n"))
- {
- if ($$dataref =~ /^job /)
- {
- foreach (split ("\n", $$dataref))
- {
- my ($k, $v) = split ("=", $_, 2);
- $frozenjob->{$k} = freezeunquote ($v);
- }
- next;
- }
-
- if ($$dataref =~ /^merge (\d+) (.*)/)
- {
- $jobstep_tomerge_level = $1;
- @jobstep_tomerge
- = map { freezeunquote ($_) } split ("\n", freezeunquote($2));
- next;
- }
-
- my $Jobstep = { };
- foreach (split ("\n", $$dataref))
- {
- my ($k, $v) = split ("=", $_, 2);
- $Jobstep->{$k} = freezeunquote ($v) if $k;
- }
- $Jobstep->{'failures'} = 0;
- push @jobstep, $Jobstep;
-
- if ($Jobstep->{exitcode} eq "0")
- {
- push @jobstep_done, $#jobstep;
- }
- else
- {
- push @jobstep_todo, $#jobstep;
- }
- }
-
- foreach (qw (script script_version script_parameters))
- {
- $Job->{$_} = $frozenjob->{$_};
- }
- $Job->save if $job_has_uuid;
}
--- /dev/null
+#! /usr/bin/perl
+
+use strict;
+
+use ExtUtils::MakeMaker;
+
+WriteMakefile(
+ NAME => 'Arvados',
+ VERSION_FROM => 'lib/Arvados.pm'
+);
cond_out = []
param_out = []
@filters.each do |attr, operator, operand|
- if !model_class.searchable_columns.index attr.to_s
+ if !model_class.searchable_columns(operator).index attr.to_s
raise ArgumentError.new("Invalid attribute '#{attr}' in condition")
end
case operator.downcase
if @where.is_a? Hash and @where.any?
conditions = ['1=1']
@where.each do |attr,value|
- if attr == :any
+ if attr.to_s == 'any'
if value.is_a?(Array) and
value.length == 2 and
- value[0] == 'contains' and
- model_class.columns.collect(&:name).index('name') then
+ value[0] == 'contains' then
ilikes = []
- model_class.searchable_columns.each do |column|
+ model_class.searchable_columns('ilike').each do |column|
ilikes << "#{table_name}.#{column} ilike ?"
conditions << "%#{value[1]}%"
end
:items => @objects.as_api_response(nil)
}
if @objects.respond_to? :except
- @object_list[:items_available] = @objects.except(:limit).except(:offset).count
+ @object_list[:items_available] = @objects.
+ except(:limit).except(:offset).
+ count(:id, distinct: true)
end
render json: @object_list
end
"#{current_api_base}/#{self.class.to_s.pluralize.underscore}/#{self.uuid}"
end
- def self.searchable_columns
+ def self.searchable_columns operator
+ textonly_operator = !operator.match(/[<=>]/)
self.columns.collect do |col|
- if [:string, :text, :datetime, :integer].index(col.type) && col.name != 'owner_uuid'
+ if col.name == 'owner_uuid'
+ nil
+ elsif [:string, :text].index(col.type)
+ col.name
+ elsif !textonly_operator and [:datetime, :integer].index(col.type)
col.name
end
end.compact
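
The effect of the new `operator` argument can be shown without a database. This is a minimal sketch that mirrors the logic of `searchable_columns` above: the `Column` struct stands in for ActiveRecord column objects, and the column names are made up for illustration.

```ruby
# Stand-in for ActiveRecord column metadata (assumption for this sketch).
Column = Struct.new(:name, :type)

def searchable_columns(columns, operator)
  # Operators without <, = or > (such as 'ilike') only make sense on text.
  textonly_operator = !operator.match(/[<=>]/)
  columns.collect do |col|
    if col.name == 'owner_uuid'
      nil
    elsif [:string, :text].index(col.type)
      col.name
    elsif !textonly_operator and [:datetime, :integer].index(col.type)
      col.name
    end
  end.compact
end

# Illustrative columns only; not a real Arvados table definition.
columns = [
  Column.new('uuid', :string),
  Column.new('owner_uuid', :string),
  Column.new('created_at', :datetime),
  Column.new('priority', :integer),
  Column.new('is_locked', :boolean),
]

puts searchable_columns(columns, 'ilike').inspect
puts searchable_columns(columns, '<=').inspect
```

With a text-only operator, datetime and integer columns are left out, which keeps an `ilike` condition from being applied to non-text columns.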
before_update :prevent_privilege_escalation
before_update :prevent_inactive_admin
before_create :check_auto_admin
+ after_create :add_system_group_permission_link
after_create AdminNotifier
has_many :authorized_keys, :foreign_key => :authorized_user_uuid, :primary_key => :uuid
Group.where('owner_uuid in (?)', lookup_uuids).each do |group|
newgroups << [group.owner_uuid, group.uuid, 'can_manage']
end
- Link.where('tail_uuid in (?) and link_class = ? and head_kind = ?',
+ Link.where('tail_uuid in (?) and link_class = ? and head_kind in (?)',
lookup_uuids,
'permission',
- 'arvados#group').each do |link|
+ ['arvados#group', 'arvados#user']).each do |link|
newgroups << [link.tail_uuid, link.head_uuid, link.name]
end
newgroups.each do |tail_uuid, head_uuid, perm_name|
end
end
+ # Give the special "System group" permission to manage this user and
+ # all of this user's stuff.
+ #
+ def add_system_group_permission_link
+ Link.create(link_class: 'permission',
+ name: 'can_manage',
+ tail_kind: 'arvados#group',
+ tail_uuid: system_group_uuid,
+ head_kind: 'arvados#user',
+ head_uuid: self.uuid)
+ end
end
--- /dev/null
+class AddSystemGroup < ActiveRecord::Migration
+ include CurrentApiClient
+
+ def up
+ # Make sure the system group exists.
+ system_group
+ end
+
+ def down
+ act_as_system_user do
+ system_group.destroy
+
+ # Destroy the automatically generated links giving system_group
+ # permission on all users.
+ Link.destroy_all(tail_uuid: system_group_uuid, head_kind: 'arvados#user')
+ end
+ end
+end
#
# It's strongly recommended to check this file into your version control system.
-ActiveRecord::Schema.define(:version => 20140324024606) do
+ActiveRecord::Schema.define(:version => 20140402001908) do
create_table "api_client_authorizations", :force => true do |t|
t.string "api_token", :null => false
-# This file should contain all the record creation needed to seed the database with its default values.
-# The data can then be loaded with the rake db:seed (or created alongside the db with db:setup).
+# This file seeds the database with initial/default values.
#
-# Examples:
-#
-# cities = City.create([{ :name => 'Chicago' }, { :name => 'Copenhagen' }])
-# Mayor.create(:name => 'Emanuel', :city => cities.first)
+# It is invoked by `rake db:seed` and `rake db:setup`.
+
+# These two methods would create the system user and group objects on
+# demand later anyway, but it's better form to create them up front.
+include CurrentApiClient
+system_user
+system_group
'000000000000000'].join('-')
end
+ def system_group_uuid
+ [Server::Application.config.uuid_prefix,
+ Group.uuid_prefix,
+ '000000000000000'].join('-')
+ end
+
def system_user
if not $system_user
real_current_user = Thread.current[:user]
$system_user
end
+ def system_group
+ if not $system_group
+ act_as_system_user do
+ ActiveRecord::Base.transaction do
+ $system_group = Group.
+ where(uuid: system_group_uuid).first_or_create do |g|
+ g.update_attributes(name: "System group",
+ description: "System group")
+ User.all.collect(&:uuid).each do |user_uuid|
+ Link.create(link_class: 'permission',
+ name: 'can_manage',
+ tail_kind: 'arvados#group',
+ tail_uuid: system_group_uuid,
+ head_kind: 'arvados#user',
+ head_uuid: user_uuid)
+ end
+ end
+ end
+ end
+ end
+ $system_group
+ end
+
def act_as_system_user
if block_given?
user_was = Thread.current[:user]
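
The well-known UUIDs built by `system_user_uuid` and `system_group_uuid` share one shape: site prefix, object-type prefix, and fifteen zeros. A minimal sketch, assuming the `zzzzz` site prefix and the `tpzed` (user) and `j7d0g` (group) type prefixes that appear in the test fixtures; `special_uuid` is a hypothetical name, not a method in the source.

```ruby
# Hypothetical helper (not part of CurrentApiClient): joins the three
# UUID components the same way the system_*_uuid methods do.
def special_uuid(site_prefix, type_prefix)
  [site_prefix, type_prefix, '000000000000000'].join('-')
end

puts special_uuid('zzzzz', 'tpzed') # system user in the test fixtures
puts special_uuid('zzzzz', 'j7d0g') # system group, same pattern
```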
api_token: 1a9ffdcga2o7cw8q12dndskomgs1ygli3ns9k2o9hgzgmktc78
expires_at: 2038-01-01 00:00:00
+miniadmin:
+ api_client: untrusted
+ user: miniadmin
+ api_token: 2zb2y9pw3e70270te7oe3ewaantea3adyxjascvkz0zob7q7xb
+ expires_at: 2038-01-01 00:00:00
+
+rominiadmin:
+ api_client: untrusted
+ user: rominiadmin
+ api_token: 5tsb2pc3zlatn1ortl98s2tqsehpby88wmmnzmpsjmzwa6payh
+ expires_at: 2038-01-01 00:00:00
+
active:
api_client: untrusted
user: active
name: Private
description: Private Group
+private_and_can_read_foofile:
+ uuid: zzzzz-j7d0g-22xp1wpjul508rk
+ owner_uuid: zzzzz-tpzed-xurymjxw79nv3jz
+ name: Private and Can Read Foofile
+ description: Another Private Group
+
system_owned_group:
uuid: zzzzz-j7d0g-8ulrifv67tve5sx
owner_uuid: zzzzz-tpzed-000000000000000
uuid: zzzzz-j7d0g-fffffffffffffff
owner_uuid: zzzzz-tpzed-d9tiejq69daie8f
name: All users
+
+testusergroup_admins:
+ uuid: zzzzz-j7d0g-48foin4vonvc2at
+ owner_uuid: zzzzz-tpzed-000000000000000
+ name: Administrators of a subset of users
head_uuid: 1f4b0bc7583c2a7f9102c395f4ffc5e3+45
properties: {}
+foo_file_readable_by_active_duplicate_permission:
+ uuid: zzzzz-o0j2j-2qlmhgothiur55r
+ owner_uuid: zzzzz-tpzed-000000000000000
+ created_at: 2014-01-24 20:42:26 -0800
+ modified_by_client_uuid: zzzzz-ozdt8-000000000000000
+ modified_by_user_uuid: zzzzz-tpzed-000000000000000
+ modified_at: 2014-01-24 20:42:26 -0800
+ updated_at: 2014-01-24 20:42:26 -0800
+ tail_kind: arvados#user
+ tail_uuid: zzzzz-tpzed-xurymjxw79nv3jz
+ link_class: permission
+ name: can_read
+ head_kind: arvados#collection
+ head_uuid: 1f4b0bc7583c2a7f9102c395f4ffc5e3+45
+ properties: {}
+
+foo_file_readable_by_active_redundant_permission_via_private_group:
+ uuid: zzzzz-o0j2j-5s8ry7sn6bwxb7w
+ owner_uuid: zzzzz-tpzed-000000000000000
+ created_at: 2014-01-24 20:42:26 -0800
+ modified_by_client_uuid: zzzzz-ozdt8-000000000000000
+ modified_by_user_uuid: zzzzz-tpzed-000000000000000
+ modified_at: 2014-01-24 20:42:26 -0800
+ updated_at: 2014-01-24 20:42:26 -0800
+ tail_kind: arvados#group
+ tail_uuid: zzzzz-j7d0g-22xp1wpjul508rk
+ link_class: permission
+ name: can_read
+ head_kind: arvados#collection
+ head_uuid: 1f4b0bc7583c2a7f9102c395f4ffc5e3+45
+ properties: {}
+
bar_file_readable_by_active:
uuid: zzzzz-o0j2j-8hppiuduf8eqdng
owner_uuid: zzzzz-tpzed-000000000000000
head_kind: arvados#repository
head_uuid: zzzzz-2x53u-382brsig8rp3666
properties: {}
+
+miniadmin_user_is_a_testusergroup_admin:
+ uuid: zzzzz-o0j2j-38vvkciz7qc12j9
+ owner_uuid: zzzzz-tpzed-000000000000000
+ created_at: 2014-04-01 13:53:33 -0400
+ modified_by_client_uuid: zzzzz-ozdt8-brczlopd8u8d0jr
+ modified_by_user_uuid: zzzzz-tpzed-000000000000000
+ modified_at: 2014-04-01 13:53:33 -0400
+ updated_at: 2014-04-01 13:53:33 -0400
+ tail_kind: arvados#user
+ tail_uuid: zzzzz-tpzed-2bg9x0oeydcw5hm
+ link_class: permission
+ name: can_manage
+ head_kind: arvados#group
+ head_uuid: zzzzz-j7d0g-48foin4vonvc2at
+ properties: {}
+
+rominiadmin_user_is_a_testusergroup_admin:
+ uuid: zzzzz-o0j2j-6b0hz5hr107mc90
+ owner_uuid: zzzzz-tpzed-000000000000000
+ created_at: 2014-04-01 13:53:33 -0400
+ modified_by_client_uuid: zzzzz-ozdt8-brczlopd8u8d0jr
+ modified_by_user_uuid: zzzzz-tpzed-000000000000000
+ modified_at: 2014-04-01 13:53:33 -0400
+ updated_at: 2014-04-01 13:53:33 -0400
+ tail_kind: arvados#user
+ tail_uuid: zzzzz-tpzed-4hvxm4n25emegis
+ link_class: permission
+ name: can_read
+ head_kind: arvados#group
+ head_uuid: zzzzz-j7d0g-48foin4vonvc2at
+ properties: {}
+
+testusergroup_can_manage_active_user:
+ uuid: zzzzz-o0j2j-2vaqhxz6hsf4k1d
+ owner_uuid: zzzzz-tpzed-000000000000000
+ created_at: 2014-04-01 13:56:10 -0400
+ modified_by_client_uuid: zzzzz-ozdt8-brczlopd8u8d0jr
+ modified_by_user_uuid: zzzzz-tpzed-000000000000000
+ modified_at: 2014-04-01 13:56:10 -0400
+ updated_at: 2014-04-01 13:56:10 -0400
+ tail_kind: arvados#group
+ tail_uuid: zzzzz-j7d0g-48foin4vonvc2at
+ link_class: permission
+ name: can_manage
+ head_kind: arvados#user
+ head_uuid: zzzzz-tpzed-xurymjxw79nv3jz
+ properties: {}
--- /dev/null
+owned_by_active_user:
+ uuid: zzzzz-2x53u-3zx463qyo0k4xrn
+ owner_uuid: zzzzz-tpzed-xurymjxw79nv3jz
+
+owned_by_private_group:
+ uuid: zzzzz-2x53u-5m3qwg45g3nlpu6
+ owner_uuid: zzzzz-j7d0g-rew6elm53kancon
+
+owned_by_spectator:
+ uuid: zzzzz-2x53u-3b0xxwzlbzxq5yr
+ owner_uuid: zzzzz-tpzed-l1s2piq4t4mps8r
is_admin: true
prefs: {}
+miniadmin:
+ uuid: zzzzz-tpzed-2bg9x0oeydcw5hm
+ email: miniadmin@arvados.local
+ first_name: TestCase
+ last_name: User Group Administrator
+ identity_url: https://miniadmin.openid.local
+ is_active: true
+ is_admin: false
+ prefs: {}
+
+rominiadmin:
+ uuid: zzzzz-tpzed-4hvxm4n25emegis
+ email: rominiadmin@arvados.local
+ first_name: TestCase
+ last_name: Read-Only User Group Administrator
+ identity_url: https://rominiadmin.openid.local
+ is_active: true
+ is_admin: false
+ prefs: {}
+
active:
uuid: zzzzz-tpzed-xurymjxw79nv3jz
email: active-user@arvados.local
end
end
+ test "items.count == items_available" do
+ authorize_with :active
+ get :index, limit: 100000
+ assert_response :success
+ resp = JSON.parse(@response.body)
+ assert_equal resp['items_available'], assigns(:objects).length
+ assert_equal resp['items_available'], resp['items'].count
+ unique_uuids = resp['items'].collect { |i| i['uuid'] }.compact.uniq
+ assert_equal unique_uuids.count, resp['items'].count
+ end
+
test "get index with limit=2 offset=99999" do
# Assume there are not that many test fixtures.
authorize_with :active
assert_nil resp['1f4b0bc7583c2a7f9102c395f4ffc5e3+45'] # foo
end
+ test "search collections with 'any' operator" do
+ authorize_with :active
+ get :index, {
+ where: { any: ['contains', '7f9102c395f4ffc5e3'] }
+ }
+ assert_response :success
+ found = assigns(:objects).collect(&:uuid)
+ assert_equal 1, found.count
+ assert_equal true, !!found.index('1f4b0bc7583c2a7f9102c395f4ffc5e3+45')
+ end
+
end
}
assert_response :success
found = assigns(:objects).collect(&:uuid)
- assert_equal true, !!found.index('zzzzz-8i9sb-pshmckwoma9plh7')
+ assert_equal 0, found.index('zzzzz-8i9sb-pshmckwoma9plh7')
+ assert_equal 1, found.count
end
test "search jobs by nonexistent column with < query" do
require 'test_helper'
class Arvados::V1::UsersControllerTest < ActionController::TestCase
+ include CurrentApiClient
setup do
@all_links_at_start = Link.all
assert_nil created['identity_url'], 'expected no identity_url'
# arvados#user, repo link and link add user to 'All users' group
- verify_num_links @all_links_at_start, 3
+ verify_num_links @all_links_at_start, 4
verify_link response_items, 'arvados#user', true, 'permission', 'can_login',
created['uuid'], created['email'], 'arvados#user', false, 'User'
verify_link response_items, 'arvados#virtualMachine', false, 'permission', 'can_login',
nil, created['uuid'], 'arvados#virtualMachine', false, 'VirtualMachine'
+ verify_system_group_permission_link_for created['uuid']
+
# invoke setup again with the same data
post :setup, {
repo_name: repo_name,
assert_nil created['identity_url'], 'expected no identity_url'
# arvados#user, repo link and link add user to 'All users' group
- verify_num_links @all_links_at_start, 4
+ verify_num_links @all_links_at_start, 5
verify_link response_items, 'arvados#repository', true, 'permission', 'can_write',
repo_name, created['uuid'], 'arvados#repository', true, 'Repository'
verify_link response_items, 'arvados#virtualMachine', true, 'permission', 'can_login',
@vm_uuid, created['uuid'], 'arvados#virtualMachine', false, 'VirtualMachine'
+
+ verify_system_group_permission_link_for created['uuid']
end
test "setup user with bogus uuid and expect error" do
assert_not_nil response_object['uuid'], 'expected uuid for the new user'
assert_equal response_object['email'], 'foo@example.com', 'expected given email'
- # three extra links; login link, group link and repo link
- verify_num_links @all_links_at_start, 3
+ # four extra links; system_group, login, group and repo perms
+ verify_num_links @all_links_at_start, 4
end
test "setup user with fake vm and expect error" do
assert_not_nil response_object['uuid'], 'expected uuid for the new user'
assert_equal response_object['email'], 'foo@example.com', 'expected given email'
- # three extra links; login link, group link and repo link
- verify_num_links @all_links_at_start, 4
+ # five extra links; system_group, login, group, vm, repo
+ verify_num_links @all_links_at_start, 5
end
test "setup user with valid email, no vm and repo as input" do
assert_not_nil response_object['uuid'], 'expected uuid for new user'
assert_equal response_object['email'], 'foo@example.com', 'expected given email'
- # two extra links; login link and group link
- verify_num_links @all_links_at_start, 2
+ # three extra links; system_group, login, and group
+ verify_num_links @all_links_at_start, 3
end
test "setup user with email, first name, repo name and vm uuid" do
assert_equal 'test_first_name', response_object['first_name'],
'expecting first name'
- # four extra links; login link, group link, repo link and vm link
- verify_num_links @all_links_at_start, 4
+ # five extra links; system_group, login, group, repo and vm
+ verify_num_links @all_links_at_start, 5
end
test "setup user twice with email and check two different objects created" do
response_object = find_obj_in_resp response_items, 'User', nil
assert_not_nil response_object['uuid'], 'expected uuid for new user'
assert_equal response_object['email'], 'foo@example.com', 'expected given email'
- verify_num_links @all_links_at_start, 3 # openid, group, and repo. no vm
+ # system_group, openid, group, and repo. No vm link.
+ verify_num_links @all_links_at_start, 4
# create again
post :setup, {
'expected same uuid as first create operation'
assert_equal response_object['email'], 'foo@example.com', 'expected given email'
- # extra login link only
- verify_num_links @all_links_at_start, 4
+ # +1 extra login link +1 extra system_group link pointing to the new User
+ verify_num_links @all_links_at_start, 6
end
test "setup user with openid prefix" do
assert_nil created['identity_url'], 'expected no identity_url'
# verify links
- # 3 new links: arvados#user, repo, and 'All users' group.
- verify_num_links @all_links_at_start, 3
+ # four new links: system_group, arvados#user, repo, and 'All users' group.
+ verify_num_links @all_links_at_start, 4
verify_link response_items, 'arvados#user', true, 'permission', 'can_login',
created['uuid'], created['email'], 'arvados#user', false, 'User'
assert_not_nil created['email'], 'expected non-nil email'
assert_nil created['identity_url'], 'expected no identity_url'
- # expect 4 new links: arvados#user, repo, vm and 'All users' group link
- verify_num_links @all_links_at_start, 4
+ # five new links: system_group, arvados#user, repo, vm and 'All
+ # users' group link
+ verify_num_links @all_links_at_start, 5
verify_link response_items, 'arvados#user', true, 'permission', 'can_login',
created['uuid'], created['email'], 'arvados#user', false, 'User'
assert_not_nil created['email'], 'expected non-nil email'
assert_equal created['email'], 'foo@example.com', 'expected input email'
- # verify links; 2 new links: arvados#user, and 'All users' group.
- verify_num_links @all_links_at_start, 2
+ # three new links: system_group, arvados#user, and 'All users' group.
+ verify_num_links @all_links_at_start, 3
verify_link response_items, 'arvados#user', true, 'permission', 'can_login',
created['uuid'], created['email'], 'arvados#user', false, 'User'
assert_not_nil created['uuid'], 'expected uuid for the new user'
assert_equal created['email'], 'foo@example.com', 'expected given email'
- # 4 extra links: login, group, repo and vm
- verify_num_links @all_links_at_start, 4
+ # five extra links: system_group, login, group, repo and vm
+ verify_num_links @all_links_at_start, 5
verify_link response_items, 'arvados#user', true, 'permission', 'can_login',
created['uuid'], created['email'], 'arvados#user', false, 'User'
def verify_num_links (original_links, expected_additional_links)
links_now = Link.all
- assert_equal original_links.size+expected_additional_links, Link.all.size,
+ assert_equal expected_additional_links, links_now.size-original_links.size,
"Expected #{expected_additional_links.inspect} more links"
end
tail_uuid: uuid)
if expect_signatures
- assert signed_uuids.any?, "expected singnatures"
+ assert signed_uuids.any?, "expected signatures"
else
- assert !signed_uuids.any?, "expected all singnatures deleted"
+ assert !signed_uuids.any?, "expected all signatures deleted"
end
end
+
+ def verify_system_group_permission_link_for user_uuid
+ assert_equal 1, Link.where(link_class: 'permission',
+ name: 'can_manage',
+ tail_uuid: system_group_uuid,
+ head_uuid: user_uuid).count
+ end
end
class PermissionsTest < ActionDispatch::IntegrationTest
fixtures :users, :groups, :api_client_authorizations, :collections
- test "adding and removing direct can_read links" do
- auth = {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(:spectator).api_token}"}
- admin_auth = {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(:admin).api_token}"}
+ def auth auth_fixture
+ {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(auth_fixture).api_token}"}
+ end
+ test "adding and removing direct can_read links" do
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
# try to add permission as spectator
head_uuid: collections(:foo_file).uuid,
properties: {}
}
- }, auth
+ }, auth(:spectator)
assert_response 422
# add permission as admin
head_uuid: collections(:foo_file).uuid,
properties: {}
}
- }, admin_auth
+ }, auth(:admin)
u = jresponse['uuid']
assert_response :success
# read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response :success
# try to delete permission as spectator
- delete "/arvados/v1/links/#{u}", {:format => :json}, auth
+ delete "/arvados/v1/links/#{u}", {:format => :json}, auth(:spectator)
assert_response 403
# delete permission as admin
- delete "/arvados/v1/links/#{u}", {:format => :json}, admin_auth
+ delete "/arvados/v1/links/#{u}", {:format => :json}, auth(:admin)
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
end
test "adding can_read links from user to group, group to collection" do
- auth = {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(:spectator).api_token}"}
- admin_auth = {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(:admin).api_token}"}
-
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
# add permission for spectator to read group
head_uuid: groups(:private).uuid,
properties: {}
}
- }, admin_auth
+ }, auth(:admin)
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
# add permission for group to read collection
head_uuid: collections(:foo_file).uuid,
properties: {}
}
- }, admin_auth
+ }, auth(:admin)
u = jresponse['uuid']
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response :success
# delete permission for group to read collection
- delete "/arvados/v1/links/#{u}", {:format => :json}, admin_auth
+ delete "/arvados/v1/links/#{u}", {:format => :json}, auth(:admin)
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
end
test "adding can_read links from group to collection, user to group" do
- auth = {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(:spectator).api_token}"}
- admin_auth = {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(:admin).api_token}"}
-
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
# add permission for group to read collection
head_uuid: collections(:foo_file).uuid,
properties: {}
}
- }, admin_auth
+ }, auth(:admin)
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
# add permission for spectator to read group
head_uuid: groups(:private).uuid,
properties: {}
}
- }, admin_auth
+ }, auth(:admin)
u = jresponse['uuid']
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response :success
# delete permission for spectator to read group
- delete "/arvados/v1/links/#{u}", {:format => :json}, admin_auth
+ delete "/arvados/v1/links/#{u}", {:format => :json}, auth(:admin)
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
end
test "adding can_read links from user to group, group to group, group to collection" do
- auth = {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(:spectator).api_token}"}
- admin_auth = {'HTTP_AUTHORIZATION' => "OAuth2 #{api_client_authorizations(:admin).api_token}"}
-
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response 404
# add permission for user to read group
head_uuid: groups(:private).uuid,
properties: {}
}
- }, admin_auth
+ }, auth(:admin)
assert_response :success
# add permission for group to read group
head_uuid: groups(:empty_lonely_group).uuid,
properties: {}
}
- }, admin_auth
+ }, auth(:admin)
assert_response :success
# add permission for group to read collection
head_uuid: collections(:foo_file).uuid,
properties: {}
}
- }, admin_auth
+ }, auth(:admin)
u = jresponse['uuid']
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
assert_response :success
# delete permission for group to read collection
- delete "/arvados/v1/links/#{u}", {:format => :json}, admin_auth
+ delete "/arvados/v1/links/#{u}", {:format => :json}, auth(:admin)
assert_response :success
# try to read collection as spectator
- get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth
+ get "/arvados/v1/collections/#{collections(:foo_file).uuid}", {:format => :json}, auth(:spectator)
+ assert_response 404
+ end
+
+ test "read-only group-admin sees correct subset of user list" do
+ get "/arvados/v1/users", {:format => :json}, auth(:rominiadmin)
+ assert_response :success
+ resp_uuids = jresponse['items'].collect { |i| i['uuid'] }
+ [[true, users(:rominiadmin).uuid],
+ [true, users(:active).uuid],
+ [false, users(:miniadmin).uuid],
+ [false, users(:spectator).uuid]].each do |should_find, uuid|
+ assert_equal should_find, resp_uuids.include?(uuid), "rominiadmin should #{'not ' unless should_find}see #{uuid} in user list"
+ end
+ end
+
+ test "read-only group-admin cannot modify administered user" do
+ put "/arvados/v1/users/#{users(:active).uuid}", {
+ :user => {
+ first_name: 'KilroyWasHere'
+ },
+ :format => :json
+ }, auth(:rominiadmin)
+ assert_response 403
+ end
+
+ test "read-only group-admin cannot read or update non-administered user" do
+ get "/arvados/v1/users/#{users(:spectator).uuid}", {
+ :format => :json
+ }, auth(:rominiadmin)
+ assert_response 404
+
+ put "/arvados/v1/users/#{users(:spectator).uuid}", {
+ :user => {
+ first_name: 'KilroyWasHere'
+ },
+ :format => :json
+ }, auth(:rominiadmin)
assert_response 404
end
+
+ test "RO group-admin finds user's specimens, RW group-admin can update" do
+ [[:rominiadmin, false],
+ [:miniadmin, true]].each do |which_user, update_should_succeed|
+ get "/arvados/v1/specimens", {:format => :json}, auth(which_user)
+ assert_response :success
+ resp_uuids = jresponse['items'].collect { |i| i['uuid'] }
+ [[true, specimens(:owned_by_active_user).uuid],
+ [true, specimens(:owned_by_private_group).uuid],
+ [false, specimens(:owned_by_spectator).uuid],
+ ].each do |should_find, uuid|
+ assert_equal(should_find, resp_uuids.include?(uuid),
+ "%s should%s see %s in specimen list" %
+ [which_user.to_s,
+ should_find ? '' : ' not',
+ uuid])
+ put "/arvados/v1/specimens/#{uuid}", {
+ :specimen => {
+ properties: {
+ miniadmin_was_here: true
+ }
+ },
+ :format => :json
+ }, auth(which_user)
+ if !should_find
+ assert_response 404
+ elsif !update_should_succeed
+ assert_response 403
+ else
+ assert_response :success
+ end
+ end
+ end
+ end
+
end