---
layout: default
navsection: userguide
title: "Tutorial: Your first job"
navorder: 20
---

h1. Tutorial: Your first job

Here you will use the @arv@ command line tool to run a simple Crunch script on some sample data.

h3. Prerequisites

_Needs a mention of going to Access->VMs on the workbench_

* Log in to a VM "using SSH":ssh-access.html
* Put an "API token":api-tokens.html in your @ARVADOS_API_TOKEN@ environment variable
* Put the API host name in your @ARVADOS_API_HOST@ environment variable

If everything is set up correctly, the command @arv -h user current@ will display your account information.

_If you are logged in to a fully provisioned VM, presumably the gems are already installed. This discussion should go somewhere else._

The @arv@ tool depends on a few Ruby gems. If any of them are missing, it will tell you which ones to install. If you need to install them as a non-root user, set @GEM_HOME@ before you run @gem install@:

<pre>
export GEM_HOME=~/.gem
</pre>

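Before going further, it can help to confirm that both variables from the prerequisites list are actually set. This is a minimal bash sketch; the function name @check_arvados_env@ is hypothetical and not part of @arv@.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of arv): report which of the environment
# variables this tutorial relies on are unset or empty.
check_arvados_env() {
  local var missing=0
  for var in ARVADOS_API_TOKEN ARVADOS_API_HOST; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, @check_arvados_env && arv -h user current@ only contacts the API server once both variables have values.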

h3. Submit a job

We will run the "hash" script, which computes the MD5 hash of each file in a collection.

Pick a data collection. We'll use @33a9f3842b01ea3fdf27cc582f5ea2af@ here.

_How do I know if I have this data? Does it come as example data with the Arvados distribution? Is there something notable about it, for example is it very large, spanning multiple Keep blocks?_

<pre>
the_collection=33a9f3842b01ea3fdf27cc582f5ea2af
</pre>

Pick a code version, that is, the git commit that contains the script you want to run. We'll use @5565778cf15ae9af22ad392053430213e9016631@ here.

_How do I know if I have this code version? What does this refer to? A git revision? Or a Keep ID? In what repository?_

<pre>
the_version=5565778cf15ae9af22ad392053430213e9016631
</pre>

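Both identifiers are plain lowercase hexadecimal strings: the collection identifier is a 32-character MD5 hash (the hash part of a Keep locator), and a full git commit hash is 40 characters long. The bash sketch below catches copy-and-paste mistakes before you submit anything; @is_hex_of_length@ is a hypothetical helper, and it assumes you use full-length commit hashes rather than abbreviated ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is exactly $2 lowercase hex digits.
is_hex_of_length() {
  local s=$1 n=$2
  [ "${#s}" -eq "$n" ] && [[ $s =~ ^[0-9a-f]+$ ]]
}
```

For example, run @is_hex_of_length "$the_collection" 32@ and @is_hex_of_length "$the_version" 40@; each returns a non-zero status if the value does not have the expected shape.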
Make a JSON object describing the job.
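
<pre>
read -rd $'\000' the_job <<EOF
{
 "script":"hash",
 "script_version":"$the_version",
 "script_parameters":{
  "input":"$the_collection"
 }
}
EOF
</pre>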

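The @script@ field names the Crunch script to run, @script_version@ is the commit that contains it, and @script_parameters@ holds the script's inputs; for @hash@, @input@ is the collection whose files it will hash.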

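(The @read -rd $'\000'@ part uses a bash feature to get a multi-line
string containing lots of double quotation marks into a shell variable:
@read@ reads up to a NUL character, which never appears, so the whole
here-document, newlines and all, ends up in @the_job@.)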
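To see the trick in isolation, the bash sketch below feeds @read@ a small here-document; the variable name @demo@ is only illustrative. Because @read@ never finds the NUL delimiter, it stops at end of input and returns a non-zero status even though the variable was filled. That matters if you paste the tutorial's commands into a script running under @set -e@, which would stop at that line; the sketch records the status instead.

```shell
#!/usr/bin/env bash
# -d $'\000' makes NUL the delimiter (bash treats it like -d ''), so newlines do
# not stop the read; -r keeps backslashes literal instead of treating them as escapes.
status=0
read -rd $'\000' demo <<'EOF' || status=$?
{
 "script":"hash"
}
EOF

# No NUL ever arrives, so read stops at end of input and returns non-zero,
# yet $demo holds all three lines (read trims the trailing newline).
echo "read exit status: $status"
printf '%s\n' "$demo"
```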

Submit the job.

<pre>
arv -h job create --job "$the_job"
</pre>

The command prints the job record that was created (abridged here):

<pre>
{
 "kind":"arvados#job",
 "etag":"dwbrasqcozpjsqtfshzdjfiii",
 "uuid":"qr1hi-8i9sb-3i0yi357k0mauwz",
...
 "script":"hash",
 "script_parameters":{
  "input":"33a9f3842b01ea3fdf27cc582f5ea2af"
 },
 "script_version":"5565778cf15ae9af22ad392053430213e9016631",
...
}
</pre>


_What is this? An example of what "arv" returns? What do the fields mean?_

h3. Monitor job progress

_And then the magic happens. There should be some more discussion of what is going on in the background once the job is submitted, from the user's perspective. Is it queued, running, etc.?_

Go to Workbench, drop down the Compute menu, and click Jobs. The job you submitted should appear at the top of the list. Click "Refresh" until it finishes.

_We should really make the page autorefresh or use a streamed-update framework._

You can also watch the log messages while the job runs (replace @JOB_UUID_HERE@ with the @uuid@ from the job record above):

<pre>
curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/jobs/JOB_UUID_HERE/log_tail_follow"
</pre>

This will run until the job finishes or you press Control-C. If you are running more than one job, you can watch the log messages from all of them in one stream:
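The log URL needs the job's UUID, which appears in the @"uuid"@ field of the record that @arv job create@ printed. Here is a rough bash sketch that pulls it out with @grep@ and @sed@ and then follows the log with the same request as above. @job_uuid_from_json@ and @follow_job_log@ are hypothetical helpers, the parsing assumes the @"key":"value"@ layout shown in the job record above, and a real JSON parser would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first "uuid" key in JSON text on
# stdin. Keys such as "owner_uuid" do not match, because the pattern requires
# a double quote immediately before uuid.
job_uuid_from_json() {
  grep -o '"uuid" *: *"[^"]*"' | head -n 1 | sed 's/.*: *"\([^"]*\)"/\1/'
}

# Hypothetical helper: follow one job's log until it finishes (or Control-C),
# using the same log_tail_follow request shown above.
follow_job_log() {
  local job_uuid=$1
  if [ -z "$job_uuid" ]; then
    echo "usage: follow_job_log JOB_UUID" >&2
    return 2
  fi
  curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
    "https://$ARVADOS_API_HOST/arvados/v1/jobs/$job_uuid/log_tail_follow"
}
```

For example: @job_uuid=$(arv -h job create --job "$the_job" | job_uuid_from_json)@ followed by @follow_job_log "$job_uuid"@.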

<pre>
my_user_uuid=$(arv user current)
curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/users/$my_user_uuid/event_stream"
</pre>

This will run until you press Control-C.

h3. Inspect the job output

Find the output of the job by looking at the Jobs page (in the Compute menu) in Workbench, or by using the API:

<pre>
arv -h job get --uuid JOB_UUID_HERE
</pre>

The job record's @output@ attribute holds the output locator, which will look like <code>5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi</code>. List the files in that collection:

<pre>
arv keep ls 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi
md5sum.txt
</pre>

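The output locator is a 32-character MD5 hash followed by @+@-separated hints; the first hint (@62@ above) is conventionally the size in bytes, and the rest carry further information that this sketch does not interpret. A bash sketch that splits a locator into those parts using only parameter expansion; @split_locator@ is a hypothetical helper and assumes the HASH+SIZE+OTHER_HINTS layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a Keep locator such as
# 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi into hash, size hint and the
# remaining hints, assuming the layout HASH+SIZE+OTHER_HINTS.
split_locator() {
  local locator=$1
  local hash=${locator%%+*}       # everything before the first '+'
  local rest=${locator#"$hash"}   # e.g. '+62+K@qr1hi' (empty if no hints)
  rest=${rest#+}                  # e.g. '62+K@qr1hi'
  local size=${rest%%+*}          # e.g. '62'
  local hints=""
  if [[ $rest == *+* ]]; then
    hints=${rest#*+}              # e.g. 'K@qr1hi' (not interpreted here)
  fi
  printf 'hash=%s\nsize=%s\nhints=%s\n' "$hash" "$size" "$hints"
}
```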

Show the contents of the @md5sum.txt@ file:

<pre>
arv keep less 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi/md5sum.txt
</pre>

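For intuition about what the job computed, here is a rough local analogue: run @md5sum@ over every file in a directory on your own machine. This bash sketch is not the @hash@ Crunch script, and the exact layout of the job's @md5sum.txt@ may differ; @hash_directory@ is a hypothetical helper, and it assumes GNU @find@, @sort@, @xargs@ and @md5sum@.

```shell
#!/usr/bin/env bash
# Local analogue (not the Crunch "hash" script): print "MD5  ./relative/path"
# for every regular file under a directory, sorted by path.
hash_directory() {
  local dir=$1
  (
    cd "$dir" || exit 1
    # -print0 / sort -z / xargs -0 keep file names with spaces intact;
    # -r skips running md5sum at all when the directory has no files.
    find . -type f -print0 | sort -z | xargs -0 -r md5sum
  )
}
```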

h3. Inspect the code

The @script@ and @script_version@ attributes of a job let you confirm which code was used to run it: @script@ names a file in the @/crunch_scripts@ directory of the source tree at the commit @script_version@. For example:

<pre>
cd
git clone git://github.com/clinicalfuture/arvados.git
cd arvados
git checkout "$the_version"
less crunch_scripts/hash
</pre>

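If you only want to read the script at that commit, @git show COMMIT:PATH@ prints a file as it exists in that commit without touching your working tree. A small bash sketch; @show_crunch_script@ is a hypothetical helper, and it assumes the layout described above, with @crunch_scripts/SCRIPT@ at the top of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print crunch_scripts/SCRIPT as it exists at COMMIT in
# the git repository at REPO_DIR, without checking anything out.
show_crunch_script() {
  local repo_dir=$1 commit=$2 script=$3
  git -C "$repo_dir" show "$commit:crunch_scripts/$script"
}
```

For example, @show_crunch_script ~/arvados "$the_version" hash | less@ shows the same file as the checkout above.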

_If we're going to direct the user to open up the code, some discussion of the Python API is probably in order. If the hash job is going to be the canonical first Crunch map/reduce program for everybody, then we should break down the program line by line and explain every step in detail._