title: "Tutorial: Your first job"

h1. Tutorial: Your first job

Here you will use the "arv" command line tool to run a simple Crunch script on some sample data.
_Needs a mention of going to Access->VMs on the workbench_

* Log in to a VM "using SSH":ssh-access.html
* Put an "API token":api-tokens.html in your @ARVADOS_API_TOKEN@ environment variable
* Put the API host name in your @ARVADOS_API_HOST@ environment variable

If everything is set up correctly, the command @arv -h user current@ will display your account information.
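A small helper can catch a missing variable before you call @arv@. This is only a sketch: the function name @check_arvados_env@ is made up for illustration and is not part of Arvados, and the values in the usage comment are placeholders.

```shell
# Return 0 if both Arvados environment variables are set and non-empty.
# Otherwise name the missing ones on stderr and return 1.
# (check_arvados_env is a hypothetical helper, not an Arvados command.)
check_arvados_env() {
  rc=0
  if [ -z "${ARVADOS_API_HOST:-}" ]; then
    echo "ARVADOS_API_HOST is not set" >&2
    rc=1
  fi
  if [ -z "${ARVADOS_API_TOKEN:-}" ]; then
    echo "ARVADOS_API_TOKEN is not set" >&2
    rc=1
  fi
  return "$rc"
}

# Usage (placeholder values, not real credentials):
#   export ARVADOS_API_HOST=arvados.example.com
#   export ARVADOS_API_TOKEN=your-token-here
#   check_arvados_env && arv -h user current
```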
_If you are logged in to a fully provisioned VM, presumably the gems are already installed. This discussion should go somewhere else._

Arv depends on a few gems. If any are missing, it will tell you which ones to install. If you install the dependencies as a non-root user, set @GEM_HOME@ before you run @gem install@:

export GEM_HOME=~/.gem
We will run the "hash" program, which computes the MD5 hash of each file in a collection.
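To get a feel for the kind of result to expect, here is a rough local analogue built from ordinary shell tools. It is not the Crunch script itself: @md5_each_file@ is a hypothetical helper that works on a local directory rather than a collection, and it assumes GNU @md5sum@ is installed.

```shell
# Print one "<md5>  <path>" line for every regular file under a directory,
# sorted by path. This is a local stand-in for illustration only; the real
# "hash" program runs as a Crunch job against a collection.
md5_each_file() {
  (
    cd "$1" &&
    find . -type f | sort | while IFS= read -r f; do
      md5sum "$f"
    done
  )
}

# Usage:
#   md5_each_file /path/to/some/directory
```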
Pick a data collection. We'll use @33a9f3842b01ea3fdf27cc582f5ea2af@ here.

_How do I know if I have this data? Does it come as example data with the Arvados distribution? Is there something notable about it, like it is very large and spans multiple Keep blocks?_

the_collection=33a9f3842b01ea3fdf27cc582f5ea2af

Pick a code version. We'll use @5565778cf15ae9af22ad392053430213e9016631@ here.

_How do I know if I have this code version? What does this refer to? A git revision? Or a Keep id? In what repository?_

the_version=5565778cf15ae9af22ad392053430213e9016631
Make a JSON object describing the job.

read -rd $'\000' the_job <<EOF
"script_version":"$the_version",
"input":"$the_collection"

_Need to explain what the JSON fields mean. They are explained later, but there should be some mention up here._

(The @read -rd $'\000'@ part uses a bash feature to help us get a multi-line string with lots of double quotation marks into a shell variable.)
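In more detail: @-r@ stops @read@ from treating backslashes specially, and @-d@ sets the delimiter. Bash cannot store a NUL byte in a string, so @$'\000'@ ends up as an empty delimiter and @read@ consumes the whole here-document. Because the input ends before any delimiter is seen, @read@ returns a non-zero status, which matters if your shell runs with @set -e@. A portable sketch of the same idea wraps the here-document in command substitution instead. The helper name @make_job_json@ is hypothetical, and only the two fields visible above are included; a real job description needs its remaining fields as well.

```shell
# Print a partial job description as JSON.
# $1 = script version, $2 = input collection.
# Only the two fields shown in the tutorial text appear here; this is an
# illustration of the quoting technique, not a complete job description.
make_job_json() {
  cat <<EOF
{
 "script_version":"$1",
 "input":"$2"
}
EOF
}

the_version=5565778cf15ae9af22ad392053430213e9016631
the_collection=33a9f3842b01ea3fdf27cc582f5ea2af

# Command substitution captures the multi-line text, double quotes intact,
# with the shell variables already expanded.
the_job=$(make_job_json "$the_version" "$the_collection")
printf '%s\n' "$the_job"
```

The unquoted @EOF@ marker is what lets @$1@ and @$2@ expand inside the here-document; quoting it (@<<'EOF'@) would keep them literal.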
arv -h job create --job "$the_job"

"etag":"dwbrasqcozpjsqtfshzdjfiii",
"uuid":"qr1hi-8i9sb-3i0yi357k0mauwz",
"input":"33a9f3842b01ea3fdf27cc582f5ea2af"
"script_version":"5565778cf15ae9af22ad392053430213e9016631",

_What is this? An example of what "arv" returns? What do the fields mean?_
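The later steps need the job's @uuid@, so it can help to capture it from the response. The following is a minimal sketch that assumes the response looks like the fragment above, with each field written as @"name":"value"@. The helper @json_string_field@ is hypothetical and pattern-based; it is not a real JSON parser and will not cope with escaped quotes or nested objects.

```shell
# Read JSON text on stdin and print the value of the named string field.
# Assumes the field is written as "name":"value" on a single line and that
# the value contains no escaped double quotes. Prints the first match only.
# (json_string_field is a hypothetical helper, not part of arv.)
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   job_uuid=$(arv -h job create --job "$the_job" | json_string_field uuid)
```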
h3. Monitor job progress

_And then the magic happens. There should be some more discussion of what is going on in the background once the job is submitted, from the user's perspective. Is it queued, running, etc.?_

Go to Workbench, drop down the Compute menu, and click Jobs. The job you submitted should appear at the top of the list.

Click "Refresh" until the job finishes. _We should really make the page autorefresh or use a streamed-update framework._
You can also watch the log messages while the job runs:

curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/jobs/JOB_UUID_HERE/log_tail_follow"

This will run until the job finishes or you hit control-C.
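If you follow logs often, the command above can be wrapped so that the job UUID is the only thing you type. This is a sketch built from the URL and header shown above; the function names @job_log_url@ and @follow_job_log@ are made up for illustration.

```shell
# Build the log-follow URL for a job UUID from the API host name.
# (Hypothetical helper; the URL layout is copied from the curl command above.)
job_log_url() {
  printf 'https://%s/arvados/v1/jobs/%s/log_tail_follow\n' "$ARVADOS_API_HOST" "$1"
}

# Stream the job's log messages until the job finishes or you interrupt it.
follow_job_log() {
  curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" "$(job_log_url "$1")"
}

# Usage:
#   follow_job_log JOB_UUID_HERE
```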
If you're running more than one job today, you can watch log messages from all of them in one stream:

my_user_uuid=$(arv user current)
curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/users/$my_user_uuid/event_stream"

This will run until you hit control-C.
h3. Inspect the job output

Find the output of the job by looking at the Jobs page (in the Compute menu) in Workbench, or by using the API:

arv -h job get --uuid JOB_UUID_HERE

The output locator will look like <code>5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi</code>.
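The locator is a series of fields joined by @+@. The sketch below splits off the first two. Reading the first field as the content hash and the second as a size is an assumption based on how the example looks; this page does not spell out the locator format, and the helper names are hypothetical.

```shell
# Print the first "+"-separated field of a locator (assumed: the content hash).
locator_hash() {
  printf '%s\n' "${1%%+*}"
}

# Print the second "+"-separated field (assumed: a size).
# Expects a locator that contains at least one "+".
locator_size() {
  rest=${1#*+}                 # drop the first field and its "+"
  printf '%s\n' "${rest%%+*}"  # keep everything up to the next "+"
}

# Usage:
#   locator_hash '5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi'
#   locator_size '5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi'
```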
List the files in the collection:

arv keep ls 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi

Show the contents of the @md5sum.txt@ file:

arv keep less 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi/md5sum.txt
The @script@ and @script_version@ attributes of a Job allow you to confirm the code that was used to run the job. Specifically, @script@ refers to a file in the @/crunch_scripts@ directory in the tree indicated by the commit hash @script_version@.

git clone git://github.com/clinicalfuture/arvados.git
cd arvados
git checkout $the_version
less crunch_scripts/hash
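If you would rather not switch your working tree to that commit, @git show@ can print a file as it existed at a given revision. The sketch below relies on that standard git feature; the helper names @crunch_script_path@ and @show_crunch_script@ are hypothetical, and the commands must be run inside a clone of the repository.

```shell
# Map a job's "script" attribute to its path inside the source tree.
# (Hypothetical helper; the directory name comes from the text above.)
crunch_script_path() {
  printf 'crunch_scripts/%s\n' "$1"
}

# Print a script as it existed at a given commit without checking it out.
# $1 = commit hash (the job's script_version), $2 = script name.
show_crunch_script() {
  git show "$1:$(crunch_script_path "$2")"
}

# Usage, from inside the cloned repository:
#   show_crunch_script "$the_version" hash | less
```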
_If we're going to direct the user to open up the code, some discussion of the Python API is probably in order. If the hash job is going to be the canonical first Crunch map-reduce program for everybody, then we should break down the program line by line and explain every step in detail._