title: "Tutorial 1: Using the arv command"

h1. Tutorial 1: Using the "arv" command

This tutorial introduces the @arv@ command line tool and demonstrates how to run a simple script.

This tutorial assumes that you are logged into an Arvados VM instance, as described in "Accessing Arvados over SSH":#login.
h2. Checking your environment

Check that you are able to access the Arvados API server by running @arv user current@.

If you receive the message "ARVADOS_API_HOST and ARVADOS_API_TOKEN need to be defined as environment variables", follow the instructions in "Getting an API token":api-tokens.html and then return to this document.
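The same check can be made before calling @arv@ at all. The following is a minimal sketch; the function name @check_arvados_env@ is hypothetical and not part of the @arv@ tool. It only tests that the two variables named in the error message are set and non-empty, not that their values are valid.

```shell
# Hypothetical helper (not part of arv): report which of the two
# environment variables named in the error message are missing.
check_arvados_env() {
  missing=0
  if [ -z "${ARVADOS_API_HOST:-}" ]; then
    echo "ARVADOS_API_HOST is not set" >&2
    missing=1
  fi
  if [ -z "${ARVADOS_API_TOKEN:-}" ]; then
    echo "ARVADOS_API_TOKEN is not set" >&2
    missing=1
  fi
  # Exit status 0 means both variables are present.
  return "$missing"
}
```

A possible use is @check_arvados_env && arv user current@, which only contacts the API server when both variables are present.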
If @arv user current@ is able to access the API server, it will print out the unique identifier associated with your account:

qr1hi-xioed-9z2p3pn12yqdaem

This unique identifier represents your identity in the Arvados system and is similar to the concept of a pointer or a foreign key. You can dereference any identifier returned by the @arv@ command using the @-h@ command line option. For example:
"href":"https://qr1hi.arvadosapi.com/arvados/v1/users/qr1hi-xioed-9z2p3pn12yqdaem",
"kind":"arvados#user",
"etag":"8u0xwb9f3otb2xx9hto4wyo03",
"uuid":"qr1hi-tpzed-92d3kxnimy3d4e8",
"owner_uuid":"qr1hi-tpqed-23iddeohxta2r59",
"created_at":"2013-12-02T17:05:47Z",
"modified_by_client_uuid":"qr1hi-xxfg8-owxa2oa2s33jyej",
"modified_by_user_uuid":"qr1hi-tpqed-23iddeohxta2r59",
"modified_at":"2013-12-02T17:07:08Z",
"updated_at":"2013-12-05T19:51:08Z",
"email":"you@example.com",
"full_name":"Example User",
"first_name":"Example",
"identity_url":"https://www.google.com/accounts/o8/id?id=AItOawnhlZr-pQ_Ic2f2W22XaO02oL3avJ322k1",
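The lookup can be sketched as two small shell functions. Both names are hypothetical. The identifier shape (five, five, and fifteen lowercase alphanumeric characters) is inferred only from the examples in this tutorial, and the @arv -h user get --uuid@ form is an assumption modeled on the @arv -h job get --uuid@ command used later in this document.

```shell
# Hypothetical helpers, not part of arv.

# Succeed if $1 has the 5-5-15 shape of the identifiers shown in this
# tutorial. The shape is inferred from those examples only.
looks_like_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15}$'
}

# Print the full record for the current user.
# Assumption: "arv -h user get --uuid" mirrors "arv -h job get --uuid".
show_current_user() {
  uuid=$(arv user current) || return 1
  if ! looks_like_uuid "$uuid"; then
    echo "unexpected output from arv: $uuid" >&2
    return 1
  fi
  arv -h user get --uuid "$uuid"
}
```

Checking the shape first keeps an error message from being passed along as if it were an identifier.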
h2. Submitting your first job

We will run the "hash" program, which computes the MD5 hash of each file in a collection.
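As a local point of comparison, the same kind of per-file digest can be computed with @md5sum@ from GNU coreutils. This sketch is not the Arvados "hash" program and does not read from a collection; the function name @md5_each_file@ is hypothetical and it works on an ordinary directory.

```shell
# Illustration only: print an MD5 digest for each regular file in a
# directory, using md5sum from GNU coreutils.
# This is not the Arvados "hash" program.
md5_each_file() {
  dir=$1
  for f in "$dir"/*; do
    # Skip subdirectories and the unexpanded pattern of an empty directory.
    [ -f "$f" ] || continue
    md5sum "$f"
  done
}
```

Calling @md5_each_file /some/directory@ prints one line per file, with the digest first and the path second.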
Pick a data collection. We'll use @33a9f3842b01ea3fdf27cc582f5ea2af@ here.

_How do I know if I have this data? Does it come as example data with the Arvados distribution? Is there something notable about it, for example that it is very large and spans multiple Keep blocks?_

the_collection=33a9f3842b01ea3fdf27cc582f5ea2af
Pick a code version. We'll use @5565778cf15ae9af22ad392053430213e9016631@ here.

_How do I know if I have this code version? What does this refer to? A Git revision or a Keep ID? In which repository?_

the_version=5565778cf15ae9af22ad392053430213e9016631
Make a JSON object describing the job.

read -rd $'\000' the_job <<EOF
 "script_version":"$the_version",
  "input":"$the_collection"

_The meaning of the JSON fields needs to be explained here. They are explained later, but there should be some mention at this point._
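The fields can be read off the rest of this tutorial: @script@ names a file in the @crunch_scripts@ directory, @script_version@ is the commit hash to run, and @script_parameters@ carries the input collection. The following is a sketch of a complete job description under those assumptions. The value @hash@ for @script@ is inferred from the program name used in this tutorial. Command substitution with @cat@ is used in place of @read -rd@; it fills the variable the same way and also works in shells other than bash.

```shell
the_collection=33a9f3842b01ea3fdf27cc582f5ea2af
the_version=5565778cf15ae9af22ad392053430213e9016631

# Build the job description. The heredoc delimiter is unquoted, so the
# shell expands $the_version and $the_collection inside the JSON text.
# Assumption: "script":"hash" is inferred from the program name.
the_job=$(cat <<EOF
{
 "script":"hash",
 "script_version":"$the_version",
 "script_parameters":{
  "input":"$the_collection"
 }
}
EOF
)

# Show the result so it can be checked before it is submitted.
printf '%s\n' "$the_job"
```

The variable can then be passed to @arv -h job create --job "$the_job"@.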
(The @read -rd $'\000'@ part uses a bash feature to help us get a multi-line string with lots of double quotation marks into a shell variable.)

arv -h job create --job "$the_job"
"kind":"arvados#job",
"etag":"dwbrasqcozpjsqtfshzdjfiii",
"uuid":"qr1hi-8i9sb-3i0yi357k0mauwz",
"script_parameters":{
 "input":"33a9f3842b01ea3fdf27cc582f5ea2af"
"script_version":"5565778cf15ae9af22ad392053430213e9016631",

_What is this? An example of what "arv" returns? What do the fields mean?_
h3. Monitor job progress

_And then the magic happens. There should be more discussion, from the user's perspective, of what goes on in the background once the job is submitted. Is it queued, running, and so on?_

Go to Workbench, open the Compute menu, and click Jobs. The job you submitted should appear at the top of the list.

Click "Refresh" until the job finishes. _We should really make the page autorefresh or use a streamed-update framework._
You can also watch the log messages while the job runs:

curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/jobs/JOB_UUID_HERE/log_tail_follow"

This will run until the job finishes or you press Control-C.
If you're running more than one job today, you can watch log messages from all of them in one stream:

my_user_uuid=$(arv user current)
curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/users/$my_user_uuid/event_stream"

This will run until you press Control-C.
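The two @curl@ commands differ only in the URL, so they can be wrapped in small shell functions. This is a sketch with hypothetical function names; the URL paths and the @Authorization@ header are copied from the commands in this section, and nothing else about the API is assumed.

```shell
# Hypothetical helpers that assemble the URLs used by the curl commands
# in this section. Both read the host from ARVADOS_API_HOST.
job_log_url() {
  printf 'https://%s/arvados/v1/jobs/%s/log_tail_follow\n' "$ARVADOS_API_HOST" "$1"
}

user_event_stream_url() {
  printf 'https://%s/arvados/v1/users/%s/event_stream\n' "$ARVADOS_API_HOST" "$1"
}

# Stream from a URL until the server closes the connection or you
# press Control-C. The token is read from ARVADOS_API_TOKEN.
follow_url() {
  curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" "$1"
}
```

For example, @follow_url "$(job_log_url qr1hi-8i9sb-3i0yi357k0mauwz)"@ follows a single job, and @follow_url "$(user_event_stream_url "$(arv user current)")"@ follows every job you own.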
h3. Inspect the job output

Find the output of the job by looking at the Jobs page (in the Compute menu) in Workbench, or by using the API:

arv -h job get --uuid JOB_UUID_HERE

The output locator will look like <code>5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi</code>.
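A locator of this shape can be split on its @+@ separators with plain shell parameter expansion. The sketch below uses a hypothetical function name. Reading the first part as a content digest, the second as a size, and the remainder as a storage hint is an assumption; this tutorial does not define the parts.

```shell
# Hypothetical helper: split a locator of the form DIGEST+SIZE+HINT.
# The names given to the three parts are assumptions.
split_locator() {
  locator=$1
  digest=${locator%%+*}   # text before the first "+"
  rest=${locator#*+}      # text after the first "+"
  size=${rest%%+*}        # text up to the next "+", if any
  case $rest in
    *+*) hint=${rest#*+} ;;   # everything after the second "+"
    *)   hint= ;;             # no hint present
  esac
  printf 'digest=%s size=%s hint=%s\n' "$digest" "$size" "$hint"
}

split_locator '5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi'
# prints: digest=5894dfae5d6d8edf135f0ea3dba849c2 size=62 hint=K@qr1hi
```

The single quotes around the locator are not strictly required here, but they keep the example safe if a locator ever contains characters that the shell treats specially.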
List the files in the collection:

arv keep ls 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi

Show the contents of the md5sum.txt file:

arv keep less 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi/md5sum.txt
The @script@ and @script_version@ attributes of a Job allow you to confirm the code that was used to run the job. Specifically, @script@ refers to a file in the @/crunch_scripts@ directory in the tree indicated by the commit hash @script_version@.

git clone git://github.com/clinicalfuture/arvados.git
cd arvados
git checkout "$the_version"
less crunch_scripts/hash
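The script can also be read at a given commit without changing the working tree, using the @git show REVISION:PATH@ form. This is a sketch with a hypothetical function name; it assumes it is run inside the cloned repository and that the script lives under @crunch_scripts@ as described above.

```shell
# Hypothetical helper: print one crunch script as it existed at a given
# commit, leaving the working tree untouched.
# Run this inside the cloned repository.
show_crunch_script() {
  version=$1   # commit hash, i.e. the job's script_version
  script=$2    # file name, i.e. the job's script
  git show "$version:crunch_scripts/$script"
}
```

For example, @show_crunch_script "$the_version" hash | less@ displays the same file as the checkout above.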
_If we're going to direct the user to open up the code, some discussion of the Python API is probably in order. If the hash job is going to be the canonical first Crunch MapReduce program for everybody, then we should break down the program line by line and explain every step in detail._