---
layout: default
navsection: userguide
title: "Tutorial 1: Using the arv command"
navorder: 20
---

h1. Tutorial 1: Using the "arv" command

This tutorial introduces the @arv@ command line tool and demonstrates how to run a simple script.

This tutorial assumes that you are logged into an Arvados VM instance, as described in "accessing arvados over ssh":#login

h2. Checking your environment

Check that you are able to access the Arvados API server using the following command:
$ arv user current
If you receive the message "ARVADOS_API_HOST and ARVADOS_API_TOKEN need to be defined as environment variables", follow the instructions in "Getting an API token":api-tokens.html , then return to this document.

If @arv user current@ is able to access the API server, it will print out the unique identifier associated with your account. For example (you will receive a different identifier from the one shown here):
qr1hi-xioed-9z2p3pn12yqdaem
This unique identifier represents your identity in the Arvados system and is similar to the concept of a pointer or a foreign key. You may de-reference any identifier returned by the "arv" command using the @-h@ command line option. For example:
$ arv -h user current
{
 "href":"https://qr1hi.arvadosapi.com/arvados/v1/users/qr1hi-xioed-9z2p3pn12yqdaem",
 "kind":"arvados#user",
 "etag":"8u0xwb9f3otb2xx9hto4wyo03",
 "uuid":"qr1hi-tpzed-92d3kxnimy3d4e8",
 "owner_uuid":"qr1hi-tpqed-23iddeohxta2r59",
 "created_at":"2013-12-02T17:05:47Z",
 "modified_by_client_uuid":"qr1hi-xxfg8-owxa2oa2s33jyej",
 "modified_by_user_uuid":"qr1hi-tpqed-23iddeohxta2r59",
 "modified_at":"2013-12-02T17:07:08Z",
 "updated_at":"2013-12-05T19:51:08Z",
 "email":"you@example.com",
 "full_name":"Example User",
 "first_name":"Example",
 "last_name":"User",
 "identity_url":"https://www.google.com/accounts/o8/id?id=AItOawnhlZr-pQ_Ic2f2W22XaO02oL3avJ322k1",
 "is_active": true,
 "is_admin": false,
 "prefs":{}
}
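The environment check at the start of this section can also be scripted. The following is a minimal sketch and not part of the @arv@ tool: the function name @check_arvados_env@ is hypothetical, and the only thing it takes from this tutorial is the pair of variable names quoted in the error message above.

```shell
# Hypothetical helper (not provided by arv): report which required
# environment variables are missing. Returns 0 only if both are set.
check_arvados_env() {
  missing=0
  if [ -z "${ARVADOS_API_HOST:-}" ]; then
    echo "ARVADOS_API_HOST is not set" >&2
    missing=1
  fi
  if [ -z "${ARVADOS_API_TOKEN:-}" ]; then
    echo "ARVADOS_API_TOKEN is not set" >&2
    missing=1
  fi
  return "$missing"
}

# Suggest the next step depending on the result.
if check_arvados_env; then
  echo "environment looks ready; try: arv user current"
else
  echo "see the API token instructions before continuing" >&2
fi
```

The function only tests that the variables are non-empty. Whether the token is actually accepted is still something only @arv user current@ can tell you.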
h2. Submitting your first job

We will run the "hash" program, which computes the MD5 hash of each file in a collection.

Pick a data collection. We'll use @33a9f3842b01ea3fdf27cc582f5ea2af@ here.

_How do I know if I have this data? Does it come as example data with the arvados distribution? Is there something notable about it, like it is very large and spans multiple keep blocks?_
the_collection=33a9f3842b01ea3fdf27cc582f5ea2af
Pick a code version. We'll use @5565778cf15ae9af22ad392053430213e9016631@ here.

_How do I know if I have this code version? What does this refer to? A git revision? Or a keep id? In what repository?_
the_version=5565778cf15ae9af22ad392053430213e9016631
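Before building the job, it can help to confirm that both values have the expected shape. This is a hedged sketch: it assumes, from the identifiers shown above and from the description of @script_version@ as a commit hash later in this tutorial, that a collection identifier is 32 hexadecimal characters and a code version is a full 40-character commit hash. The function name @is_hex_of_length@ is made up for this example.

```shell
# Hypothetical helper: succeed if $1 is exactly $2 lowercase hex characters.
is_hex_of_length() {
  printf '%s\n' "$1" | grep -Eq "^[0-9a-f]{$2}\$"
}

the_collection=33a9f3842b01ea3fdf27cc582f5ea2af
the_version=5565778cf15ae9af22ad392053430213e9016631

# Warn on stderr if either value has an unexpected shape.
is_hex_of_length "$the_collection" 32 ||
  echo "the_collection does not look like a 32-character hash" >&2
is_hex_of_length "$the_version" 40 ||
  echo "the_version does not look like a 40-character commit hash" >&2
```

This catches copy-and-paste mistakes such as a truncated value. It says nothing about whether the data or the commit actually exists on your cluster.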
Make a JSON object describing the job.
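read -rd $'\000' the_job <<EOF
{
 "script":"hash",
 "script_version":"$the_version",
 "script_parameters":
 {
  "input":"$the_collection"
 }
}
EOF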

_Need to explain what the JSON fields mean. They are explained later, but
there should be some mention up here._

(The @read -rd $'\000'@ part uses a bash feature to help us get a
multi-line string with lots of double quotation marks into a shell
variable.)
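
The idiom can be tried on its own, without submitting anything. The sketch below is an illustration with a deliberately tiny payload, and the variable name @sample@ is made up for it. One detail worth knowing: the NUL delimiter never appears in the input, so @read@ stops at end of input and returns a non-zero status even though the variable has been filled in. That matters if your script runs under @set -e@, which is why the status is ignored here.

```shell
#!/usr/bin/env bash
the_version=5565778cf15ae9af22ad392053430213e9016631

# The NUL delimiter never occurs in the here-document, so read consumes
# everything up to end of input. It then returns non-zero, although the
# variable has been assigned, so the status is deliberately ignored.
read -rd $'\000' sample <<EOF || true
{
 "script_version":"$the_version"
}
EOF

# The variable now holds all three lines, with $the_version expanded
# and the double quotation marks preserved.
printf '%s\n' "$sample"
```

Because the here-document delimiter is unquoted, shell variables inside it are expanded, which is what lets a job description pick up values set earlier in the session.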

Submit the job.

arv -h job create --job "$the_job"
{
 "kind":"arvados#job",
 "etag":"dwbrasqcozpjsqtfshzdjfiii",
 "uuid":"qr1hi-8i9sb-3i0yi357k0mauwz",
...
 "script":"hash",
 "script_parameters":{
  "input":"33a9f3842b01ea3fdf27cc582f5ea2af"
 },
 "script_version":"5565778cf15ae9af22ad392053430213e9016631",
...
}
_What is this? An example of what "arv" returns? What do the fields mean?_

h3. Monitor job progress

_And then the magic happens. There should be some more discussion of what is going on in the background once the job is submitted from the user's perspective. Is it queued, running, etc.?_

Go to Workbench, drop down the Compute menu, and click Jobs. The job you submitted should appear at the top of the list.

Hit "Refresh" until it finishes. _We should really make the page autorefresh or use a streamed-update framework_

You can also watch the log messages while the job runs:
curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/jobs/JOB_UUID_HERE/log_tail_follow"
This will run until the job finishes or you hit control-C.

If you're running more than one job today, you can watch log messages from all of them in one stream:
my_user_uuid=`arv user current`
curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/users/$my_user_uuid/event_stream"
This will run until you hit control-C.

h3. Inspect the job output

Find the output of the job by looking at the Jobs page (in the Compute menu) in Workbench, or by using the API:
arv -h job get --uuid JOB_UUID_HERE
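If you want only the output locator rather than the whole record, it can be pulled out of the JSON text. The sketch below works on a shortened, made-up sample record so that it runs without a cluster. It assumes the locator is stored in a field named @output@, which this tutorial does not show, so check the field name in your own response first. The helper name @json_string_field@ is hypothetical, and a real JSON parser is more robust than @sed@ if you have one available.

```shell
# Hypothetical helper: print the value of a simple string field.
# It handles only "name":"value" pairs with no escaped quotes in the value.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Made-up sample standing in for the response of:
#   arv -h job get --uuid JOB_UUID_HERE
sample_job='{
 "kind":"arvados#job",
 "script":"hash",
 "output":"5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi"
}'

# Extract the assumed "output" field and print it.
output_locator=$(printf '%s\n' "$sample_job" | json_string_field output)
echo "$output_locator"
```

With a live cluster, the same helper could read from the @arv -h job get@ command above instead of the sample string.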
The output locator will look like 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi.

List the files in the collection:
arv keep ls 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi
md5sum.txt
Show the contents of the md5sum.txt file:
arv keep less 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi/md5sum.txt
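The locator itself has visible structure. As a hedged reading of the example above, and not a specification of the locator format: the part before the first plus sign is a 32-character hash, the next part looks like a size, and anything after that is treated here as extra hints. The variable names are chosen for this example only.

```shell
locator='5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi'

# Split on "+" using POSIX parameter expansion.
locator_hash=${locator%%+*}   # text before the first "+"
rest=${locator#*+}            # text after the first "+"
locator_size=${rest%%+*}      # second component

# Anything after the second "+" is kept as one string; empty if absent.
case "$rest" in
  *+*) locator_hints=${rest#*+} ;;
  *)   locator_hints= ;;
esac

echo "hash:  $locator_hash"
echo "size:  $locator_size"
echo "hints: $locator_hints"
```

Splitting the locator this way is only useful for inspection. Commands such as @arv keep ls@ take the whole locator string, as shown above.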
h3. Inspect the code

The @script@ and @script_version@ attributes of a Job allow you to confirm the code that was used to run the job. Specifically, @script@ refers to a file in the @/crunch_scripts@ directory in the tree indicated by the commit hash @script_version@.

Example:
cd
git clone git://github.com/clinicalfuture/arvados.git
cd arvados
git checkout $the_version
less crunch_scripts/hash
_If we're going to direct the user to open up the code, some discussion of the Python API is probably in order. If the hash job is going to be the canonical first crunch map reduce program for everybody, then we should break down the program line-by-line and explain every step in detail._
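
As a rough local analogue of what the job computes, the following sketch may help set expectations for the contents of @md5sum.txt@. It is not the @hash@ crunch script and does not use Arvados at all: it runs the standard @md5sum@ utility (GNU coreutils is assumed) over every file in a scratch directory, which is the operation this tutorial says the job performs on a collection. The directory, file names, and the @local_sums@ variable are assumptions made for the example.

```shell
# Create a scratch directory with two small files standing in for a collection.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/a.txt"
printf 'world\n' > "$workdir/b.txt"

# Hash every file; each output line has the form "<md5>  <file name>".
local_sums=$(cd "$workdir" && md5sum -- *)
printf '%s\n' "$local_sums"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The real job reads its input from Keep and writes its result back as a new collection, so the exact format of @md5sum.txt@ may differ from what @md5sum@ prints locally.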