---
layout: default
navsection: userguide
title: "Tutorial 1: Using the arv command"
navorder: 20
---

h1. Tutorial 1: Using the "arv" command

This tutorial introduces the @arv@ command line tool and demonstrates how to run a simple script.

This tutorial assumes that you are logged into an Arvados VM instance, as described in "accessing arvados over ssh":#login

h2. Checking your environment

Check that you are able to access the Arvados API server using the following command:
<pre>
$ arv user current
</pre>

If you receive the message "ARVADOS_API_HOST and ARVADOS_API_TOKEN need to be defined as environment variables", follow the instructions "Getting an API token":api-tokens.html, then return to this document.

If @arv user current@ is able to access the API server, it will print out the unique identifier associated with your account, for example (you will receive a different identifier than shown in this example):
<pre>
qr1hi-xioed-9z2p3pn12yqdaem
</pre>

This unique identifier represents your identity in the Arvados system and is similar to the concept of a pointer or a foreign key. You may de-reference any identifier returned by the "arv" command using the @-h@ command line option. For example:
<pre>
$ arv -h user current
{
 "href":"https://qr1hi.arvadosapi.com/arvados/v1/users/qr1hi-xioed-9z2p3pn12yqdaem",
 "kind":"arvados#user",
 "etag":"8u0xwb9f3otb2xx9hto4wyo03",
 "uuid":"qr1hi-tpzed-92d3kxnimy3d4e8",
 "owner_uuid":"qr1hi-tpqed-23iddeohxta2r59",
 "created_at":"2013-12-02T17:05:47Z",
 "modified_by_client_uuid":"qr1hi-xxfg8-owxa2oa2s33jyej",
 "modified_by_user_uuid":"qr1hi-tpqed-23iddeohxta2r59",
 "modified_at":"2013-12-02T17:07:08Z",
 "updated_at":"2013-12-05T19:51:08Z",
 "email":"you@example.com",
 "full_name":"Example User",
 "first_name":"Example",
 "last_name":"User",
 "identity_url":"https://www.google.com/accounts/o8/id?id=AItOawnhlZr-pQ_Ic2f2W22XaO02oL3avJ322k1",
 "is_active": true,
 "is_admin": false,
 "prefs":{}
}
</pre>

h2. Submitting your first job

We will run the "hash" program, which computes the MD5 hash of each file in a collection.

Pick a data collection. We'll use @33a9f3842b01ea3fdf27cc582f5ea2af@ here.

_How do I know if I have this data? Does it come as example data with the arvados distribution? Is there something notable about it, like it is very large and spans multiple keep blocks?_
<pre>
the_collection=33a9f3842b01ea3fdf27cc582f5ea2af
</pre>

Pick a code version. We'll use @5565778cf15ae9af22ad392053430213e9016631@ here.

_How do I know if I have this code version? What does this refer to? A git revision? Or a keep id? In what repository?_
<pre>
the_version=5565778cf15ae9af22ad392053430213e9016631
</pre>

Make a JSON object describing the job.
<pre>
read -rd $'\000' the_job <<EOF
{
 "script":"hash",
 "script_version":"$the_version",
 "script_parameters":
 {
  "input":"$the_collection"
 }
}
EOF
</pre>

_Need to explain what the json fields mean, it is explained later but there should be some mention up here._

(The @read -rd $'\000'@ part uses a bash feature to help us get a multi-line string with lots of double quotation marks into a shell variable.)

Submit the job.

<pre>
arv -h job create --job "$the_job"
</pre>

↓

<pre>
{
 "kind":"arvados#job",
 "etag":"dwbrasqcozpjsqtfshzdjfiii",
 "uuid":"qr1hi-8i9sb-3i0yi357k0mauwz",
 ...
 "script":"hash",
 "script_parameters":{
  "input":"33a9f3842b01ea3fdf27cc582f5ea2af"
 },
 "script_version":"5565778cf15ae9af22ad392053430213e9016631",
 ...
}
</pre>

_What is this? An example of what "arv" returns? What do the fields mean?_

h3. Monitor job progress

_And then the magic happens. There should be some more discussion of what is going on in the background once the job is submitted from the user's perspective. It is queued, running, etc.?_

Go to Workbench, drop down the Compute menu, and click Jobs. The job you submitted should appear at the top of the list.

Hit "Refresh" until it finishes. _We should really make the page autorefresh or use a streamed-update framework_

You can also watch the log messages while the job runs:

<pre>
curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/jobs/JOB_UUID_HERE/log_tail_follow"
</pre>

This will run until the job finishes or you hit control-C.

If you're running more than one job today, you can watch log messages from all of them in one stream:

<pre>
my_user_uuid=$(arv user current)
curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" \
  "https://$ARVADOS_API_HOST/arvados/v1/users/$my_user_uuid/event_stream"
</pre>

This will run until you hit control-C.

h3. Inspect the job output

Find the output of the job by looking at the Jobs page (in the Compute menu) in Workbench, or by using the API:

<pre>
arv -h job get --uuid JOB_UUID_HERE
</pre>

The output locator will look like:

<pre>
5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi
</pre>
List the files in the collection:

<pre>
arv keep ls 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi
</pre>

↓

<pre>
md5sum.txt
</pre>

Show the contents of the md5sum.txt file:

<pre>
arv keep less 5894dfae5d6d8edf135f0ea3dba849c2+62+K@qr1hi/md5sum.txt
</pre>

h3. Inspect the code

The @script@ and @script_version@ attributes of a Job allow you to confirm the code that was used to run the job. Specifically, @script@ refers to a file in the @/crunch_scripts@ directory in the tree indicated by the commit hash @script_version@.

Example:

<pre>
cd
git clone git://github.com/clinicalfuture/arvados.git
cd arvados
git checkout $the_version
less crunch_scripts/hash
</pre>

_If we're going to direct the user to open up the code, some discussion of the python API is probably in order. If the hash job is going to be the canonical first crunch map reduce program for everybody, then we should break down the program line-by-line and explain every step in detail._
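As a rough local illustration of what the "hash" job computes (one MD5 digest per input file), the sketch below does the same thing for the files in an ordinary directory. It is not the code in @crunch_scripts/hash@ and does not touch Keep or Crunch. @hash_dir@ is a hypothetical helper name, the "digest, space, file name" layout is an assumption rather than the actual format of md5sum.txt, and the sketch assumes GNU @md5sum@ is installed, as it is on most Linux systems.

```shell
#!/usr/bin/env bash

# hash_dir is a hypothetical helper, not part of Arvados.
# It prints one line per regular file in the given directory:
# the MD5 digest, a space, then the file name without its path.
hash_dir() {
  local input_dir=$1
  local f
  for f in "$input_dir"/*; do
    # Skip subdirectories and the unexpanded glob of an empty directory.
    [ -f "$f" ] || continue
    # md5sum prints the digest followed by the path; keep only the digest.
    printf '%s %s\n' "$(md5sum "$f" | cut -d' ' -f1)" "$(basename "$f")"
  done
}

# Usage: hash a scratch directory that holds a single small file.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/a.txt"
hash_dir "$demo_dir"
```

Comparing digests produced this way against the lines shown by @arv keep less@ is one way to spot-check a job's output, provided you have local copies of the same input files.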