---
layout: default
navsection: userguide
title: "Tutorial: Running a Crunch job"
navorder: 12
---
h1. Tutorial: Running a Crunch job
This tutorial introduces the concepts and use of the Arvados Keep storage and Crunch job system using the @arv@ command line tool and Arvados Workbench.
*This tutorial assumes that you are "logged into an Arvados VM instance":ssh-access.html#login, and have a "working environment.":check-environment.html*
In the previous section, we downloaded a file from Keep and computed the md5 hash of the complete file. While straightforward, there are several obvious drawbacks to this approach:
* Large files require significant time to download.
* Very large files may exceed the scratch space of the local disk.
* We are only able to use the local CPU to process the file.
The Arvados "crunch" framework is designed to support processing very large data batches (gigabytes to terabytes) efficiently, and provides the following benefits:
* Increase concurrency by running tasks asynchronously, using many CPUs and network interfaces at once (especially beneficial for CPU-bound and I/O-bound tasks respectively).
* Track inputs, outputs, and settings so you can verify that the inputs, settings, and sequence of programs you used to arrive at an output are really what you think they were.
* Ensure that your programs and workflows are repeatable with different versions of your code, OS updates, etc.
* Interrupt and resume long-running jobs consisting of many short tasks.
* Maintain timing statistics automatically, so they're there when you want them.
For your first job, you will run the "hash" crunch script using the Arvados system. The "hash" script computes the md5 hash of each file in a collection.
Crunch jobs are described using JSON objects. For example:
$ read -d $'\000' the_job <<EOF
{
"script": "hash",
"script_version": "arvados:master",
"script_parameters":
{
"input": "33a9f3842b01ea3fdf27cc582f5ea2af"
}
}
EOF
$ arv -h job create --job "$the_job"
{
"href":"https://qr1hi.arvadosapi.com/arvados/v1/jobs/qr1hi-8i9sb-j5dr6107mxzp3no",
"kind":"arvados#job",
"etag":"aulvmdxezwxo4zrw15gz1v7x3",
"uuid":"qr1hi-8i9sb-j5dr6107mxzp3no",
"owner_uuid":"qr1hi-tpzed-9zdpkpni2yddge6",
"created_at":"2013-12-10T17:07:08Z",
"modified_by_client_uuid":"qr1hi-ozdt8-obw7foaks3qjyej",
"modified_by_user_uuid":"qr1hi-tpzed-9zdpkpni2yddge6",
"modified_at":"2013-12-10T17:07:08Z",
"updated_at":"2013-12-10T17:07:08Z",
"submit_id":null,
"priority":null,
"script":"hash",
"script_parameters":{
"input":"33a9f3842b01ea3fdf27cc582f5ea2af"
},
"script_version":"d3b10812b443dcf0189c1c432483bf7ac06507fe",
"cancelled_at":null,
"cancelled_by_client_uuid":null,
"cancelled_by_user_uuid":null,
"started_at":null,
"finished_at":null,
"output":null,
"success":null,
"running":null,
"is_locked_by_uuid":null,
"log":null,
"runtime_constraints":{},
"tasks_summary":{},
"dependencies":[
"33a9f3842b01ea3fdf27cc582f5ea2af"
],
"log_stream_href":"https://qr1hi.arvadosapi.com/arvados/v1/jobs/qr1hi-8i9sb-j5dr6107mxzp3no/log_tail_follow"
}
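Later steps reuse two values from this response: @uuid@ and @log_stream_href@. As a minimal sketch, assuming the response is printed one field per line as shown above, the hypothetical helper @json_string_field@ below (not part of @arv@) pulls a string field out with @sed@. The sample response is trimmed from the output above. For anything beyond flat string fields, a real JSON parser would be a safer choice.

```shell
# Hypothetical helper (not part of arv): print the value of the first
# string field named $1 found in JSON text on stdin. Assumes one field
# per line, as in the response shown above.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response, trimmed from the tutorial output above.
response='{
 "uuid":"qr1hi-8i9sb-j5dr6107mxzp3no",
 "owner_uuid":"qr1hi-tpzed-9zdpkpni2yddge6",
 "script":"hash",
 "log_stream_href":"https://qr1hi.arvadosapi.com/arvados/v1/jobs/qr1hi-8i9sb-j5dr6107mxzp3no/log_tail_follow"
}'

# Keep the two values that later commands need.
job_uuid=$(printf '%s\n' "$response" | json_string_field uuid)
log_href=$(printf '%s\n' "$response" | json_string_field log_stream_href)

echo "uuid: $job_uuid"
echo "log:  $log_href"
```

The pattern requires a quote directly before the field name, so asking for @uuid@ does not pick up @owner_uuid@. In practice the helper could read the response directly, for example @arv -h job create --job "$the_job" | json_string_field uuid@.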
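To follow the job's log while it runs, pass the @log_stream_href@ value from the @arv job create@ response to @curl@:
$ curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" _value_of_log_stream_href_from_arv_job_create_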
* @-s@ suppresses status messages from @curl@ itself
* @-H@ adds a required HTTP header with your Arvados API token
This will run until the job finishes or @curl@ is canceled with control-C.
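The same @curl@ invocation can be wrapped in a small shell function so that a missing URL or token is caught before any request is made. This is only a sketch: @follow_job_log@ is a hypothetical name, and the exit status 2 for bad input is an arbitrary choice.

```shell
# Hypothetical wrapper around the curl command shown above.
# Usage: follow_job_log <log_stream_href>
follow_job_log() {
  href=${1:-}
  if [ -z "$href" ]; then
    echo "follow_job_log: missing log_stream_href argument" >&2
    return 2
  fi
  if [ -z "${ARVADOS_API_TOKEN:-}" ]; then
    echo "follow_job_log: ARVADOS_API_TOKEN is not set" >&2
    return 2
  fi
  # -s hides curl's own status output; -H sends the API token header.
  curl -s -H "Authorization: OAuth2 $ARVADOS_API_TOKEN" "$href"
}
```

Calling @follow_job_log@ with the @log_stream_href@ value from @arv job create@ then behaves like the command above, and control-C still stops it.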
h3. Inspect the job output
You can access the job output under the *output* column of the _Compute %(rarr)→% Jobs_ page. Alternatively, you can use @arv job get@ to access a JSON object describing the output:
$ arv -h job get --uuid _value_of_uuid_from_arv_job_create_
{
"href":"https://qr1hi.arvadosapi.com/arvados/v1/jobs/qr1hi-8i9sb-zs6d9pxkr0vk175",
"kind":"arvados#job",
"etag":"eoe99lw7rnqxo7j29fh53hz",
"uuid":"qr1hi-8i9sb-zs6d9pxkr0vk175",
"owner_uuid":"qr1hi-tpzed-9zdpkpni2yddge6",
"created_at":"2013-12-10T17:23:26Z",
"modified_by_client_uuid":null,
"modified_by_user_uuid":"qr1hi-tpzed-9zdpkpni2yddge6",
"modified_at":"2013-12-10T17:23:45Z",
"updated_at":"2013-12-10T17:23:45Z",
"submit_id":null,
"priority":null,
"script":"hash",
"script_parameters":{
"input":"33a9f3842b01ea3fdf27cc582f5ea2af"
},
"script_version":"0a8c7c6fce7a9667ee42c1984a845100f51906a2",
"cancelled_at":null,
"cancelled_by_client_uuid":null,
"cancelled_by_user_uuid":null,
"started_at":"2013-12-10T17:23:29Z",
"finished_at":"2013-12-10T17:23:44Z",
"output":"880b55fb4470b148a447ff38cacdd952+54+K@qr1hi",
"success":true,
"running":false,
"is_locked_by_uuid":"qr1hi-tpzed-9zdpkpni2yddge6",
"log":"f760f3dd3105103e058a043310f7e72b+3028+K@qr1hi",
"runtime_constraints":{},
"tasks_summary":{
"done":2,
"running":0,
"failed":0,
"todo":0
},
"dependencies":[
"33a9f3842b01ea3fdf27cc582f5ea2af"
],
"log_stream_href":null
}
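The @success@ and @running@ fields in this object indicate whether the job has finished. The sketch below reduces them to a single word. Both function names are hypothetical, and the reading of the values is an assumption based only on the two responses shown in this tutorial: @success@ is @true@ after a successful run, and both fields are @null@ before the job starts.

```shell
# Hypothetical helper: print the bare literal (true, false or null) of the
# first field named $1 that starts a line of the JSON text on stdin.
# Numeric values such as "running":0 in tasks_summary are skipped.
json_literal_field() {
  sed -n 's/^[[:space:]]*"'"$1"'"[[:space:]]*:[[:space:]]*\([a-z][a-z]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: summarize a job object read from stdin.
# The mapping from field values to states is an assumption.
job_state() {
  json=$(cat)
  success=$(printf '%s\n' "$json" | json_literal_field success)
  running=$(printf '%s\n' "$json" | json_literal_field running)
  if [ "$success" = "true" ]; then
    echo "succeeded"
  elif [ "$success" = "false" ]; then
    echo "failed"
  elif [ "$running" = "true" ]; then
    echo "running"
  else
    echo "pending"
  fi
}

# Sample job object, trimmed from the output above.
finished_job='{
 "success":true,
 "running":false,
 "tasks_summary":{
  "done":2,
  "running":0
 }
}'

state=$(printf '%s\n' "$finished_job" | job_state)
echo "job state: $state"
```

A loop that calls @arv -h job get@ and stops once the state is no longer @pending@ or @running@ would be one way to wait for a job from a script.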
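The @output@ field holds the Keep locator of the job's output collection. Fetching that locator with @arv keep get@ prints the collection's manifest, which lists a single file, @md5sum.txt@. Appending the file name to the locator fetches the file itself:
$ arv keep get _value_of_output_from_arv_job_get_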
. 78b268d1e03d87f8270bdee9d5d427c5+61 0:61:md5sum.txt
$ arv keep get 880b55fb4470b148a447ff38cacdd952+54+K@qr1hi/md5sum.txt
44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2