---
layout: default
navsection: userguide
navmenu: Reference
title: "Crunch utility libraries"
navorder: 31
---

h1. Crunch utility libraries

Several utility libraries are included with Arvados. They are intended to make it quicker and easier to write your own crunch scripts.

h4. Python SDK extras

The Python SDK adds some convenience features that are particularly useful in crunch scripts, in addition to the standard set of API calls.
<pre>
import arvados

# Look up the user on whose behalf this script is running.
my_user = arvados.api().users().current().execute()
my_uuid = my_user['uuid']
</pre>

h4. Get the current job and task parameters

@arvados.current_job()@ and @arvados.current_task()@ are convenient ways to retrieve the current Job and Task, using the @JOB_UUID@ and @TASK_UUID@ environment variables provided to each crunch task process.
<pre>
this_job = arvados.current_job()
this_task = arvados.current_task()
this_job_input = this_job['script_parameters']['input']
this_task_input = this_task['parameters']['input']
</pre>

h4(#one_task_per_input). Queue a task for each input file

A common pattern for a crunch job is to run one task to scan the input, and one task per input file to do the work.

The @one_task_per_input_file()@ function implements this pattern. Pseudocode:
<pre>
if this is the job's first (default) task:
    for each file in the 'input' collection:
        queue a new task, with parameters['input'] = file
    exit
else:
    return
</pre>

Usage:
<pre>
import arvados

# In the job's first task, queue one task per input file and exit.
arvados.job_setup.one_task_per_input_file(if_sequence=0, and_end_task=True)

# Now do the work on a single file
this_task = arvados.current_task()
my_input = this_task['parameters']['input']
</pre>

h4. Set the current task's output and success flag

Each task in a crunch job must make an API call to record its output and set its @success@ attribute to True. The object returned by @current_task()@ has a @set_output()@ method to make the process more succinct.
<pre>
arvados.current_task().set_output(my_output_locator)
</pre>
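The task flow described in the two sections above can be tried out without a cluster. The following is a minimal simulation, not the SDK's implementation: the task dictionaries, the in-memory queue, and the functions @queue_tasks_if_first()@, @set_output()@, and @run_job()@ are hypothetical stand-ins for the Arvados API, and the "work" is a placeholder that derives a fake output name from the input name.

```python
# Hypothetical in-memory stand-in for the job task queue; none of these
# names belong to the Arvados SDK.

def queue_tasks_if_first(this_task, input_files, task_queue):
    """Queue one task per input file if this is the job's first task.

    Returns True when the caller should stop (the first task only scans
    the input), and False when the caller should do the real work.
    """
    if this_task['sequence'] != 0:
        return False
    for name in input_files:
        task_queue.append({
            'sequence': 1,
            'parameters': {'input': name},
            'output': None,
            'success': None,
        })
    # The first task has nothing else to do, so mark it finished.
    this_task['success'] = True
    return True


def set_output(task, locator):
    """Record a task's output and flag the task as successful."""
    task['output'] = locator
    task['success'] = True


def run_job(input_files):
    """Run the first task, then every task it queued, in order."""
    first = {'sequence': 0, 'parameters': {}, 'output': None, 'success': None}
    queue = [first]
    for task in queue:  # the list grows while we iterate over it
        if queue_tasks_if_first(task, input_files, queue):
            continue
        # Placeholder "work": derive a fake output name from the input.
        set_output(task, task['parameters']['input'] + '.out')
    return queue


tasks = run_job(['a.fastq', 'b.fastq'])
```

In this sketch the first task produces no output of its own; each queued task sees exactly one input file and records exactly one output.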
<pre>
import os
import sys

from arvados_ipc import *

children = {}
pipes = {}

pipe_setup(pipes, 'hellopipe')
if 0 == named_fork(children, 'child_a'):
    # Child: keep only the write end of the pipe.
    pipe_closeallbut(pipes, ('hellopipe', 'w'))
    os.write(pipes['hellopipe', 'w'], "Hello, parent.")
    os._exit(0)

# Parent: keep only the read end of the pipe.
pipe_closeallbut(pipes, ('hellopipe', 'r'))
with os.fdopen(pipes['hellopipe', 'r'], 'rb') as f:
    message = f.read()
    sys.stderr.write("Child says: " + message + "\n")

if not waitpid_and_check_children(children):
    raise Exception("Child process exited non-zero.")
</pre>

The crunch scripts included with Arvados contain more examples of using the @arvados_ipc@ module.
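For comparison, the same fork-and-pipe pattern can be written with only the Python standard library. This is a sketch of the general technique on a POSIX system, not a description of how @arvados_ipc@ is implemented; the function name @read_from_child()@ is made up for this example.

```python
import os


def read_from_child(message):
    """Fork a child that writes `message` (bytes) to a pipe.

    Returns what the parent read and whether the child exited with
    status 0.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end of the pipe.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
            status = 0
        finally:
            # Leave immediately, skipping normal interpreter shutdown.
            os._exit(status)
    # Parent: close the write end, otherwise read() never sees EOF.
    os.close(write_fd)
    with os.fdopen(read_fd, 'rb') as f:
        data = f.read()
    _, wait_status = os.waitpid(pid, 0)
    ok = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
    return data, ok


data, ok = read_from_child(b"Hello, parent.")
```

The two easy mistakes here are leaving the unused pipe end open in the parent and forgetting to check the child's exit status.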
<pre>
import arvados_bwa

arvados_bwa.run('aln', [ref_basename, '-'],
                stdin=open(fastq_filename,'rb'),
                stdout=open(aln_filename,'wb'))
</pre>

On qr1hi.arvadosapi.com, the source distribution @bwa-0.7.5a.tar.bz2@ is available in the collection @8b6e2c4916133e1d859c9e812861ce13+70@.
<pre>
{
 "script_parameters":{
  "bwa_tbz":"8b6e2c4916133e1d859c9e812861ce13+70",
  ...
 },
 ...
}
</pre>
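Parameters such as @bwa_tbz@ hold collection locators. Before submitting a job it can be useful to check that each one is at least well formed. The sketch below assumes a locator consists of 32 hexadecimal digits, a plus sign, and a decimal size, optionally followed by further @+hint@ fields; the helper name @bad_locators()@ is hypothetical and not part of any Arvados library.

```python
import re

# Assumed shape of a collection locator: 32 hex digits, "+", a decimal
# size, optionally followed by further "+hint" fields.
LOCATOR_RE = re.compile(r'^[0-9a-f]{32}\+\d+(\+\S+)*$')


def bad_locators(script_parameters, keys):
    """Return the keys whose values are missing or not locator-shaped."""
    problems = []
    for key in keys:
        value = script_parameters.get(key)
        if not isinstance(value, str) or not LOCATOR_RE.match(value):
            problems.append(key)
    return problems


params = {"bwa_tbz": "8b6e2c4916133e1d859c9e812861ce13+70"}
missing = bad_locators(params, ["bwa_tbz", "samtools_tgz"])
```

A check like this only catches malformed or absent values; it cannot tell whether the collection actually exists or contains the expected tarball.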
<pre>
import arvados_gatk2

arvados_gatk2.run(
    args=[
        '-nct', 8,
        '-T', 'BaseRecalibrator',
        '-R', ref_fasta_files[0],
        '-I', input_bam_files[0],
        '-o', recal_file,
        ])
</pre>

On qr1hi.arvadosapi.com, the binary distribution @GenomeAnalysisTK-2.6-4.tar.bz2@ is available in the collection @5790482512cf6d5d6dfd50b7fd61e1d1+86@. The GATK data bundle is available in the collection @d237a90bae3870b3b033aea1e99de4a9+10820@.
<pre>
{
 "script_parameters":{
  "gatk_tbz":"7e0a277d6d2353678a11f56bab3b13f2+87",
  "gatk_bundle":"d237a90bae3870b3b033aea1e99de4a9+10820",
  ...
 },
 ...
}
</pre>
<pre>
import arvados_samtools

arvados_samtools.run('view', ['-S', '-b', '-'],
                     stdin=open(sam_filename,'rb'),
                     stdout=open(bam_filename,'wb'))
</pre>

On qr1hi.arvadosapi.com, the source distribution @samtools-0.1.19.tar.gz@ is available in the collection @c777e23cf13e5d5906abfdc08d84bfdb+74@.
<pre>
{
 "script_parameters":{
  "samtools_tgz":"c777e23cf13e5d5906abfdc08d84bfdb+74",
  ...
 },
 ...
}
</pre>
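The bwa and samtools examples above share a calling convention: a subcommand, an argument list, and optional @stdin@ and @stdout@ file objects. A wrapper with that shape could be a thin layer over @subprocess@. The sketch below only illustrates the idea and is not how the Arvados wrappers are implemented; the extra @binary@ parameter is an assumption of this example, and the Python interpreter stands in for a real tool so the code runs anywhere.

```python
import subprocess
import sys
import tempfile


def run(binary, command, args, **kwargs):
    """Run `binary command args...`.

    stdin, stdout, and any other keyword arguments are passed straight
    through to subprocess.check_call. Arguments are converted to strings
    so that numbers can be given as-is.
    """
    argv = [binary, command] + [str(a) for a in args]
    return subprocess.check_call(argv, **kwargs)


# Stand-in for a real tool: a Python one-liner that upper-cases stdin.
script = "import sys; sys.stdout.write(sys.stdin.read().upper())"

with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
    src.write(b"acgt\n")
    src.seek(0)
    run(sys.executable, '-c', [script], stdin=src, stdout=dst)
    dst.seek(0)
    result = dst.read()
```

Because @check_call@ raises an exception on a non-zero exit status, a failed tool stops the script instead of silently producing a truncated output file.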
<pre>
import arvados_picard

arvados_picard.run(
    'FixMateInformation',
    params={
        'i': input_bam_path,
        'o': '/dev/stdout',
        'quiet': 'true',
        'so': 'coordinate',
        'validation_stringency': 'LENIENT',
        'compression_level': 0
        },
    stdout=open('out.bam','wb'))
</pre>

On qr1hi.arvadosapi.com, the binary distribution @picard-tools-1.82.zip@ is available in the collection @687f74675c6a0e925dec619cc2bec25f+77@.
<pre>
{
 "script_parameters":{
  "picard_zip":"687f74675c6a0e925dec619cc2bec25f+77",
  ...
 },
 ...
}
</pre>
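Picard tools take their options on the command line as @KEY=value@ pairs with upper-case keys, so the lower-case @params@ dictionary in the Picard example above presumably gets translated into that form. The sketch below shows one way such a translation could work. It is an assumption about the mapping, not the @arvados_picard@ implementation, and @picard_args()@ is a hypothetical name.

```python
def picard_args(params):
    """Turn a params dict into a sorted list of KEY=value strings."""
    return ['%s=%s' % (key.upper(), value)
            for key, value in sorted(params.items())]


args = picard_args({
    'i': 'in.bam',
    'o': '/dev/stdout',
    'quiet': 'true',
    'so': 'coordinate',
    'validation_stringency': 'LENIENT',
    'compression_level': 0,
})
```

Sorting the keys is only there to make the result deterministic; the order of @KEY=value@ pairs does not matter to Picard.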