doc/user/tutorial-keep.textile

---
layout: default
navsection: userguide
title: "Tutorial: Adding Data to Keep"
navorder: 101
---

h1. Tutorial: Adding Data to Keep

Now that you've run a Crunch job on sample data, this tutorial walks you through uploading your own research data to Keep, the Arvados distributed storage service.

h2. Prerequisites

You should have already "run your first job":tutorial-job1.html using sample data on an Arvados shell VM.  If you haven't, complete that tutorial first.

h2. Adding Data to Keep

The first step is to copy your data to the shell VM, where you have command-line access to the Arvados tools.

Suppose you have a VCF file, @MyData.vcf@, and want to run an Arvados pipeline on it.  Copy it to the Arvados shell VM with @rsync@:

bc.
    rsync MyData.vcf shell.arvados:MyData.vcf

If you don't already have VCF data ready to go, you can log in to the shell VM and download a free VCF exome from https://my.personalgenomes.org/user_file/download/825:

bc..
    $ *ssh shell.arvados*

    shell.arvados$ *wget -O LF5713.vcf  https://my.personalgenomes.org/user_file/download/825*
    --2013-12-10 21:25:18--  https://my.personalgenomes.org/user_file/download/825
    Resolving my.personalgenomes.org (my.personalgenomes.org)... 134.174.150.6
    Connecting to my.personalgenomes.org (my.personalgenomes.org)|134.174.150.6|:443... connected.
    ...
    HTTP request sent, awaiting response... 200 OK
    Length: 39814813 (38M) [text/x-vcard]
    Saving to: ‘LF5713.vcf’

    100% [=================================>] 39,814,813   193KB/s  in 4m 42s

    2013-12-10 21:33:54 (138 KB/s) - ‘LF5713.vcf’ saved [39814813/39814813]
p.

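The server labels the download as @text/x-vcard@, so a quick sanity check before uploading may be worthwhile. The sketch below is not part of the Arvados tools: @looks_like_vcf@ is a hypothetical helper name, and it assumes an uncompressed file that follows the VCF specification, whose first line begins with @##fileformat=VCF@. If the check fails, inspect the file by hand rather than assuming it is corrupt.

```shell
# Hypothetical helper (not an Arvados command): succeed if the file's
# first line carries the VCF format header.
# Assumption: the file is plain text, not gzip- or bgzip-compressed.
looks_like_vcf() {
    # Read only the first line, then test it against the header prefix.
    head -n 1 "$1" | grep -q '^##fileformat=VCF'
}

# Usage on the shell VM (file name taken from the wget example above):
#   looks_like_vcf LF5713.vcf && echo "header looks like VCF"
```
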
# On the shell VM, make sure that your Arvados environment includes @ARVADOS_API_TOKEN@ and @ARVADOS_API_HOST@, as described in "Tutorial: Your first job":tutorial-job1.html.

# Use the @arv keep put@ command to add your VCF data to Keep.  If you downloaded the sample file instead, substitute @LF5713.vcf@ for @MyData.vcf@:
bc.
    arv keep put MyData.vcf
    9845d870ebe27036ba101a3bee10fb3f+234+K@ant

  The string returned by @arv keep put@ is a _locator_.  It is essentially a filename for data stored in Keep.

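The two steps above can be wrapped in a small pre-flight and post-upload check. This is a minimal sketch under stated assumptions, not part of the Arvados tools: the function names @check_arvados_env@ and @is_locator@ are hypothetical, and the locator pattern (32 hexadecimal digits, a decimal number, then optional @+@-separated fields) is inferred only from the sample output shown above.

```shell
# Step 1 (hypothetical helper): fail early if the Arvados environment
# variables named in this tutorial are not exported.
check_arvados_env() {
    missing=0
    for name in ARVADOS_API_TOKEN ARVADOS_API_HOST; do
        # printenv prints nothing when the variable is not exported.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Step 2 (hypothetical helper): check that a string has the same shape
# as the sample locator. Assumption: the pattern is derived from that
# one example, not from a formal specification.
is_locator() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{32}\+[0-9]+(\+[^+]+)*$'
}

# Usage on the shell VM:
#   check_arvados_env || exit 1
#   locator=$(arv keep put MyData.vcf)
#   is_locator "$locator" && echo "stored as $locator"
```

Checking the environment first keeps a missing token from surfacing later as a less obvious upload failure.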