X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/99f067d87504c538932432aa5a8b7c62bfa244bf..d5ba0e97f8522ba3ce6ad36edf099c661a43f6b7:/doc/user/tutorials/tutorial-keep.textile

diff --git a/doc/user/tutorials/tutorial-keep.textile b/doc/user/tutorials/tutorial-keep.textile
index 166278e657..6683498e86 100644
--- a/doc/user/tutorials/tutorial-keep.textile
+++ b/doc/user/tutorials/tutorial-keep.textile
@@ -1,11 +1,12 @@
 ---
 layout: default
 navsection: userguide
+navmenu: Tutorials
 title: "Storing and Retrieving data using Arvados Keep"
-navorder: 111
+navorder: 11
 ---
 
-h1. Tutorial: Storing and Retrieving data using Arvados Keep
+h1. Storing and Retrieving data using Arvados Keep
 
 This tutorial introduces you to the Arvados file storage system.
 
@@ -20,29 +21,53 @@ The Arvados distributed file system is called *Keep*. Keep is a content-address
 
 h1. Putting Data into Keep
 
-We will start with downloading a freely available VCF exome from the "Personal Genome Project (PGP)":http://www.personalgenomes.org subject "hu599905":https://my.personalgenomes.org/profile/hu599905 and add it to Keep. From an Arvados VM instance:
+We will start with downloading a freely available VCF file from the "Personal Genome Project (PGP)":http://www.personalgenomes.org subject "hu599905":https://my.personalgenomes.org/profile/hu599905 to a staging directory on the VM, and then add it to Keep.
+
+First, log into the Arvados VM instance and set up the staging area:
+
+notextile.
$ mkdir /scratch/you
+ +Next, download the file: -
$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.personalgenomes.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
+
$ mkdir /scratch/you
+$ cd /scratch/you
+$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.personalgenomes.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100  216M  100  216M    0     0  10.0M      0  0:00:21  0:00:21 --:--:-- 9361k
 
-Alternately, if you have your own data, for example @MyExome.vcf@, you can use @rsync@ on your local computer to copy it to the shell VM:
+Alternately, if you have your own data, for example @MyData.vcf@, you can use @scp@ or @rsync@ to copy from your local workstation to the shell VM (run this on your local workstation):
 
-notextile.
$ rsync MyExome.vcf shell.qr1hi:MyExome.vcf
+notextile.
$ scp MyData.vcf you@shell.arvados:/scratch/you/MyData.vcf
Now use @arv keep put@ to add your VCF data to Keep: -
$ arv keep put var-GS000016015-ASM.tsv.bz2
-33a9f3842b01ea3fdf27cc582f5ea2af
+
$ cd /scratch/you
+$ arv keep put var-GS000016015-ASM.tsv.bz2
+c1bad4b39ca5a924e481008009d94e32+210
 
-The output value @33a9f3842b01ea3fdf27cc582f5ea2af@ is the Keep locator. This enables you to access the file you just uploaded, and is explained in the next section.
+The output value @c1bad4b39ca5a924e481008009d94e32+210@ is the Keep locator. This enables you to access the file you just uploaded, and is explained in the next section.
+
+h2. Putting a directory
+
+You can also use @arv keep put@ to add an entire directory:
+
+
$ mkdir tmp
+$ echo "hello alice" > tmp/alice.txt
+$ echo "hello bob" > tmp/bob.txt
+$ echo "hello carol" > tmp/carol.txt
+$ arv keep put tmp
+0M / 0M 100.0% 
+887cd41e9c613463eab2f0d885c6dd96+83
+
+
 
 h1. Getting Data from Keep
 
@@ -97,9 +122,13 @@ There are a couple of other ways to access a collection. You may view the conte
$ arv keep ls c1bad4b39ca5a924e481008009d94e32+210
 var-GS000016015-ASM.tsv.bz2
+$ arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210
+221887 var-GS000016015-ASM.tsv.bz2
 
+* @-s@ prints file sizes in kilobytes
+
 You may also access through the Arvados Workbench using a URI similar to this, where the last part of the path is the Keep locator:
 
 "https://workbench.{{ site.arvados_api_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210":https://workbench.{{ site.arvados_api_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210
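
The locators this change introduces (@c1bad4b39ca5a924e481008009d94e32+210@, @887cd41e9c613463eab2f0d885c6dd96+83@) all look like a 32-character hex digest followed by @+@ and a number, and the Workbench link simply appends the locator to a collections path. The Python sketch below illustrates that pattern only. The function names and the @example.org@ host are hypothetical, the locator shape is an assumption inferred from the examples in this diff, and further hints (such as the @+K@ant@ suffix in the download URL above) are deliberately not handled. It is not part of the Arvados SDK.

```python
import re

# Assumed shape, inferred only from the locators quoted in this diff:
# 32 lowercase hex characters, "+", then a decimal size hint.
_LOCATOR = re.compile(r"([0-9a-f]{32})\+([0-9]+)")


def parse_locator(locator):
    """Return (hex_digest, size_hint), or raise ValueError if the shape differs."""
    match = _LOCATOR.fullmatch(locator)
    if match is None:
        raise ValueError("unrecognised locator: %r" % locator)
    return match.group(1), int(match.group(2))


def workbench_url(api_host, locator):
    """Build a collection URL following the pattern shown in the tutorial text."""
    parse_locator(locator)  # reject malformed input before building the URL
    return "https://workbench.%s/collections/%s" % (api_host, locator)


if __name__ == "__main__":
    # "example.org" is a placeholder for the site's API host.
    print(parse_locator("c1bad4b39ca5a924e481008009d94e32+210"))
    print(workbench_url("example.org", "887cd41e9c613463eab2f0d885c6dd96+83"))
```

Note that the older sample output removed by this diff, @33a9f3842b01ea3fdf27cc582f5ea2af@, carries no @+size@ part, so this sketch would reject it. Loosen the pattern if bare digests need to be accepted.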