X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/b54a5ea817d3d2087eaa07dcf98ec8a82af56d06..d533fffba8615a3e4d370ed1f7cdd810cdb4eee3:/doc/user/tutorials/tutorial-keep.html.textile.liquid
diff --git a/doc/user/tutorials/tutorial-keep.html.textile.liquid b/doc/user/tutorials/tutorial-keep.html.textile.liquid
index 01ef78ad12..e02082388f 100644
--- a/doc/user/tutorials/tutorial-keep.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-keep.html.textile.liquid
@@ -23,47 +23,52 @@ h1. Putting Data into Keep
We will start with downloading a freely available VCF file from the "Personal Genome Project (PGP)":http://www.personalgenomes.org subject "hu599905":https://my.personalgenomes.org/profile/hu599905 to a staging directory on the VM, and then add it to Keep.
+In the following tutorials, replace @you@ with your user id.
+
First, log into the Arvados VM instance and set up the staging area:
-notextile. $ mkdir /scratch/you
+notextile. ~$ mkdir /scratch/you
Next, download the file:
$ mkdir /scratch/you
-$ cd /scratch/you
-$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.personalgenomes.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
+~$ cd /scratch/you
+/scratch/you$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.personalgenomes.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 216M 100 216M 0 0 10.0M 0 0:00:21 0:00:21 --:--:-- 9361k
~$ scp MyData.vcf you@shell.arvados:/scratch/you/MyData.vcf
-notextile. $ scp MyData.vcf you@shell.arvados:/scratch/you/MyData.vcf
+{% include 'notebox_end' %}
-Now use @arv keep put@ to add your VCF data to Keep:
+Now use @arv keep put@ to add your VCF data to Keep, then delete the local copy of the file:
$ cd /scratch/you
-$ arv keep put var-GS000016015-ASM.tsv.bz2
+/scratch/you$ arv keep put var-GS000016015-ASM.tsv.bz2
c1bad4b39ca5a924e481008009d94e32+210
+/scratch/you$ rm var-GS000016015-ASM.tsv.bz2
$ mkdir tmp
-$ echo "hello alice" > tmp/alice.txt
-$ echo "hello bob" > tmp/bob.txt
-$ echo "hello carol" > tmp/carol.txt
-$ arv keep put tmp
+/scratch/you$ mkdir tmp
+/scratch/you$ echo "hello alice" > tmp/alice.txt
+/scratch/you$ echo "hello bob" > tmp/bob.txt
+/scratch/you$ echo "hello carol" > tmp/carol.txt
+/scratch/you$ arv keep put tmp
0M / 0M 100.0%
887cd41e9c613463eab2f0d885c6dd96+83
@@ -78,23 +83,23 @@ In order to reassemble the file, Keep stores a *collection* data block which lis
In this example we will use @c1bad4b39ca5a924e481008009d94e32+210@ which we added to keep in the previous section. First let us examine the contents of this collection using @arv keep get@:
-$ arv keep get c1bad4b39ca5a924e481008009d94e32+210
+/scratch/you$ arv keep get c1bad4b39ca5a924e481008009d94e32+210
. 204e43b8a1185621ca55a94839582e6f+67108864 b9677abbac956bd3e86b1deb28dfac03+67108864 fc15aff2a762b13f521baf042140acec+67108864 323d2a3ce20370c4ca1d3462a344f8fd+25885655 0:227212247:var-GS000016015-ASM.tsv.bz2
-@arv keep get@ fetches the contents of the locator @c1bad4b39ca5a924e481008009d94e32+210@. This is a locator for a collection data block, so it fetches the contents of the collection. In this example, this collection consists of a single file @var-GS000016015-ASM.tsv.bz2@ which is 227212247 bytes long, and is stored using four sequential data blocks, 204e43b8a1185621ca55a94839582e6f+67108864, b9677abbac956bd3e86b1deb28dfac03+67108864, fc15aff2a762b13f521baf042140acec+67108864, 323d2a3ce20370c4ca1d3462a344f8fd+25885655.
+The command @arv keep get@ fetches the contents of the locator @c1bad4b39ca5a924e481008009d94e32+210@. This is a locator for a collection data block, so it fetches the contents of the collection. In this example, this collection consists of a single file @var-GS000016015-ASM.tsv.bz2@ which is 227212247 bytes long, and is stored using four sequential data blocks, 204e43b8a1185621ca55a94839582e6f+67108864, b9677abbac956bd3e86b1deb28dfac03+67108864, fc15aff2a762b13f521baf042140acec+67108864, 323d2a3ce20370c4ca1d3462a344f8fd+25885655.
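To make the structure of the collection listing concrete, here is a minimal Python sketch that splits the line printed by @arv keep get@ into its parts. It is not part of the Arvados tools: the function name @parse_manifest_line@ is hypothetical, and the sketch assumes only the layout visible in the output above (a stream name, then block locators of the form hash+size, then file tokens of the form offset:length:name).

```python
# Illustrative sketch only: parse_manifest_line is a hypothetical helper,
# not an Arvados API. It assumes the token layout seen in the tutorial output.

MANIFEST_LINE = (
    ". 204e43b8a1185621ca55a94839582e6f+67108864 "
    "b9677abbac956bd3e86b1deb28dfac03+67108864 "
    "fc15aff2a762b13f521baf042140acec+67108864 "
    "323d2a3ce20370c4ca1d3462a344f8fd+25885655 "
    "0:227212247:var-GS000016015-ASM.tsv.bz2"
)


def parse_manifest_line(line):
    """Split one collection line into stream name, blocks, and files."""
    tokens = line.split()
    stream_name = tokens[0]
    blocks = []  # (md5 hex digest, size in bytes)
    files = []   # (offset in stream, length in bytes, file name)
    for token in tokens[1:]:
        parts = token.split(":", 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            # File token: offset:length:name
            files.append((int(parts[0]), int(parts[1]), parts[2]))
        else:
            # Block locator: hash+size (any further +hints are ignored here)
            digest, size = token.split("+")[:2]
            blocks.append((digest, int(size)))
    return stream_name, blocks, files


stream_name, blocks, files = parse_manifest_line(MANIFEST_LINE)
total_block_bytes = sum(size for _, size in blocks)
print(len(blocks), "blocks,", total_block_bytes, "bytes in total")
print(files)
```

The sizes of the four blocks add up to 227212247 bytes, the same length recorded in the file token, which is why the blocks can be concatenated in order to reassemble the file.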
Let's use @arv keep get@ to download the first datablock:
-notextile. $ arv keep get 204e43b8a1185621ca55a94839582e6f+67108864 > block1
+notextile. /scratch/you$ arv keep get 204e43b8a1185621ca55a94839582e6f+67108864 > block1
Let's look at the size and compute the md5 hash of @block1@:
-$ ls -l block1
+/scratch/you$ ls -l block1
-rw-r--r-- 1 you group 67108864 Dec 9 20:14 block1
-$ md5sum block1
+/scratch/you$ md5sum block1
204e43b8a1185621ca55a94839582e6f block1
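The @ls@ and @md5sum@ output above matches the two parts of the block locator: the md5 hash and the size in bytes. As a rough sketch of that relationship, the following Python computes a string of the same shape from a block of data. The helper name @block_locator@ is hypothetical and is not an Arvados function; the sketch assumes only the hash+size form seen in this tutorial.

```python
import hashlib


def block_locator(data: bytes) -> str:
    """Return 'md5hex+size' for a block of bytes (illustrative helper only)."""
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


# Example with a small in-memory block; for a downloaded block, read the
# file's bytes (for instance with open("block1", "rb")) and pass them instead.
example = block_locator(b"hello alice\n")
print(example)
```

Applied to the contents of @block1@, this should reproduce the locator that was passed to @arv keep get@, given the sizes and hash shown above.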
@@ -105,12 +110,14 @@ Notice that the block identifer 204e43b8a1185621ca55a94839582e6f+67108864<
Next, let's use @arv keep get@ to download and reassemble @var-GS000016015-ASM.tsv.bz2@ using the following command:
-notextile. $ arv keep get c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2 .
+
+/scratch/you$ arv keep get c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2 .
+
This downloads the file @var-GS000016015-ASM.tsv.bz2@ described by collection @c1bad4b39ca5a924e481008009d94e32+210@ from Keep and places it into the local directory. Now that we have the file, we can compute the md5 hash of the complete file:
-$ md5sum var-GS000016015-ASM.tsv.bz2
+/scratch/you$ md5sum var-GS000016015-ASM.tsv.bz2
44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
@@ -120,9 +127,9 @@ h2. Accessing Collections
There are a couple of other ways to access a collection. You may view the contents of a collection using @arv keep ls@:
-$ arv keep ls c1bad4b39ca5a924e481008009d94e32+210
+/scratch/you$ arv keep ls c1bad4b39ca5a924e481008009d94e32+210
var-GS000016015-ASM.tsv.bz2
-$ arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210
+/scratch/you$ arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210
221887 var-GS000016015-ASM.tsv.bz2