X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/3d2c9602f0309c1b25a434c053561c0c98dafbce..81bb05b5a18c9501057876bf5b0e1923778dad26:/doc/user/tutorials/tutorial-keep.html.textile.liquid
diff --git a/doc/user/tutorials/tutorial-keep.html.textile.liquid b/doc/user/tutorials/tutorial-keep.html.textile.liquid
index 243a4834b4..74319fda7d 100644
--- a/doc/user/tutorials/tutorial-keep.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-keep.html.textile.liquid
@@ -1,129 +1,54 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
-title: "Storing and Retrieving data using Arvados Keep"
-
+title: "Uploading data"
...
-h1. Storing and Retrieving data using Arvados Keep
-
-This tutorial introduces you to the Arvados file storage system.
-
-
-*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.basedoc}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.basedoc}}/user/getting_started/check-environment.html*
-
-The Arvados distributed file system is called *Keep*. Keep is a content-addressable file system. This means that files are managed using special unique identifiers derived from the _contents_ of the file, rather than human-assigned file names (specifically, the md5 hash). This has a number of advantages:
-* Files can be stored and replicated across a cluster of servers without requiring a central name server.
-* Systematic validation of data integrity by both server and client because the checksum is built into the identifier.
-* Minimizes data duplication (two files with the same contents will result in the same identifier, and will not be stored twice.)
-* Avoids data race conditions (an identifier always points to the same data.)
-
-h1. Putting Data into Keep
+This tutorial describes how to upload new Arvados data collections using the command line tool @arv-put@. This example uses a freely available TSV file containing variant annotations from "Personal Genome Project (PGP)":http://www.pgp-hms.org participant "hu599905.":https://my.pgp-hms.org/profile/hu599905
-We will start with downloading a freely available VCF file from the "Personal Genome Project (PGP)":http://www.personalgenomes.org subject "hu599905":https://my.personalgenomes.org/profile/hu599905 to a staging directory on the VM, and then add it to Keep.
-
-In the following tutorials, replace you with your user id.
-
-First, log into the Arvados VM instance and set up the staging area:
-
-notextile. ~$ mkdir /scratch/you
-
-Next, download the file:
+notextile. ~$ cd /scratch/you
-/scratch/you$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.personalgenomes.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
+~$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.pgp-hms.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 216M 100 216M 0 0 10.0M 0 0:00:21 0:00:21 --:--:-- 9361k
~$ scp MyData.vcf you@shell.arvados:/scratch/you/MyData.vcf
-
-{% include 'notebox_end' %}
-
-Now use @arv keep put@ to add your VCF data to Keep, then delete the local copy of the file:
-
+# Now upload the file to Keep using @arv-put@:
/scratch/you$ arv keep put var-GS000016015-ASM.tsv.bz2
+~$ arv-put var-GS000016015-ASM.tsv.bz2
+216M / 216M 100.0%
c1bad4b39ca5a924e481008009d94e32+210
-/scratch/you$ rm var-GS000016015-ASM.tsv.bz2
-
-
/scratch/you$ mkdir tmp
-/scratch/you$ echo "hello alice" > tmp/alice.txt
-/scratch/you$ echo "hello bob" > tmp/bob.txt
-/scratch/you$ echo "hello carol" > tmp/carol.txt
-/scratch/you$ arv keep put tmp
-0M / 0M 100.0%
-887cd41e9c613463eab2f0d885c6dd96+83
/scratch/you$ arv keep get c1bad4b39ca5a924e481008009d94e32+210
-. 204e43b8a1185621ca55a94839582e6f+67108864 b9677abbac956bd3e86b1deb28dfac03+67108864 fc15aff2a762b13f521baf042140acec+67108864 323d2a3ce20370c4ca1d3462a344f8fd+25885655 0:227212247:var-GS000016015-ASM.tsv.bz2
-
-204e43b8a1185621ca55a94839582e6f+67108864, b9677abbac956bd3e86b1deb28dfac03+67108864, fc15aff2a762b13f521baf042140acec+67108864, 323d2a3ce20370c4ca1d3462a344f8fd+25885655.
+notextile. 204e43b8a1185621ca55a94839582e6f+67108864
consists of:
-* the md5 hash @204e43b8a1185621ca55a94839582e6f@ which matches the md5 hash of @block1@
-* a size hint @67108864@ which matches the size of @block1@
+h2(#dir). Putting a directory
-Next, let's use @arv keep get@ to download and reassemble @var-GS000016015-ASM.tsv.bz2@ using the following command:
+If you give @arv-put@ a directory, it will recursively upload the entire directory:
/scratch/you$ arv keep get c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2 .
-
-
-This downloads the file var-GS000016015-ASM.tsv.bz2 described by collection c1bad4b39ca5a924e481008009d94e32+210 from Keep and places it into the local directory. Now that we have the file, we can compute the md5 hash of the complete file:
-
-/scratch/you$ md5sum var-GS000016015-ASM.tsv.bz2
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-
-/scratch/you$ arv keep ls c1bad4b39ca5a924e481008009d94e32+210
-var-GS000016015-ASM.tsv.bz2
-/scratch/you$ arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210
-221887 var-GS000016015-ASM.tsv.bz2
+~$ mkdir tmp
+~$ echo "hello alice" > tmp/alice.txt
+~$ echo "hello bob" > tmp/bob.txt
+~$ echo "hello carol" > tmp/carol.txt
+~$ arv-put tmp
+0M / 0M 100.0%
+887cd41e9c613463eab2f0d885c6dd96+83
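
The removed text above describes a block locator such as 204e43b8a1185621ca55a94839582e6f+67108864 as an md5 hash followed by a size hint. As a rough illustration of that naming scheme only, the sketch below derives a string of the same shape from a local file. The function name @block_locator@ is hypothetical and is not part of the Arvados tools. It assumes @md5sum@ is available, as in the tutorial, and it does not reproduce how Keep splits large files into blocks or adds further hints.

```shell
#!/bin/sh
# Hypothetical helper (not an Arvados command): print "<md5 hash>+<size in bytes>"
# for the contents of the file given as the first argument.
block_locator() {
    # md5sum prints "<hash>  -" when reading stdin; keep only the hash field.
    hash=$(md5sum < "$1" | cut -d' ' -f1)
    # wc -c may pad its output with spaces on some systems; strip them.
    size=$(wc -c < "$1" | tr -d ' ')
    printf '%s+%s\n' "$hash" "$size"
}

# Usage: a three-byte file containing "abc".
sample=$(mktemp)
printf 'abc' > "$sample"
block_locator "$sample"   # prints 900150983cd24fb0d6963f7d28e17f72+3
rm -f "$sample"
```

Because the identifier is computed from the contents alone, two files with identical contents yield the same string, which is the property the removed introduction credits with avoiding duplicate storage.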