X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/392c382ddaf8ea4c0c4b6655c7f508de73274d12..ca493dcca8463dc5976b31de0b0dfed3c4d26d9b:/doc/user/tutorials/tutorial-keep.html.textile.liquid?ds=sidebyside
diff --git a/doc/user/tutorials/tutorial-keep.html.textile.liquid b/doc/user/tutorials/tutorial-keep.html.textile.liquid
index 5a5e8796cb..ada6d1fbab 100644
--- a/doc/user/tutorials/tutorial-keep.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-keep.html.textile.liquid
@@ -1,166 +1,56 @@
---
layout: default
navsection: userguide
-title: "Storing and Retrieving data using Keep"
+title: "Uploading data"
...
-This tutorial introduces you to the Arvados file storage system.
+This tutorial describes how to upload new Arvados data collections using the command line tool @arv keep put@.
+notextile.
-*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
+{% include 'tutorial_expectations' %}
-The Arvados distributed file system is called *Keep*. Keep is a content-addressable file system. This means that files are managed using special unique identifiers derived from the _contents_ of the file, rather than human-assigned file names (specifically, the md5 hash). This has a number of advantages:
-* Files can be stored and replicated across a cluster of servers without requiring a central name server.
-* Systematic validation of data integrity by both server and client because the checksum is built into the identifier.
-* Minimizes data duplication (two files with the same contents will result in the same identifier, and will not be stored twice.)
-* Avoids data race conditions (an identifier always points to the same data.)
-
-h1. Putting Data into Keep
-
-We will start with downloading a freely available VCF file from the "Personal Genome Project (PGP)":http://www.personalgenomes.org subject "hu599905":https://my.personalgenomes.org/profile/hu599905 to a staging directory on the VM, and then add it to Keep.
-
-In the following tutorials, replace @you@ with your user id.
-
-First, log into the Arvados VM instance and set up the staging area:
-
-notextile.
~$ mkdir /scratch/you
-
-Next, download the file:
+h3. Upload
+To upload a file to Keep using @arv keep put@:
-~$ cd /scratch/you
-/scratch/you$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.personalgenomes.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
- % Total % Received % Xferd Average Speed Time Time Time Current
- Dload Upload Total Spent Left Speed
-100 216M 100 216M 0 0 10.0M 0 0:00:21 0:00:21 --:--:-- 9361k
+~$ arv keep put var-GS000016015-ASM.tsv.bz2
+216M / 216M 100.0%
+Collection saved as ...
+qr1hi-4zz18-xxxxxxxxxxxxxxx
-{% include 'notebox_begin' %}
-
-If you have your own data, for example @MyData.vcf@, you can use @scp@ or @rsync@ to copy from your local workstation to the shell VM (run this on your local workstation):
+The output value @qr1hi-4zz18-xxxxxxxxxxxxxxx@ is the uuid of the Arvados collection created.
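For scripted uploads, the UUID can be pulled out of the command output. The following is a minimal sketch in Python, assuming the output format shown in the transcript above. The helper name @extract_collection_uuid@ and the regular expression are illustrative and are not part of the Arvados tooling.

```python
import re

# Arvados UUIDs have the form <5-char cluster id>-<5-char type>-<15-char id>.
# "4zz18" is the type infix seen in the collection UUIDs in this tutorial.
COLLECTION_UUID = re.compile(r"\b[a-z0-9]{5}-4zz18-[a-z0-9]{15}\b")


def extract_collection_uuid(output: str) -> str:
    """Return the last collection UUID found in `arv keep put` output."""
    matches = COLLECTION_UUID.findall(output)
    if not matches:
        raise ValueError("no collection UUID found in output")
    return matches[-1]


# Sample text modelled on the transcript; the 15-character id is a placeholder.
sample_output = (
    "216M / 216M 100.0%\n"
    "Collection saved as ...\n"
    "qr1hi-4zz18-" + "x" * 15 + "\n"
)
print(extract_collection_uuid(sample_output))
```

Taking the last match rather than the first is a design choice: progress lines come first, and the UUID is the final line of the output shown above.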
-notextile.
~$ scp MyData.vcf you@shell.arvados:/scratch/you/MyData.vcf
+The file used in this example is a freely available TSV file containing variant annotations from the "Personal Genome Project (PGP)":http://www.pgp-hms.org participant "hu599905":https://my.pgp-hms.org/profile/hu599905, downloadable "here":https://warehouse.pgp-hms.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2.
-{% include 'notebox_end' %}
-
-Now use @arv keep put@ to add your VCF data to Keep, then delete the local copy of the file:
+
+It is also possible to upload an entire directory with @arv keep put@:
-/scratch/you$ arv keep put var-GS000016015-ASM.tsv.bz2
-c1bad4b39ca5a924e481008009d94e32+210
-/scratch/you$ rm var-GS000016015-ASM.tsv.bz2
+~$ mkdir tmp
+~$ echo "hello alice" > tmp/alice.txt
+~$ echo "hello bob" > tmp/bob.txt
+~$ echo "hello carol" > tmp/carol.txt
+~$ arv keep put tmp
+0M / 0M 100.0%
+Collection saved as ...
+qr1hi-4zz18-yyyyyyyyyyyyyyy
-The output value @c1bad4b39ca5a924e481008009d94e32+210@ from @arv keep put@ is the Keep locator. This enables you to access the file you just uploaded, and is explained in the next section.
-
-h2(#dir). Putting a directory
-
-You can also use @arv keep put@ to add an entire directory:
-
-
-/scratch/you$ mkdir tmp
-/scratch/you$ echo "hello alice" > tmp/alice.txt
-/scratch/you$ echo "hello bob" > tmp/bob.txt
-/scratch/you$ echo "hello carol" > tmp/carol.txt
-/scratch/you$ arv keep put tmp
-0M / 0M 100.0%
-887cd41e9c613463eab2f0d885c6dd96+83
-
-
+In both examples, the @arv keep put@ command created a collection. The first collection contains the single uploaded file. The second collection contains the entire uploaded directory.
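The directory preparation in the second example can also be scripted. The sketch below is a hedged Python illustration that recreates the @tmp@ directory from the transcript and builds the upload command without running it. The function name @make_example_dir@ is hypothetical, and the command is only printed because executing it requires the Arvados CLI and valid API credentials.

```python
import tempfile
from pathlib import Path


def make_example_dir(parent: Path) -> Path:
    """Create tmp/ containing the three greeting files from the transcript."""
    target = parent / "tmp"
    target.mkdir()
    for name in ("alice", "bob", "carol"):
        # `echo` appends a trailing newline, so the same is done here.
        (target / f"{name}.txt").write_text(f"hello {name}\n")
    return target


with tempfile.TemporaryDirectory() as workdir:
    upload_dir = make_example_dir(Path(workdir))
    # Build the command as an argument list; it is printed, not executed.
    command = ["arv", "keep", "put", str(upload_dir)]
    print(" ".join(command))
    print(sorted(p.name for p in upload_dir.iterdir()))
```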
-The locator @887cd41e9c613463eab2f0d885c6dd96+83@ represents a collection with multiple files.
+@arv keep put@ accepts quite a few optional command line arguments, which are described "on the arv subcommands":{{site.baseurl}}/sdk/cli/subcommands.html#arv-keep-put page.
-h1. Getting Data from Keep
+h3. Locate your collection in Workbench
-h2. Using Workbench
+Visit the Workbench *Dashboard*. Click on the *Projects* dropdown menu in the top navigation menu and select your *Home* project. Your newly uploaded collection should appear near the top of the *Data collections* tab. The collection locator printed by @arv keep put@ will appear under the *name* column.
-You may access collections through the "Collections section of Arvados Workbench":https://{{ site.arvados_workbench_host }}/collections located at "https://{{ site.arvados_workbench_host }}/collections":https://{{ site.arvados_workbench_host }}/collections . You can also access individual collections and individual files within a collection. Some examples:
+To move the collection to a different project, check the box at the left of the collection row. Pull down the *Selection...* menu near the top of the page tab, and select *Move selected*. This will open a dialog box where you can select a destination project for the collection. Click a project, then click the *Move* button.
-* "https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210":https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210
-* "https://{{ site.arvados_workbench_host }}/collections/887cd41e9c613463eab2f0d885c6dd96+83/alice.txt":https://{{ site.arvados_workbench_host }}/collections/887cd41e9c613463eab2f0d885c6dd96+83/alice.txt
-
-h2(#arv-get). Using arv-get
-
-You can view the contents of a collection using @arv keep ls@:
-
-
-/scratch/you$ arv keep ls c1bad4b39ca5a924e481008009d94e32+210
-var-GS000016015-ASM.tsv.bz2
-
-
-/scratch/you$ arv keep ls 887cd41e9c613463eab2f0d885c6dd96+83
-alice.txt
-bob.txt
-carol.txt
-
-
-
-Use @-s@ to print file sizes rounded up to the nearest kilobyte:
-
-
-/scratch/you$ arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210
-221887 var-GS000016015-ASM.tsv.bz2
-
-
-
-Use @arv keep get@ to download the contents of a collection and place it in the directory specified in the second argument (in this example, @.@ for the current directory):
-
-
-/scratch/you$ arv keep get c1bad4b39ca5a924e481008009d94e32+210/ .
-
-
-
-You can also download individual files:
-
-
-/scratch/you$ arv keep get 887cd41e9c613463eab2f0d885c6dd96+83/alice.txt .
-
-
-
-With a local copy of the file, we can do some computation, for example computing the md5 hash of the complete file:
-
-
-/scratch/you$ md5sum var-GS000016015-ASM.tsv.bz2
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-
-
-
-h2. Using arv-mount
-
-Use @arv-mount@ to take advantage of the "File System in User Space / FUSE":http://fuse.sourceforge.net/ feature of the Linux kernel to mount a Keep collection as if it were a regular directory tree.
-
-
-/scratch/you$ mkdir mnt
-/scratch/you$ arv-mount --collection c1bad4b39ca5a924e481008009d94e32+210 mnt &
-/scratch/you$ cd mnt
-/scratch/you/mnt$ ls
-var-GS000016015-ASM.tsv.bz2
-/scratch/you/mnt$ md5sum var-GS000016015-ASM.tsv.bz2
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-/scratch/you/mnt$ cd ..
-/scratch/you$ fusermount -u mnt
-
-
-
-You can also mount the entire Keep namespace in "magic directory" mode:
-
-
-/scratch/you$ mkdir mnt
-/scratch/you$ arv-mount mnt &
-/scratch/you$ cd mnt/c1bad4b39ca5a924e481008009d94e32+210
-/scratch/you/mnt/c1bad4b39ca5a924e481008009d94e32+210$ ls
-var-GS000016015-ASM.tsv.bz2
-/scratch/you/mnt/c1bad4b39ca5a924e481008009d94e32+210$ md5sum var-GS000016015-ASM.tsv.bz2
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-/scratch/you/mnt/c1bad4b39ca5a924e481008009d94e32+210$ cd ../..
-/scratch/you$ fusermount -u mnt
-
-
+!{{ site.baseurl }}/images/workbench-move-selected.png!
-Using @arv-mount@ has several significant benefits:
+Click on the *Show* button next to the collection's listing on a project page to go to the Workbench page for your collection. On this page, you can see the collection's contents, download individual files, and set sharing options.
-* You can browse, open and read Keep entries as if they are regular files.
-* It is easy for existing tools to access files in Keep.
-* Data is downloaded on demand, it is not necessary to download an entire file or collection to start processing
+notextile.