X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/18258f6a3762ba7d83b05260b3c22f71423c0373..306bd74955208294cc15b790b189f5f2656a949f:/doc/user/tutorials/tutorial-keep.html.textile.liquid
diff --git a/doc/user/tutorials/tutorial-keep.html.textile.liquid b/doc/user/tutorials/tutorial-keep.html.textile.liquid
index 01ef78ad12..b06f725ffa 100644
--- a/doc/user/tutorials/tutorial-keep.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-keep.html.textile.liquid
@@ -1,136 +1,76 @@
 ---
 layout: default
 navsection: userguide
-navmenu: Tutorials
-title: "Storing and Retrieving data using Arvados Keep"
-
+title: "Uploading data"
 ...
-h1. Storing and Retrieving data using Arvados Keep
-
-This tutorial introduces you to the Arvados file storage system.
-
-
-*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.basedoc}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.basedoc}}/user/getting_started/check-environment.html*
+This tutorial describes how to upload new Arvados data collections using the command line tool @arv keep put@.
-The Arvados distributed file system is called *Keep*. Keep is a content-addressable file system. This means that files are managed using special unique identifiers derived from the _contents_ of the file, rather than human-assigned file names (specifically, the md5 hash). This has a number of advantages:
-* Files can be stored and replicated across a cluster of servers without requiring a central name server.
-* Systematic validation of data integrity by both server and client because the checksum is built into the identifier.
-* Minimizes data duplication (two files with the same contents will result in the same identifier, and will not be stored twice.)
-* Avoids data race conditions (an identifier always points to the same data.)
+notextile.
$ mkdir /scratch/you
-
-Next, download the file:
+h2. Upload using command prompt
+To upload a file to Keep using @arv keep put@:
$ mkdir /scratch/you
-$ cd /scratch/you
-$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.personalgenomes.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
- % Total % Received % Xferd Average Speed Time Time Time Current
- Dload Upload Total Spent Left Speed
-100 216M 100 216M 0 0 10.0M 0 0:00:21 0:00:21 --:--:-- 9361k
+~$ arv keep put var-GS000016015-ASM.tsv.bz2
+216M / 216M 100.0%
+Collection saved as ...
+qr1hi-4zz18-xxxxxxxxxxxxxxx
$ scp MyData.vcf you@shell.arvados:/scratch/you/MyData.vcf
+The file used in this example is a freely available TSV file containing variant annotations from the "Personal Genome Project (PGP)":http://www.pgp-hms.org participant "hu599905":https://my.pgp-hms.org/profile/hu599905, downloadable "here":https://warehouse.pgp-hms.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2.
-Now use @arv keep put@ to add your VCF data to Keep:
+$ cd /scratch/you
-$ arv keep put var-GS000016015-ASM.tsv.bz2
-c1bad4b39ca5a924e481008009d94e32+210
+~$ mkdir tmp
+~$ echo "hello alice" > tmp/alice.txt
+~$ echo "hello bob" > tmp/bob.txt
+~$ echo "hello carol" > tmp/carol.txt
+~$ arv keep put tmp
+0M / 0M 100.0%
+Collection saved as ...
+qr1hi-4zz18-yyyyyyyyyyyyyyy
$ mkdir tmp
-$ echo "hello alice" > tmp/alice.txt
-$ echo "hello bob" > tmp/bob.txt
-$ echo "hello carol" > tmp/carol.txt
-$ arv keep put tmp
-0M / 0M 100.0%
-887cd41e9c613463eab2f0d885c6dd96+83
-
-$ arv keep get c1bad4b39ca5a924e481008009d94e32+210
-. 204e43b8a1185621ca55a94839582e6f+67108864 b9677abbac956bd3e86b1deb28dfac03+67108864 fc15aff2a762b13f521baf042140acec+67108864 323d2a3ce20370c4ca1d3462a344f8fd+25885655 0:227212247:var-GS000016015-ASM.tsv.bz2
-
-204e43b8a1185621ca55a94839582e6f+67108864, b9677abbac956bd3e86b1deb28dfac03+67108864, fc15aff2a762b13f521baf042140acec+67108864, 323d2a3ce20370c4ca1d3462a344f8fd+25885655.
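The manifest line printed by @arv keep get@ above can be read as a stream name, a run of block locators, and a file token. The following is a minimal Python sketch, assuming only the layout visible in that output (@hash+size@ locators followed by a @position:length:name@ token). The helper name @parse_manifest_line@ is hypothetical and is not part of any Arvados SDK. It splits the line and checks that the block size hints add up to the file length.

```python
import re

# Manifest line copied from the `arv keep get` output shown above.
MANIFEST_LINE = (
    ". 204e43b8a1185621ca55a94839582e6f+67108864 "
    "b9677abbac956bd3e86b1deb28dfac03+67108864 "
    "fc15aff2a762b13f521baf042140acec+67108864 "
    "323d2a3ce20370c4ca1d3462a344f8fd+25885655 "
    "0:227212247:var-GS000016015-ASM.tsv.bz2"
)

# Assumed token shapes, inferred from the example line only.
LOCATOR_RE = re.compile(r"^([0-9a-f]{32})\+(\d+)")   # md5 hex + size hint
FILE_RE = re.compile(r"^(\d+):(\d+):(.+)$")          # position:length:name


def parse_manifest_line(line):
    """Hypothetical helper: split one manifest line into its parts."""
    tokens = line.split()
    stream_name, rest = tokens[0], tokens[1:]
    blocks, files = [], []
    for token in rest:
        locator = LOCATOR_RE.match(token)
        if locator:
            blocks.append((locator.group(1), int(locator.group(2))))
            continue
        file_token = FILE_RE.match(token)
        if file_token:
            files.append(
                (int(file_token.group(1)), int(file_token.group(2)), file_token.group(3))
            )
            continue
        raise ValueError(f"unrecognized manifest token: {token!r}")
    return stream_name, blocks, files


if __name__ == "__main__":
    stream, blocks, files = parse_manifest_line(MANIFEST_LINE)
    total = sum(size for _, size in blocks)
    print(stream, len(blocks), total, files)
```

In this example the four size hints sum to 227212247 bytes, which is the length recorded in the file token for @var-GS000016015-ASM.tsv.bz2@.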
+To move the collection to a different project, check the box at the left of the collection row. Pull down the *Selection...* menu near the top of the page tab, and select *Move selected*. This will open a dialog box where you can select a destination project for the collection. Click a project, then click the *Move* button.
-Let's use @arv keep get@ to download the first datablock:
+!{display: block;margin-left: 25px;margin-right: auto;}{{ site.baseurl }}/images/workbench-move-selected.png!
-notextile. $ arv keep get 204e43b8a1185621ca55a94839582e6f+67108864 > block1
+Click on the *Show* button next to the collection's listing on a project page to go to the Workbench page for your collection. On this page, you can see the collection's contents, download individual files, and set sharing options.
-Let's look at the size and compute the md5 hash of @block1@:
+notextile. $ ls -l block1
--rw-r--r-- 1 you group 67108864 Dec 9 20:14 block1
-$ md5sum block1
-204e43b8a1185621ca55a94839582e6f block1
-
-The block locator 204e43b8a1185621ca55a94839582e6f+67108864 consists of:
-* the md5 hash @204e43b8a1185621ca55a94839582e6f@ which matches the md5 hash of @block1@
-* a size hint @67108864@ which matches the size of @block1@
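The two parts listed above (an md5 hash and a size hint) can be recomputed locally from the block's bytes. The following is a minimal Python sketch, assuming a locator is simply the md5 hex digest, a @+@, and the byte count. The helper names @block_locator@ and @file_locator@ are hypothetical, and any further hints a real locator may carry (such as the @+K@ant@ suffix seen in the warehouse URL) are not modeled.

```python
import hashlib


def block_locator(data: bytes) -> str:
    """Hypothetical helper: build '<md5 hex>+<size>' for a block of bytes."""
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def file_locator(path: str, chunk_size: int = 1 << 20) -> str:
    """Same as block_locator, but streams a file so it need not fit in memory."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"{digest.hexdigest()}+{size}"


if __name__ == "__main__":
    # The md5 of the empty byte string is a well-known constant.
    print(block_locator(b""))
```

Calling @file_locator("block1")@ on the downloaded block should, under these assumptions, reproduce the locator that was passed to @arv keep get@.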
+To upload using Workbench, visit the Workbench *Dashboard*. Click on the *Projects* dropdown menu in the top navigation menu and select your *Home* project or any other project of your choosing. You will see the *Data collections* tab for this project, which lists the collections in this project.
-Next, let's use @arv keep get@ to download and reassemble @var-GS000016015-ASM.tsv.bz2@ using the following command:
+To upload a file, click on the *Add data* dropdown menu and select *Upload files from my computer*.
-notextile. $ arv keep get c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2 .
+!{display: block;margin-left: 25px;margin-right: auto;border:1px solid lightgray;}{{ site.baseurl }}/images/upload-using-workbench.png!
-This downloads the file @var-GS000016015-ASM.tsv.bz2@ described by collection @c1bad4b39ca5a924e481008009d94e32+210@ from Keep and places it into the local directory. Now that we have the file, we can compute the md5 hash of the complete file:
+$ md5sum var-GS000016015-ASM.tsv.bz2
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-
-$ arv keep ls c1bad4b39ca5a924e481008009d94e32+210
-var-GS000016015-ASM.tsv.bz2
-$ arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210
-221887 var-GS000016015-ASM.tsv.bz2
-
-