X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/4b2ab09e3ee91cb63ae42a21d0efb004c053af8a..05fed3ab8bde60cb37f63bae5ca0940ff040e7ca:/doc/user/tutorials/tutorial-keep.html.textile.liquid?ds=sidebyside

diff --git a/doc/user/tutorials/tutorial-keep.html.textile.liquid b/doc/user/tutorials/tutorial-keep.html.textile.liquid
index 3546c5c101..21efc475c5 100644
--- a/doc/user/tutorials/tutorial-keep.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-keep.html.textile.liquid
@@ -1,165 +1,96 @@
 ---
 layout: default
 navsection: userguide
-title: "Storing and Retrieving data using Keep"
+title: "Uploading data"
 ...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.

-This tutorial introduces you to the Arvados file storage system.
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}

-{% include 'tutorial_expectations' %}
+Arvados Data collections can be uploaded using either Workbench or the @arv-put@ command line tool.

-The Arvados distributed file system is called *Keep*. Keep is a content-addressable file system. This means that files are managed using special unique identifiers derived from the _contents_ of the file, rather than human-assigned file names (specifically, the MD5 hash). This has a number of advantages:
-* Files can be stored and replicated across a cluster of servers without requiring a central name server.
-* Both the server and client systematically validate data integrity because the checksum is built into the identifier.
-* Data duplication is minimized—two files with the same contents will have in the same identifier, and will not be stored twice.
-* It avoids data race conditions, since an identifier always points to the same data.
+# "*Upload using Workbench*":#upload-using-workbench
+# "*Creating projects*":#creating-projects
+# "*Upload using command line tool*":#upload-using-command

-h1. Putting Data into Keep
+h2(#upload-using-workbench). Upload using Workbench

-We will start by downloading a freely available VCF file from "Personal Genome Project (PGP)":http://www.personalgenomes.org subject "hu599905":https://my.personalgenomes.org/profile/hu599905 to a staging directory on the VM, and adding it to Keep. In the following commands, replace *@you@* with your login name.
+To upload using Workbench, visit the Workbench *Dashboard*. Click on *Projects* dropdown menu in the top navigation menu and select your *Home* project or any other project of your choosing. You will see the *Data collections* tab for this project, which lists the collections in this project.

-First, log into your Arvados VM and set up the staging area:
+To upload files into a new collection, click on *Add data* dropdown menu and select *Upload files from my computer*.

-notextile.
~$ mkdir /scratch/you
+!{display: block;margin-left: 25px;margin-right: auto;border:1px solid lightgray;}{{ site.baseurl }}/images/upload-using-workbench.png!
-Next, download the file:
+~$ cd /scratch/you
-/scratch/you$ curl -o var-GS000016015-ASM.tsv.bz2 'https://warehouse.personalgenomes.org/warehouse/f815ec01d5d2f11cb12874ab2ed50daa+234+K@ant/var-GS000016015-ASM.tsv.bz2'
- % Total % Received % Xferd Average Speed Time Time Time Current
- Dload Upload Total Spent Left Speed
-100 216M 100 216M 0 0 10.0M 0 0:00:21 0:00:21 --:--:-- 9361k
-
-~$ scp MyData.vcf you@shell.arvados:/scratch/you/MyData.vcf
+*Note:* If you leave the collection page during the upload, the upload process will be aborted and you will need to upload the files again.
-{% include 'notebox_end' %}
+*Note:* You can also use the Upload tab to add additional files to an existing collection.
-Now use @arv keep put@ to add your VCF data to Keep, then delete the local copy of the file:
+notextile. /scratch/you$ arv keep put var-GS000016015-ASM.tsv.bz2
-c1bad4b39ca5a924e481008009d94e32+210
-/scratch/you$ rm var-GS000016015-ASM.tsv.bz2
-
-/scratch/you$ mkdir tmp
-/scratch/you$ echo "hello alice" > tmp/alice.txt
-/scratch/you$ echo "hello bob" > tmp/bob.txt
-/scratch/you$ echo "hello carol" > tmp/carol.txt
-/scratch/you$ arv keep put tmp
-0M / 0M 100.0%
-887cd41e9c613463eab2f0d885c6dd96+83
-
-/scratch/you$ arv keep ls c1bad4b39ca5a924e481008009d94e32+210
-var-GS000016015-ASM.tsv.bz2
-
-
-/scratch/you$ arv keep ls 887cd41e9c613463eab2f0d885c6dd96+83
-alice.txt
-bob.txt
-carol.txt
+~$ arv-put var-GS000016015-ASM.tsv.bz2
+216M / 216M 100.0%
+Collection saved as ...
+zzzzz-4zz18-xxxxxxxxxxxxxxx
/scratch/you$ arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210
-221887 var-GS000016015-ASM.tsv.bz2
-
-/scratch/you$ arv keep get c1bad4b39ca5a924e481008009d94e32+210/ .
-/scratch/you$ ls var-GS000016015-ASM.tsv.bz2
-var-GS000016015-ASM.tsv.bz2
-
-/scratch/you$ arv keep get 887cd41e9c613463eab2f0d885c6dd96+83/alice.txt .
+~$ mkdir tmp
+~$ echo "hello alice" > tmp/alice.txt
+~$ echo "hello bob" > tmp/bob.txt
+~$ echo "hello carol" > tmp/carol.txt
+~$ arv-put tmp
+0M / 0M 100.0%
+Collection saved as ...
+zzzzz-4zz18-yyyyyyyyyyyyyyy
/scratch/you$ md5sum var-GS000016015-ASM.tsv.bz2
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-
-/scratch/you$ mkdir -p mnt
-/scratch/you$ arv-mount --collection c1bad4b39ca5a924e481008009d94e32+210 mnt &
-/scratch/you$ cd mnt
-/scratch/you/mnt$ ls
-var-GS000016015-ASM.tsv.bz2
-/scratch/you/mnt$ md5sum var-GS000016015-ASM.tsv.bz2
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-/scratch/you/mnt$ cd ..
-/scratch/you$ fusermount -u mnt
-
-/scratch/you$ mkdir -p mnt
-/scratch/you$ arv-mount mnt &
-/scratch/you$ cd mnt/c1bad4b39ca5a924e481008009d94e32+210
-/scratch/you/mnt/c1bad4b39ca5a924e481008009d94e32+210$ ls
-var-GS000016015-ASM.tsv.bz2
-/scratch/you/mnt/c1bad4b39ca5a924e481008009d94e32+210$ md5sum var-GS000016015-ASM.tsv.bz2
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-/scratch/you/mnt/c1bad4b39ca5a924e481008009d94e32+210$ cd ../..
-/scratch/you$ fusermount -u mnt
-
-
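The removed tutorial text describes Keep as content-addressable, with identifiers derived from the MD5 hash of the data. The following is a minimal sketch of that idea, assuming GNU coreutils (`md5sum`, `wc`) are available on the VM. The hypothetical `keep_locator` function (not part of the Arvados CLI) prints an MD5 hex digest, a `+`, and a size in bytes, which is the same hash-plus-size shape as identifiers such as `c1bad4b39ca5a924e481008009d94e32+210` above. Note that for a whole collection like that one, Arvados computes the hash and size over the collection's manifest text rather than over the file contents, and large files are split across several Keep blocks, so this sketch only reproduces the identifier of a single block of data.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Arvados CLI: print a Keep-style
# "<md5 hex digest>+<size in bytes>" identifier for one local file.
# Assumes GNU coreutils md5sum; treats the file as a single block of data.
keep_locator() {
    # Fail early on a missing or unreadable file.
    [ -r "$1" ] || return 1
    # md5sum reading stdin prints "<digest>  -"; keep only the digest.
    _digest=$(md5sum < "$1" | cut -d' ' -f1)
    # wc -c counts bytes; tr strips the padding some wc versions add.
    _size=$(wc -c < "$1" | tr -d ' ')
    printf '%s+%s\n' "$_digest" "$_size"
}
```

For example, running `keep_locator var-GS000016015-ASM.tsv.bz2` on the downloaded file would print the digest that `md5sum` reports for it (`44b8ae3fde7a8a88d2f7ebd237625b4f`), followed by `+` and the file's byte count. Because the identifier depends only on the bytes, two identical files always yield the same value, which is what lets Keep deduplicate and verify data.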