---
layout: default
navsection: userguide
title: "Intro: Keep"
navorder: 3
---

# Intro: Keep

Keep is a content-addressable storage system. Its semantics are fundamentally different from those of the POSIX-like filesystems you're used to.

Using Keep looks like this:

1. Write data.
2. Receive locator.
3. Use locator to retrieve data.
4. Tag the locator with a symbolic name.

By contrast, using POSIX looks like this:

1. Choose locator (*i.e.*, filename).
2. Write data to locator.
3. Use locator to retrieve data.

Content addressing provides various benefits, including:

* Reduction of unnecessary data duplication
* Prevention of race conditions (a given locator always references the same data)
* Systematic client- and server-side verification of data integrity
* Provenance reporting (when combined with Arvados MapReduce jobs)

### Vocabulary

Keep arranges data into **collections** and **data blocks**.

A collection is analogous to a directory tree in a POSIX filesystem. It contains subdirectories and filenames, and indicates where to find the data blocks that make up the files. It is encoded in plain text.

A data block contains between 1 byte and 64 MiB of data. Its locator is the MD5 checksum of the data, followed by a plus sign and its size in bytes (encoded as a decimal number).

`acbd18db4cc2f85cedef654fccc4a4d8+3`

Keep distributes data blocks among the available disks. It also stores multiple copies of each block, so a single disk or node failure does not cause any data to become unreachable.

### No "delete"

One side effect of the Keep write semantics is the lack of a "delete" operation. Instead, Keep relies on garbage collection to delete unneeded data blocks.

### Tagging valuable data

Valuable data must be marked explicitly by creating a Collection in Arvados. Otherwise, the data blocks will be deleted during garbage collection.

Use the arv(1) program to create a collection.
For example:

    arv collections create --uuid "acbd18db4cc2f85cedef654fccc4a4d8+3"

## Getting started

Write three bytes of data to Keep.

    echo -n foo | whput -

Output:

    acbd18db4cc2f85cedef654fccc4a4d8+3

Retrieve the data.

    whget acbd18db4cc2f85cedef654fccc4a4d8+3

Output:

    foo

{% include alert-stub.html %}

### Writing a collection

### Reading a file from a collection

### Adding a collection to Arvados

### Tagging a collection

### Mounting Keep as a read-only POSIX filesystem

### Mounting a single collection as a POSIX filesystem
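
### Computing a block locator by hand

The locator format described under Vocabulary (MD5 checksum, a plus sign, then the size in bytes as a decimal number) can be reproduced outside Keep. The following is a minimal Python sketch for illustration only. It is not part of the Keep client tools, and the names `block_locator` and `MAX_BLOCK_SIZE` are hypothetical. It computes the locator string and stores nothing.

```python
import hashlib

# Upper bound on a data block, as stated in the Vocabulary section (64 MiB).
# The constant name is an assumption made for this sketch.
MAX_BLOCK_SIZE = 64 * 1024 * 1024


def block_locator(data: bytes) -> str:
    """Return '<md5 hex digest>+<size in bytes>' for a single data block."""
    if not 1 <= len(data) <= MAX_BLOCK_SIZE:
        raise ValueError("a data block holds between 1 byte and 64 MiB")
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


print(block_locator(b"foo"))  # acbd18db4cc2f85cedef654fccc4a4d8+3
```

For the three-byte input `foo`, this yields the same locator shown in the Getting started example, which is what makes client-side integrity checks possible: anyone holding the data can recompute the locator and compare.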