X-Git-Url: https://git.arvados.org/arvados.git/blobdiff_plain/10c1e7359286edd6562c52304e9706449a9ee53f..HEAD:/doc/user/topics/arv-copy.html.textile.liquid
diff --git a/doc/user/topics/arv-copy.html.textile.liquid b/doc/user/topics/arv-copy.html.textile.liquid
index f1adfe2854..703f25425f 100644
--- a/doc/user/topics/arv-copy.html.textile.liquid
+++ b/doc/user/topics/arv-copy.html.textile.liquid
@@ -9,103 +9,127 @@ Copyright (C) The Arvados Authors. All rights reserved.
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-{% include 'crunch1only_begin' %}
-On those sites, the "copy a pipeline template" feature described below is not available. However, "copy a workflow" feature is not yet implemented.
-{% include 'crunch1only_end' %}
-
This tutorial describes how to copy Arvados objects from one cluster to another by using @arv-copy@.
{% include 'tutorial_expectations' %}
h2. arv-copy
-@arv-copy@ allows users to copy collections and pipeline templates from one cluster to another. By default, @arv-copy@ will recursively go through a template and copy all dependencies associated with the object.
+@arv-copy@ allows users to copy collections, workflow definitions and projects from one cluster to another. You can also use @arv-copy@ to import resources from HTTP URLs into Keep.
-For example, let's copy from the Arvados playground, also known as *qr1hi*, to *dst_cluster*. The names *qr1hi* and *dst_cluster* are interchangable with any cluster name. You can find the cluster name from the prefix of the uuid of the object you want to copy. For example, in *qr1hi*-4zz18-tci4vn4fa95w0zx, the cluster name is qr1hi.
+For projects, @arv-copy@ will copy all the collections workflow definitions owned by the project, and recursively copy subprojects.
-In order to communicate with both clusters, you must create custom configuration files for each cluster. In the Arvados Workbench, click on the dropdown menu icon in the upper right corner of the top navigation menu to access the user settings menu, and click on the menu item *Current token*. Copy the @ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@ in both of your clusters. Then, create two configuration files, one for each cluster. The names of the files must have the format of *uuid_prefix.conf*. In our example, let's make two files, one for *qr1hi* and one for *dst_cluster*. From your *Current token* page in *qr1hi* and *dst_cluster*, copy the @ARVADOS_API_HOST@ and @ARVADOS_API_TOKEN@.
+For workflow definitions, @arv-copy@ will recursively go through the workflow and copy all associated dependencies (input collections and Docker images).
-!{display: block;margin-left: 25px;margin-right: auto;}{{ site.baseurl }}/images/api-token-host.png!
+For example, let's copy from the Arvados Playground, also known as *pirca*, to *dstcl*. The names *pirca* and *dstcl* are interchangable with any cluster ID. You can find the cluster ID from the prefix of the UUID of the object you want to copy. For example, in zzzzz-4zz18-tci4vn4fa95w0zx
ClusterID.conf
-~$ cd ~/.config/arvados
-~$ echo "ARVADOS_API_HOST=qr1hi.arvadosapi.com" >> qr1hi.conf
-~$ echo "ARVADOS_API_TOKEN=123456789abcdefghijkl" >> qr1hi.conf
-~$ echo "ARVADOS_API_HOST=dst_cluster.arvadosapi.com" >> dst_cluster.conf
-~$ echo "ARVADOS_API_TOKEN=987654321lkjihgfedcba" >> dst_cluster.conf
-
ARVADOS_API_HOST=pirca.arvadosapi.com
+ARVADOS_API_TOKEN=v2/jutro-gj3su-12345abcde67890/abcdefghijklmnopqrstuvwxyz1234567890
+
If it looks right, save and close the file.
+# Open Workbench for your destination cluster *dstcl*.
+# On the system where you'll run @arv-copy@, start a new file named ~/.config/arvados/dstcl.conf
~$ arv-copy --src qr1hi --dst dst_cluster qr1hi-4zz18-tci4vn4fa95w0zx
-qr1hi-4zz18-tci4vn4fa95w0zx: 6.1M / 6.1M 100.0%
-arvados.arv-copy[1234] INFO: Success: created copy with uuid dst_cluster-4zz18-8765943210cdbae
+~$ arv-copy --src pirca --dst dstcl jutro-4zz18-tv416l321i4r01e
+jutro-4zz18-tv416l321i4r01e: 6.1M / 6.1M 100.0%
+arvados.arv-copy[1234] INFO: Success: created copy with uuid dstcl-4zz18-xxxxxxxxxxxxxxx
~$ arv-copy --src qr1hi --dst dst_cluster --project-uuid dst_cluster-j7d0g-a894213ukjhal12 qr1hi-4zz18-tci4vn4fa95w0zx
+
+~$ arv-copy --src pirca --dst dstcl 2463fa9efeb75e099685528b3b9071e0+438
+2463fa9efeb75e099685528b3b9071e0+438: 6.1M / 6.1M 100.0%
+arvados.arv-copy[1234] INFO: Success: created copy with uuid dstcl-4zz18-xxxxxxxxxxxxxxx
-h3. How to copy a pipeline template
-
-{% include 'arv_copy_expectations' %}
+The output of arv-copy displays the UUID of the collection generated in the destination cluster. By default, the output is placed in your home project in the destination cluster. If you want to place your collection in an existing project, you can specify the project you want it to be in using the tag @--project-uuid@ followed by the project UUID.
-We will use the uuid @qr1hi-p5p6p-9pkaxt6qjnkxhhu@ as an example pipeline template.
+For example, this will copy the collection to project @dstcl-j7d0g-a894213ukjhal12@ in the destination cluster.
-
-~$ arv-copy --src qr1hi --dst dst_cluster --dst-git-repo $USER/tutorial qr1hi-p5p6p-9pkaxt6qjnkxhhu
-To git@git.dst_cluster.arvadosapi.com:$USER/tutorial.git
- * [new branch] git_git_qr1hi_arvadosapi_com_arvados_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d -> git_git_qr1hi_arvadosapi_com_arvados_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d
-arvados.arv-copy[19694] INFO: Success: created copy with uuid dst_cluster-p5p6p-rym2h5ub9m8ofwj
+ ~$ arv-copy --src pirca --dst dstcl --project-uuid dstcl-j7d0g-a894213ukjhal12 jutro-4zz18-tv416l321i4r01e
-New branches in the destination git repo will be created for each branch used in the pipeline template. For example, if your source branch was named ac21f0d45a76294aaca0c0c0fdf06eb72d03368d, your new branch will be named @git_git_qr1hi_arvadosapi_com_reponame_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d@.
+Additionally, if you need to specify the storage classes where to save the copied data on the destination cluster, you can do that by using the @--storage-classes LIST@ argument, where @LIST@ is a comma-separated list of storage class names.
+
+h3. How to copy a workflow
-By default, if you copy a pipeline template recursively, you will find that the template as well as all the dependencies are in your home project.
+Copying workflows requires @arvados-cwl-runner@ to be available in your @$PATH@.
-If you would like to copy the object without dependencies, you can use the @--no-recursive@ tag.
+We will use the UUID @jutro-7fd4e-mkmmq53m1ze6apx@ as an example workflow.
-For example, we can copy the same object using this tag.
+Arv-copy will infer the source cluster is @jutro@ from the object UUID, and destination cluster is @pirca@ from @--project-uuid@.
-~$ arv-copy --src qr1hi --dst dst_cluster --dst-git-repo $USER/tutorial --no-recursive qr1hi-p5p6p-9pkaxt6qjnkxhhu
+~$ arv-copy --project-uuid pirca-j7d0g-ecak8knpefz8ere jutro-7fd4e-mkmmq53m1ze6apx
+ae480c5099b81e17267b7445e35b4bc7+180: 23M / 23M 100.0%
+2463fa9efeb75e099685528b3b9071e0+438: 156M / 156M 100.0%
+jutro-4zz18-vvvqlops0a0kpdl: 94M / 94M 100.0%
+2020-08-19 17:04:13 arvados.arv-copy[4789] INFO:
+2020-08-19 17:04:13 arvados.arv-copy[4789] INFO: Success: created copy with uuid pirca-7fd4e-s0tw9rfbkpo2fmx
-h3. How to copy a workflow
+The name, description, and workflow definition from the original workflow will be used for the destination copy. In addition, any *collections* and *Docker images* referenced in the source workflow definition will also be copied to the destination.
+
+If you would like to copy the object without dependencies, you can use the @--no-recursive@ flag.
+
+h3. How to copy a project
+
+We will use the UUID @jutro-j7d0g-xj19djofle3aryq@ as an example project.
-We will use the uuid @zzzzz-7fd4e-sampleworkflow1@ as an example workflow.
+Arv-copy will infer the source cluster is @jutro@ from the source project UUID, and destination cluster is @pirca@ from @--project-uuid@.
-~$ arv-copy --src zzzzz --dst dst_cluster --dst-git-repo $USER/tutorial zzzzz-7fd4e-sampleworkflow1
-zzzzz-4zz18-jidprdejysravcr: 1143M / 1143M 100.0%
-2017-01-04 04:11:58 arvados.arv-copy[5906] INFO:
-2017-01-04 04:11:58 arvados.arv-copy[5906] INFO: Success: created copy with uuid dst_cluster-7fd4e-ojtgpne594ubkt7
+~$ arv-copy --project-uuid pirca-j7d0g-lr8sq3tx3ovn68k jutro-j7d0g-xj19djofle3aryq
+2021-09-08 21:29:32 arvados.arv-copy[6377] INFO:
+2021-09-08 21:29:32 arvados.arv-copy[6377] INFO: Success: created copy with uuid pirca-j7d0g-ig9gvu5piznducp
-The name, description, and workflow definition from the original workflow will be used for the destination copy. In addition, any *locations* and *docker images* found in the src workflow definition will also be copied to the destination recursively.
+The name and description of the original project will be used for the destination copy. If a project already exists with the same name, collections and workflow definitions will be copied into the project with the same name.
-If you would like to copy the object without dependencies, you can use the @--no-recursive@ flag.
+If you would like to copy the project but not its subproject, you can use the @--no-recursive@ flag.
+
+h3. Importing HTTP resources to Keep
-For example, we can copy the same object non-recursively using the following:
+You can also use @arv-copy@ to copy the contents of a HTTP URL into Keep. When you do this, Arvados keeps track of the original URL the resource came from. This allows you to refer to the resource by its original URL in Workflow inputs, but actually read from the local copy in Keep.
-~$ arv-copy --src zzzzz --dst dst_cluster --dst-git-repo $USER/tutorial --no-recursive zzzzz-7fd4e-sampleworkflow1
+~$ arv-copy --project-uuid tordo-j7d0g-lr8sq3tx3ovn68k https://example.com/index.html
+tordo-4zz18-dhpb6y9km2byb94
+2023-10-06 10:15:36 arvados.arv-copy[374147] INFO: Success: created copy with uuid tordo-4zz18-dhpb6y9km2byb94
+
+In addition, when importing from HTTP URLs, you may provide a different cluster than the destination in @--src@. This tells @arv-copy@ to search the other cluster for a collection associated with that URL, and if found, copy the collection from that cluster instead of downloading from the original URL.
+
+The following @arv-copy@ command line options affect the behavior of HTTP import.
+
+table(table table-bordered table-condensed).
+|_. Option |_. Description |
+|==--varying-url-params== VARYING_URL_PARAMS|A comma separated list of URL query parameters that should be ignored when storing HTTP URLs in Keep.|
+|==--prefer-cached-downloads==|If a HTTP URL is found in Keep, skip upstream URL freshness check (will not notice if the upstream has changed, but also not error if upstream is unavailable).|