14440: Comments on how federated.cwl works

author Peter Amstutz <pamstutz@veritasgenetics.com>

Wed, 21 Nov 2018 21:23:59 +0000 (16:23 -0500)

committer Peter Amstutz <pamstutz@veritasgenetics.com>

Mon, 26 Nov 2018 15:44:53 +0000 (10:44 -0500)
author Peter Amstutz <pamstutz@veritasgenetics.com>
Wed, 21 Nov 2018 21:23:59 +0000 (16:23 -0500)
committer Peter Amstutz <pamstutz@veritasgenetics.com>
Mon, 26 Nov 2018 15:44:53 +0000 (10:44 -0500)
diff --git a/.licenseignore b/.licenseignore

index 113bf4fa4e5f2196f74a9609d4805117392d85a3..ba4c2dc38bec8b7c233ae4560c45de281b47a52e 100644 (file)
--- a/.licenseignore
+++ b/.licenseignore
@@ -13,6 +13,7 @@ build/package-test-dockerfiles/ubuntu1604/etc-apt-preferences.d-arvados
  *by-sa-3.0.txt
  *COPYING
  doc/fonts/*
+doc/user/cwl/federated/*
  */docker_image
  docker/jobs/apt.arvados.org.list
  */en.bootstrap.yml
diff --git a/doc/user/cwl/cwl-extensions.html.textile.liquid b/doc/user/cwl/cwl-extensions.html.textile.liquid

index f9ecf7a5343b6210ceaf613c796af535a114adb1..c75b43a641302460cac37df1f05239f94fb2fb15 100644 (file)
--- a/doc/user/cwl/cwl-extensions.html.textile.liquid
+++ b/doc/user/cwl/cwl-extensions.html.textile.liquid
@@ -43,6 +43,9 @@ hints:
    arv:WorkflowRunnerResources:
      ramMin: 2048
      coresMin: 2
+  arv:ClusterTarget:
+    cluster_id: clsr1
+    project_uuid: clsr1-j7d0g-qxc4jcji7n4lafx
  </pre>
  
  The one exception to this is @arv:APIRequirement@, see note below.
@@ -134,3 +137,12 @@ table(table table-bordered table-condensed).
  |_. Field |_. Type |_. Description |
  |ramMin|int|RAM, in mebibytes, to reserve for the arvados-cwl-runner process. Default 1 GiB|
  |coresMin|int|Number of cores to reserve to the arvados-cwl-runner process. Default 1 core.|
+
+h2(#clustertarget). arv:ClusterTarget
+
+Specify which Arvados cluster should execute a container or subworkflow, and the parent project for the container request.
+
+table(table table-bordered table-condensed).
+|_. Field |_. Type |_. Description |
+|cluster_id|string|The five-character alphanumeric cluster id (uuid prefix) where a container or subworkflow will execute.  May be an expression.|
+|project_uuid|string|The uuid of the project which will own container request and output of the container.  May be an expression.|
diff --git a/doc/user/cwl/federated-workflows.html.textile.liquid b/doc/user/cwl/federated-workflows.html.textile.liquid

index 5f96702d74ee193a25ae1c860a0f6441bc04c920..77155fed51e1ef7d3217fd12c4af92174c3010c7 100644 (file)
--- a/doc/user/cwl/federated-workflows.html.textile.liquid
+++ b/doc/user/cwl/federated-workflows.html.textile.liquid
@@ -9,16 +9,30 @@ Copyright (C) The Arvados Authors. All rights reserved.
  SPDX-License-Identifier: CC-BY-SA-3.0
  {% endcomment %}
  
-To support running analysis on geographically dispersed data (avoiding expensive data transfers by sending the computation to the data) and "hybrid cloud" configurations where an on-premise cluster can expand its capabilities by delegating work to a cloud-base cluster, Arvados supports federated workflows.  In a federated workflow, different steps of a workflow may execute on different clusters.  Arvados manages data transfer and delegation of credentials, so this as easy as simply adding cluster target hints to your existing workflow.
+To support running analysis on geographically dispersed data (avoiding expensive data transfers by sending the computation to the data) and "hybrid cloud" configurations where an on-premise cluster can expand its capabilities by delegating work to a cloud-base cluster, Arvados supports federated workflows.  In a federated workflow, different steps of a workflow may execute on different clusters.  Arvados manages data transfer and delegation of credentials, all that is required is adding "arv:ClusterTarget":cwl-extensions.html#clustertarget hints to your existing workflow.
  
-h2. Federated scatter/gather example
+!(full-width)federated-workflow.svg!
+
+h2. Get the example files
+
+The tutorial files are located in the "documentation section of the Arvados source repository:":https://github.com/curoverse/arvados/tree/master/doc/user/cwl/federated
  
+<notextile>
+<pre><code>~$ <span class="userinput">git clone https://github.com/curoverse/arvados</span>
+~$ <span class="userinput">cd arvados/doc/user/cwl/federated</span>
+</code></pre>
+</notextile>
  
+h2. Federated scatter/gather example
+
+In this following example, an analysis task is executed on three different clusters with different data, then the results are combined to produce the final output.
  
  {% codeblock as yaml %}
  {% include 'federated_cwl' %}
  {% endcodeblock %}
  
+Example input document:
+
  {% codeblock as yaml %}
  {% include 'shards_yml' %}
  {% endcodeblock %}
diff --git a/doc/user/cwl/federated/federated.cwl b/doc/user/cwl/federated/federated.cwl

index 250f6d1c51d8ea6da274accd925b3189e8f5a1de..5314a7675b2e6f64c08351cec9e2ccb893a77bab 100644 (file)
--- a/doc/user/cwl/federated/federated.cwl
+++ b/doc/user/cwl/federated/federated.cwl
@@ -1,13 +1,27 @@
+#
+# Demonstrate Arvados federation features.  This performs a parallel
+# scatter over some arbitrary number of files and federated clusters,
+# then joins the results.
+#
  cwlVersion: v1.0
  class: Workflow
  $namespaces:
+  # When using Arvados extensions to CWL, must declare the 'arv' namespace
    arv: "http://arvados.org/cwl#"
+
  requirements:
    InlineJavascriptRequirement: {}
-  DockerRequirement:
-    dockerPull: arvados/fed-test:scatter-gather
    ScatterFeatureRequirement: {}
    StepInputExpressionRequirement: {}
+
+  DockerRequirement:
+    # Replace this with your own Docker container
+    dockerPull: arvados/jobs
+
+  # Define a record type so we can conveniently associate the input
+  # file, the cluster on which the file lives, and the project on that
+  # cluster that will own the container requests and intermediate
+  # outputs.
    SchemaDefRequirement:
      types:
        - name: FileOnCluster
@@ -16,27 +30,56 @@ requirements:
            file: File
            cluster: string
            project: string
+
  inputs:
+  # Expect an array of FileOnCluster records (defined above)
+  # as our input.
    shards:
      type:
        type: array
        items: FileOnCluster
+
  outputs:
+  # Will produce an output file with the results of the distributed
+  # analysis jobs joined together.
    joined:
      type: File
      outputSource: gather-results/joined
+
  steps:
    distributed-analysis:
      in:
-      shards: shards
-      inp: {valueFrom: $(inputs.shards.file)}
-    scatter: shards
+      # Take "shards" array as input, we scatter over it below.
+      shard: shards
+
+      # Use an expression to extract the "file" field to assign to the
+      # "inp" parameter of the tool.
+      inp: {valueFrom: $(inputs.shard.file)}
+
+    # Scatter over shards, this means creating a parallel job for each
+    # element in the "shards" array.  Expressions are evaluated for
+    # each element.
+    scatter: shard
+
+    # Specify the cluster target for this job.  This means each
+    # separate scatter job will execute on the cluster that was
+    # specified in the "cluster" field.
+    #
+    # Arvados handles streaming data between clusters, for example,
+    # the Docker image containing the code for a particular tool will
+    # be fetched on demand, as long as it is available somewhere in
+    # the federation.
      hints:
        arv:ClusterTarget:
-        cluster_id: $(inputs.shards.cluster)
-        project_uuid: $(inputs.shards.project)
+        cluster_id: $(inputs.shard.cluster)
+        project_uuid: $(inputs.shard.project)
+
      out: [out]
      run: md5sum.cwl
+
+  # Collect the results of the distributed step and join them into a
+  # single output file.  Arvados handles streaming inputs,
+  # intermediate results, and outputs between clusters on demand.
    gather-results:
      in:
        inp: distributed-analysis/out
author	Peter Amstutz <pamstutz@veritasgenetics.com>
	Wed, 21 Nov 2018 21:23:59 +0000 (16:23 -0500)
committer	Peter Amstutz <pamstutz@veritasgenetics.com>
	Mon, 26 Nov 2018 15:44:53 +0000 (10:44 -0500)
.licenseignore		patch \| blob \| history
doc/user/cwl/cwl-extensions.html.textile.liquid		patch \| blob \| history
doc/user/cwl/federated-workflows.html.textile.liquid		patch \| blob \| history
doc/user/cwl/federated/federated.cwl		patch \| blob \| history