2764: Add wget-friendly Collections file page.
author Brett Smith <brett@curoverse.com>
Thu, 15 May 2014 20:38:12 +0000 (16:38 -0400)
committer Brett Smith <brett@curoverse.com>
Wed, 21 May 2014 19:16:35 +0000 (15:16 -0400)
This new route will become the way you share authless Collection links
with others.  They can pass it to `wget -r` to download the whole
collection, nicely organized, with nothing extraneous.  Since it
doesn't try to load user information or look up related Arvados items,
it can be rendered using an API token with a very narrow scope.
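
As a rough sketch of how such a link might be assembled and handed to
wget: the path shape below is taken from the integration test in this
commit ("/collections/download/<uuid>/<token>/"), while the helper
names, the host, and the sample uuid and token are hypothetical and
only there for illustration.

```ruby
require 'shellwords'

# Hypothetical helper (not part of Workbench): builds a share URL in the
# shape the integration test visits.
def collection_download_url(base_url, uuid, reader_token)
  "#{base_url.chomp('/')}/collections/download/#{uuid}/#{reader_token}/"
end

# Hypothetical helper: returns a shell-escaped recursive wget command line.
def wget_command(url)
  ['wget', '-r', url].shelljoin
end

# Placeholder values; a real link would use a Collection uuid and a
# narrowly scoped reader token.
url = collection_download_url('https://workbench.example.com/',
                              'exampleuuid', 'exampletoken')
puts wget_command(url)
```

The trailing slash matters for the test's own href comparison, so the
sketch keeps it.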

Because wget respects robots.txt, this branch empties that file and
relies on the corresponding <meta> tag instead.  The standard layout
sets NOINDEX, NOFOLLOW, while the new view only sets NOINDEX, so wget
can still follow the links on the page.
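
To illustrate the distinction, here is a minimal sketch of checking a
page's robots <meta> directives for "nofollow".  It uses a plain regex
scan rather than a real HTML parser, and both method names are
hypothetical; it is not how wget or the integration test implement the
check.

```ruby
# Hypothetical helper: collects the downcased directives from every
# <meta name="robots"> tag found in the HTML string.
def robots_directives(html)
  html.scan(/<meta\b[^>]*>/i).flat_map do |tag|
    next [] unless tag =~ /\bname\s*=\s*["']robots["']/i
    content = tag[/\bcontent\s*=\s*["']([^"']*)["']/i, 1] || ''
    content.split(',').map { |directive| directive.strip.downcase }
  end
end

# Hypothetical helper: true when no robots <meta> tag forbids following
# links.
def links_followable?(html)
  !robots_directives(html).include?('nofollow')
end

new_view = '<head><meta name="robots" content="NOINDEX"></head>'
layout   = '<head><meta name="robots" content="NOINDEX, NOFOLLOW"></head>'
puts links_followable?(new_view)
puts links_followable?(layout)
```

A page with no robots <meta> tag at all counts as followable here,
which matches the usual default for crawlers.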

Refs #2764.

apps/workbench/app/controllers/collections_controller.rb
apps/workbench/app/views/collections/show_file_links.html.erb [new file with mode: 0644]
apps/workbench/app/views/layouts/application.html.erb
apps/workbench/public/robots.txt
apps/workbench/test/integration/collections_test.rb
services/api/test/fixtures/api_client_authorizations.yml

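diff --git a/apps/workbench/app/controllers/collections_controller.rb b/apps/workbench/app/controllers/collections_controller.rb
index 7a8888e1d230885b7cfd78b430eae4ce74582381..7af552ded9c644e5bec88aaae65efb7d1ae6a07b 100644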
@@ -93,8 +93,7 @@ class CollectionsController < ApplicationController
   def show_file_links
     Thread.current[:reader_tokens] = [params[:reader_token]]
     find_object_by_uuid
-    show
-    render 'show'
+    render layout: false
   end
 
   def show_file
diff --git a/apps/workbench/app/views/collections/show_file_links.html.erb b/apps/workbench/app/views/collections/show_file_links.html.erb
new file mode 100644
index 0000000..9d61a6a
--- /dev/null
@@ -0,0 +1,36 @@
+<!DOCTYPE html>
+<html>
+<head>
+  <meta charset="utf-8">
+  <title>
+    <% if content_for? :page_title %>
+    <%= yield :page_title %> / <%= Rails.configuration.site_name %>
+    <% else %>
+    <%= Rails.configuration.site_name %>
+    <% end %>
+  </title>
+  <meta name="description" content="">
+  <meta name="author" content="">
+  <meta name="robots" content="NOINDEX">
+</head>
+<body>
+<% content_for :page_title do %>
+  <%= (@object.respond_to?(:properties) ? @object.properties[:page_title] : nil) ||
+        @object.friendly_link_name %>
+<% end %>
+
+<% if @object.andand.files.andand.any? %>
+  <% link_opts = {controller: 'collections', action: 'show_file',
+                  uuid: @object.uuid, reader_token: params[:reader_token]} %>
+  <ul>
+  <% @object.files.map { |spec|
+       CollectionsHelper::file_path(spec)
+     }.each do |path| %>
+    <li><%= link_to(path, link_opts.merge(file: path)) %></li>
+  <% end %>
+  </ul>
+<% else %>
+  <p>No files in this collection.</p>
+<% end %>
+</body>
+</html>
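diff --git a/apps/workbench/app/views/layouts/application.html.erb b/apps/workbench/app/views/layouts/application.html.erb
index 660d2dc49721a9f705d087f7f96e6e564c275fc0..a5460c295f7d0ac550f931e92034fec5dfd9f2b6 100644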
@@ -14,6 +14,7 @@
   <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
   <meta name="description" content="">
   <meta name="author" content="">
+  <meta name="robots" content="NOINDEX, NOFOLLOW">
   <%= stylesheet_link_tag    "application", :media => "all" %>
   <%= javascript_include_tag "application" %>
   <%= csrf_meta_tags %>
diff --git a/apps/workbench/public/robots.txt b/apps/workbench/public/robots.txt
index c6742d8a8cb8c13dc205d29fb007b28f4ab3bc97..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 100644
@@ -1,2 +0,0 @@
-User-Agent: *
-Disallow: /
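diff --git a/apps/workbench/test/integration/collections_test.rb b/apps/workbench/test/integration/collections_test.rb
index 4a33693018a8f83137650374b57cf3a4615935d5..911daa02dc9645c882b52c48b125dbe5b2a86b19 100644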
@@ -3,7 +3,6 @@ require 'selenium-webdriver'
 require 'headless'
 
 class CollectionsTest < ActionDispatch::IntegrationTest
-
   def change_persist oldstate, newstate
     find "div[data-persistent-state='#{oldstate}']"
     page.assert_no_selector "div[data-persistent-state='#{newstate}']"
@@ -48,4 +47,30 @@ class CollectionsTest < ActionDispatch::IntegrationTest
     # isn't only showing up in an error message.
     assert(page.has_link?('foo'), "Collection page did not include file link")
   end
+
+  test "can download an entire collection with a reader token" do
+    uuid = api_fixture('collections')['foo_file']['uuid']
+    token = api_fixture('api_client_authorizations')['active_all_collections']['api_token']
+    url_head = "/collections/download/#{uuid}/#{token}/"
+    visit url_head
+    # It seems that Capybara can't inspect tags outside the body, so this is
+    # a very blunt approach.
+    assert_no_match(/<\s*meta[^>]+\bnofollow\b/i, page.html,
+                    "wget prohibited from recursing the collection page")
+    # TODO: When we can test against a Keep server, actually follow links
+    # and check their contents, rather than testing the href directly
+    # (this is too closely tied to implementation details).
+    hrefs = page.all('a').map do |anchor|
+      link = anchor[:href] || ''
+      if link.start_with? url_head
+        link[url_head.size .. -1]
+      elsif link.start_with? '/'
+        nil
+      else
+        link
+      end
+    end
+    assert_equal(['foo'], hrefs.compact.sort,
+                 "download page did not provide strictly file links")
+  end
 end
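diff --git a/services/api/test/fixtures/api_client_authorizations.yml b/services/api/test/fixtures/api_client_authorizations.yml
index 9901ec4f038390c6364a6df467260d80afbb51e5..aca6d7a6a1e087b114fe119e8b809d925f9c9363 100644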
@@ -58,6 +58,13 @@ admin_noscope:
   expires_at: 2038-01-01 00:00:00
   scopes: []
 
+active_all_collections:
+  api_client: untrusted
+  user: active
+  api_token: activecollectionsabcdefghijklmnopqrstuvwxyz1234567
+  expires_at: 2038-01-01 00:00:00
+  scopes: ["GET /arvados/v1/collections/"]
+
 active_userlist:
   api_client: untrusted
   user: active