Merge branch '15319-api-useful-stacktraces'
authorLucas Di Pentima <ldipentima@veritasgenetics.com>
Mon, 15 Jul 2019 20:59:56 +0000 (17:59 -0300)
committerLucas Di Pentima <ldipentima@veritasgenetics.com>
Mon, 15 Jul 2019 20:59:56 +0000 (17:59 -0300)
Closes #15319

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <ldipentima@veritasgenetics.com>

doc/_config.yml
doc/admin/troubleshooting.html.textile.liquid [new file with mode: 0644]
doc/install/install-api-server.html.textile.liquid
services/api/app/controllers/application_controller.rb
services/api/config/initializers/lograge.rb
services/api/test/functional/application_controller_test.rb

index 36e8c64b86681ab4a44452ccf9f0e1477faf432d..023658d3ab4e369ff7c22d020cf1bd5f7afc76d7 100644 (file)
@@ -188,6 +188,7 @@ navbar:
       - admin/federation.html.textile.liquid
       - admin/controlling-container-reuse.html.textile.liquid
       - admin/logs-table-management.html.textile.liquid
+      - admin/troubleshooting.html.textile.liquid
   installguide:
     - Overview:
       - install/index.html.textile.liquid
diff --git a/doc/admin/troubleshooting.html.textile.liquid b/doc/admin/troubleshooting.html.textile.liquid
new file mode 100644 (file)
index 0000000..66c75f3
--- /dev/null
@@ -0,0 +1,74 @@
+---
+layout: default
+navsection: admin
+title: Troubleshooting
+...
+
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+Using a distributed system with several services working together sometimes makes it difficult to find the root cause of errors, as one single client request usually means several different requests to more than one service.
+
+To deal with this difficulty, Arvados creates a request ID that gets carried over different services as the requests take place. This ID has a specific format and it's comprised of the prefix "@req-@" followed by 20 random alphanumeric characters:
+
+<pre>req-frdyrcgdh4rau1ajiq5q</pre>
+
+This ID gets propagated via an HTTP @X-Request-Id@ header, and gets logged on every service.
+
+h3. API Server error reporting and logging
+
+In addition to providing the request ID on every HTTP response, the API Server adds it to every error message so that all clients show enough information to the user to be able to track a particular issue. As an example, let's suppose that we get the following error when trying to create a collection using the CLI tools:
+
+<pre>
+$ arv collection create --collection '{}'
+Error: #<RuntimeError: Whoops, something bad happened> (req-ku5ct9ehw0y71f1c5p79)
+</pre>
+
+The API Server logs every request in JSON format on the @production.log@ (usually under @/var/www/arvados-api/current/log/@ when installing from packages) file, so we can retrieve more information about this by using @grep@ and @jq@ tools:
+
+<pre>
+# grep req-ku5ct9ehw0y71f1c5p79 /var/www/arvados-api/current/log/production.log | jq .
+{
+  "method": "POST",
+  "path": "/arvados/v1/collections",
+  "format": "json",
+  "controller": "Arvados::V1::CollectionsController",
+  "action": "create",
+  "status": 422,
+  "duration": 1.52,
+  "view": 0.25,
+  "db": 0,
+  "request_id": "req-ku5ct9ehw0y71f1c5p79",
+  "client_ipaddr": "127.0.0.1",
+  "client_auth": "zzzzz-gj3su-jllemyj9v3s5emu",
+  "exception": "#<RuntimeError: Whoops, something bad happened>",
+  "exception_backtrace": "/var/www/arvados-api/current/app/controllers/arvados/v1/collections_controller.rb:43:in `create'\n/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/action_controller/metal/basic_implicit_render.rb:4:in `send_action'\n ...[snipped]",
+  "params": {
+    "collection": "{}",
+    "_profile": "true",
+    "cluster_id": "",
+    "collection_given": "true",
+    "ensure_unique_name": "false",
+    "help": "false"
+  },
+  "@timestamp": "2019-07-15T16:40:41.726634182Z",
+  "@version": "1",
+  "message": "[422] POST /arvados/v1/collections (Arvados::V1::CollectionsController#create)"
+}
+</pre>
+
+When logging a request that produced an error, the API Server adds @exception@ and @exception_backtrace@ keys to the JSON log. The latter includes the complete error stack trace as a string, and can be displayed in a more readable form like so:
+
+<pre>
+# grep req-ku5ct9ehw0y71f1c5p79 /var/www/arvados-api/current/log/production.log | jq -r .exception_backtrace
+/var/www/arvados-api/current/app/controllers/arvados/v1/collections_controller.rb:43:in `create'
+/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/action_controller/metal/basic_implicit_render.rb:4:in `send_action'
+/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/abstract_controller/base.rb:188:in `process_action'
+/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/action_controller/metal/rendering.rb:30:in `process_action'
+/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/abstract_controller/callbacks.rb:20:in `block in process_action'
+/var/lib/gems/ruby/2.3.0/gems/activesupport-5.0.7.2/lib/active_support/callbacks.rb:126:in `call'
+...
+</pre>
\ No newline at end of file
index 424fc3cf242536bc95dbe34697f27c8166eb1b58..c234bca9270a25268f4acab6d355fe36770ffbc0 100644 (file)
@@ -271,3 +271,7 @@ break this application for all non-root users on this machine.</pre></notextile>
 
 <notextile><pre>fatal: Not a git repository (or any of the parent directories): .git</pre></notextile>
 {% include 'notebox_end' %}
+
+h2. Troubleshooting
+
+Once you have the API Server up and running you may need to check it back if dealing with client related issues. Please read our "admin troubleshooting notes":{{site.baseurl}}/admin/troubleshooting.html on how requests can be tracked down between services.
\ No newline at end of file
index d5bc3f35d7e04fc47f9795f8a3643f4aef8bcf57..b23515dda4528b1e7f44433a8fcfa006d7ee3486 100644 (file)
@@ -140,8 +140,13 @@ class ApplicationController < ActionController::Base
 
   def render_error(e)
     logger.error e.inspect
-    if !e.is_a? RequestError and (e.respond_to? :backtrace and e.backtrace)
-      logger.error e.backtrace.collect { |x| x + "\n" }.join('')
+    if e.respond_to? :backtrace and e.backtrace
+      # This will be cleared by lograge after adding it to the log.
+      # Usually lograge would get the exceptions, but in our case we're catching
+      # all of them with exception handlers that cannot re-raise them because they
+      # don't get propagated.
+      Thread.current[:exception] = e.inspect
+      Thread.current[:backtrace] = e.backtrace.collect { |x| x + "\n" }.join('')
     end
     if (@object.respond_to? :errors and
         @object.errors.andand.full_messages.andand.any?)
index 07dba3aef4ff9bb64db8c1236a73c0c97c15a794..9b422462b193a400a58375bc31cf76486132d023 100644 (file)
@@ -13,6 +13,21 @@ Server::Application.configure do
       client_ipaddr: event.payload[:client_ipaddr],
       client_auth: event.payload[:client_auth],
     }
+
+    # Lograge adds exceptions not being rescued to event.payload, but we're
+    # catching all errors on ApplicationController so we look for backtraces
+    # elsewhere.
+    if !Thread.current[:backtrace].nil?
+      payload.merge!(
+        {
+          exception: Thread.current[:exception],
+          exception_backtrace: Thread.current[:backtrace],
+        }
+      )
+      Thread.current[:exception] = nil
+      Thread.current[:backtrace] = nil
+    end
+
     exceptions = %w(controller action format id)
     params = event.payload[:params].except(*exceptions)
 
index b74ff0f41d6d5190f217df7cc776490d0e8e9d83..175a8f71ea0544e2253754f607a2217a441d63cc 100644 (file)
@@ -121,4 +121,16 @@ class ApplicationControllerTest < ActionController::TestCase
       end
     end
   end
+
+  test "exceptions with backtraces get logged at exception_backtrace key" do
+    Group.stubs(:new).raises(Exception, 'Whoops')
+    Rails.logger.expects(:info).with(any_parameters) do |param|
+      param.include?('Whoops') and param.include?('"exception_backtrace":')
+    end
+    @controller = Arvados::V1::GroupsController.new
+    authorize_with :active
+    post :create, params: {
+      group: {},
+    }
+  end
 end