--- /dev/null
+---
+layout: default
+navsection: admin
+title: Troubleshooting
+...
+
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+Using a distributed system with several services working together sometimes makes it difficult to find the root cause of errors, as one single client request usually means several different requests to more than one service.
+
+To deal with this difficulty, Arvados creates a request ID that gets carried over different services as the requests take place. This ID has a specific format and it's comprised of the prefix "@req-@" followed by 20 random alphanumeric characters:
+
+<pre>req-frdyrcgdh4rau1ajiq5q</pre>
+
+This ID gets propagated via an HTTP @X-Request-Id@ header, and gets logged on every service.
+
+h3. API Server error reporting and logging
+
+In addition to providing the request ID on every HTTP response, the API Server adds it to every error message so that all clients show enough information to the user to be able to track a particular issue. As an example, let's suppose that we get the following error when trying to create a collection using the CLI tools:
+
+<pre>
+$ arv collection create --collection '{}'
+Error: #<RuntimeError: Whoops, something bad happened> (req-ku5ct9ehw0y71f1c5p79)
+</pre>
+
+The API Server logs every request in JSON format on the @production.log@ (usually under @/var/www/arvados-api/current/log/@ when installing from packages) file, so we can retrieve more information about this by using @grep@ and @jq@ tools:
+
+<pre>
+# grep req-ku5ct9ehw0y71f1c5p79 /var/www/arvados-api/current/log/production.log | jq .
+{
+ "method": "POST",
+ "path": "/arvados/v1/collections",
+ "format": "json",
+ "controller": "Arvados::V1::CollectionsController",
+ "action": "create",
+ "status": 422,
+ "duration": 1.52,
+ "view": 0.25,
+ "db": 0,
+ "request_id": "req-ku5ct9ehw0y71f1c5p79",
+ "client_ipaddr": "127.0.0.1",
+ "client_auth": "zzzzz-gj3su-jllemyj9v3s5emu",
+ "exception": "#<RuntimeError: Whoops, something bad happened>",
+ "exception_backtrace": "/var/www/arvados-api/current/app/controllers/arvados/v1/collections_controller.rb:43:in `create'\n/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/action_controller/metal/basic_implicit_render.rb:4:in `send_action'\n ...[snipped]",
+ "params": {
+ "collection": "{}",
+ "_profile": "true",
+ "cluster_id": "",
+ "collection_given": "true",
+ "ensure_unique_name": "false",
+ "help": "false"
+ },
+ "@timestamp": "2019-07-15T16:40:41.726634182Z",
+ "@version": "1",
+ "message": "[422] POST /arvados/v1/collections (Arvados::V1::CollectionsController#create)"
+}
+</pre>
+
+When logging a request that produced an error, the API Server adds @exception@ and @exception_backtrace@ keys to the JSON log. The latter includes the complete error stack trace as a string, and can be displayed in a more readable form like so:
+
+<pre>
+# grep req-ku5ct9ehw0y71f1c5p79 /var/www/arvados-api/current/log/production.log | jq -r .exception_backtrace
+/var/www/arvados-api/current/app/controllers/arvados/v1/collections_controller.rb:43:in `create'
+/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/action_controller/metal/basic_implicit_render.rb:4:in `send_action'
+/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/abstract_controller/base.rb:188:in `process_action'
+/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/action_controller/metal/rendering.rb:30:in `process_action'
+/var/lib/gems/ruby/2.3.0/gems/actionpack-5.0.7.2/lib/abstract_controller/callbacks.rb:20:in `block in process_action'
+/var/lib/gems/ruby/2.3.0/gems/activesupport-5.0.7.2/lib/active_support/callbacks.rb:126:in `call'
+...
+</pre>
\ No newline at end of file
def render_error(e)
logger.error e.inspect
- if !e.is_a? RequestError and (e.respond_to? :backtrace and e.backtrace)
- logger.error e.backtrace.collect { |x| x + "\n" }.join('')
+ if e.respond_to? :backtrace and e.backtrace
+ # This will be cleared by lograge after adding it to the log.
+ # Usually lograge would get the exceptions, but in our case we're catching
+ # all of them with exception handlers that cannot re-raise them because they
+ # don't get propagated.
+ Thread.current[:exception] = e.inspect
+ Thread.current[:backtrace] = e.backtrace.collect { |x| x + "\n" }.join('')
end
if (@object.respond_to? :errors and
@object.errors.andand.full_messages.andand.any?)
client_ipaddr: event.payload[:client_ipaddr],
client_auth: event.payload[:client_auth],
}
+
+ # Lograge adds exceptions not being rescued to event.payload, but we're
+ # catching all errors on ApplicationController so we look for backtraces
+ # elsewhere.
+ if !Thread.current[:backtrace].nil?
+ payload.merge!(
+ {
+ exception: Thread.current[:exception],
+ exception_backtrace: Thread.current[:backtrace],
+ }
+ )
+ Thread.current[:exception] = nil
+ Thread.current[:backtrace] = nil
+ end
+
exceptions = %w(controller action format id)
params = event.payload[:params].except(*exceptions)