Peter Amstutz [Fri, 5 Feb 2016 16:10:43 +0000 (11:10 -0500)]
7667: Combine polling logs into fewer lines for less noise. Adjust message
when last_ping_at is unexpectedly none to be less severe (can happen in
innocent circumstances). Report nodes in "booted" list as "booting" since they
are unpaired. Fix tests.
Changing from `/etc/ssl/certs` to `/etc/ssl/certs/ca-certificates.crt`
is safe, because add_trust_ca accepts either a directory with hashed
certs, or a file with multiple certs. On Debian, the latter path is a
single file built from the hashed certs in the former, so this is
functionally identical there, and more predictable on Red Hat (where I
don't know what it's doing).
Peter Amstutz [Tue, 2 Feb 2016 17:26:57 +0000 (12:26 -0500)]
6702: Refactor create_node to BaseComputeNodeDriver so logic also applies to
Azure. Adds new find_node() method; if returns None or raises an error,
re-raise the original create_node exception.
Brett Smith [Mon, 1 Feb 2016 17:43:04 +0000 (12:43 -0500)]
8014: Remove more upgrade script references from install guide.
The steps removed are now handled by Rails package postinst scripts.
This should've been done in 378a988bbf9e29736382339f587582259b641782,
but was overlooked. Refs #8014.
Brett Smith [Mon, 1 Feb 2016 16:51:14 +0000 (11:51 -0500)]
Fix install doc rendering of API Nginx config.
<notextile> doesn't actually nest like proper HTML, it's just a
boolean that remembers the last state. Turn it back on after doing an
include that turns it off. No issue #.
Brett Smith [Thu, 28 Jan 2016 00:02:05 +0000 (19:02 -0500)]
8005: Install guide uses runit packages on Red Hat.
The runit RPMs only provide /etc/service. The .debs provide /etc/sv
and /etc/service. Our understanding is that /etc/sv is for all
service definitions (akin to /etc/init.d), and /etc/service is for
service definitions that runit should start at boot (akin to
/etc/rcN.d). To provide uniformity, our install guide instructs users
to make /etc/sv if needed, and link it to /etc/service.
This commit could go farther. Today it would be best if all the runit
sections in the install guide followed Tom's modern template used for
arv-git-httpd and arvados-docker-cleaner. However, that will probably
require some creation and testing of log/run scripts, and some
adaptation of the run scripts to fit the template. I wish I could
include those improvements now, but unfortunately I'm out of time, so
they'll have to wait for another day.
radhika [Thu, 21 Jan 2016 20:25:06 +0000 (15:25 -0500)]
8178: All three currently supported volumes return error when trash-lifetime period is not configured. azure blob and s3 volumes are updated to do so.
Returning an error is causing test failures in unix volume and hence is still a work in progress.
radhika [Thu, 21 Jan 2016 20:25:06 +0000 (15:25 -0500)]
8178: All three currently supported volumes return error when trash-lifetime period is not configured. azure blob and s3 volumes are updated to do so.
Returning an error is causing test failures in unix volume and hence is still a work in progress.
Tom Clegg [Thu, 21 Jan 2016 19:35:34 +0000 (14:35 -0500)]
8281: Fix KeepClient retry bugs.
get() and put() were both handling all Curl exceptions -- including
timeouts -- by marking the keep service as unusable. For example, if a
single proxy is the only service available, a single timeout was
fatal. This is fixed by setting the retry loop status to None instead
of False after curl exceptions.
put() was repeating its retry loop until it achieved the desired
number of replicas _in a single iteration_. For example, when trying
to store 2 replicas, 6 loop iterations with a single success in each
iteration would result in 6 copies being stored but put() declaring
failure. This is fixed by checking against a cumulative "done" counter
instead of the "copies done in this loop iteration" counter.
Tom Clegg [Thu, 21 Jan 2016 09:01:16 +0000 (04:01 -0500)]
8281: Fix arv-mount ignoring --retries argument when writing file data.
"num_retries" arguments get passed around extensively in arvfile.py
and collection.py in the Python SDK, but ultimately the writing of
file data is done by a _BlockManager which doesn't have any way to
accept that argument or pass it along to a KeepClient, so PUT requests
always use the CollectionWriter's KeepClient's default num_retries.
In arv-mount's case, we have been telling CollectionWriter the
num_retries we want. When CollectionWriter creates a KeepClient,
num_retries gets passed along -- normally this works around the fact
that num_retries gets lost by the _BlockManager layer. However, we
provided our own KeepClient to use instead of letting CollectionWriter
create one, and we forgot to set num_retries on our own KeepClient, so
we weren't retrying PUT requests.
radhika [Thu, 21 Jan 2016 20:25:06 +0000 (15:25 -0500)]
8178: All three currently supported volumes return error when trash-lifetime period is not configured. azure blob and s3 volumes are updated to do so.
Returning an error is causing test failures in unix volume and hence is still a work in progress.