Tom Clegg [Sun, 29 Mar 2015 00:26:00 +0000 (20:26 -0400)]
5414: Add client support for Keep service hints.
Also, some incidental improvements in nearby code:
* Consistent logging in keepproxy, with one reusable logging statement
instead of a different statement/format for each outcome.
* In sdk/go/keepclient, remove public AuthorizedGet and AuthorizedAsk
methods. Instead, Get() and Ask() accept a locator (with or without
a permission token) and do the right thing. Callers don't have to
parse locators to decide which method to call.
* In sdk/go/keepclient, use an RWMutex instead of atomic.LoadPointer()
and unsafe.Pointer() to update KeepClient root maps safely.
* In sdk/go/keepclient, DiscoverKeepServers() doesn't return the new
root maps, just an error. In normal usage, the caller only cares
whether discovery was successful.
Also, some Go style fixes in nearby code:
* Use pointer receivers for all KeepClient methods.
https://golang.org/doc/faq#methods_on_values_or_pointers
* Use receiver name "kc", not "this".
https://github.com/golang/go/wiki/CodeReviewComments#receiver-names
* Handle errors first, use minimal indentation for normal code path.
https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow
Brett Smith [Wed, 8 Apr 2015 13:46:23 +0000 (09:46 -0400)]
5642: Explicitly make all swap available under Docker in crunch-job.
Without this, Docker 1.2 through 1.5 send subprocesses SIGKILL if they
exceed the memory limit. Refer to #5642 for an example.
--memory-swap is pretty new (newer than 1.3.3), so we don't want to
require it. At the same time, we don't want to impose any memory
limits if we can't use it, because killing subprocesses that exceed a
--memory limit is too strict. This commit arranges to use both
--memory and --memory-swap only if the latter is available.
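
The "both flags or neither" rule can be sketched in Python as follows. This is an illustration only, not the crunch-job code: the function name, the idea of detecting support by searching `docker run --help` output, and the kilobyte parameters are all assumptions made for the example.

```python
def docker_memory_args(run_help_text, ram_kb, swap_kb):
    """Return memory flags for `docker run`, or no flags at all.

    run_help_text is assumed to be the output of `docker run --help`
    (a hypothetical detection method chosen for this sketch).
    """
    if "--memory-swap" not in run_help_text:
        # Without --memory-swap, a bare --memory limit is too strict,
        # so impose no memory limits at all.
        return []
    # --memory-swap is the total of memory plus swap, so adding all
    # available swap to the RAM limit makes all of it usable.
    return [
        "--memory=%dk" % ram_kb,
        "--memory-swap=%dk" % (ram_kb + swap_kb),
    ]
```

An older Docker whose help text lacks the flag gets an empty list, so the container runs without any memory limit.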
Brett Smith [Mon, 6 Apr 2015 18:27:21 +0000 (14:27 -0400)]
5653: arv-copy copies multiple commits from the same repository+pipeline.
arv-copy previously used the repository name alone to determine which
job scripts it had already copied to the destination. If a pipeline
used unrelated commits from the same repository, it would skip copying
over all but the first. Track script versions throughout the copy
process and make sure all of them are copied to the destination
repository.
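
A minimal sketch of the tracking change, assuming each pipeline component is a dict with "repository" and "script_version" keys (the function name and data layout are hypothetical, not the arv-copy internals):

```python
def commits_to_copy(components):
    """List each distinct (repository, script_version) pair once.

    Keying on the pair, rather than on the repository name alone,
    keeps unrelated commits from the same repository from being
    skipped after the first one is copied.
    """
    seen = set()
    todo = []
    for component in components:
        key = (component["repository"], component["script_version"])
        if key not in seen:
            seen.add(key)
            todo.append(key)
    return todo
```

With two components that use different commits from the same repository, both pairs are returned, where a repository-only key would have kept just the first.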
Brett Smith [Sun, 5 Apr 2015 21:10:22 +0000 (17:10 -0400)]
5352: crunch-dispatch treats node allocation failure as temporary.
Imagine a scenario where multiple crunch-dispatch processes are
sitting idle, then suddenly a new job appears in the queue. They will
all race to dispatch the job. When this happens, we frequently see
that salloc fails for most of them, because they all requested the
same node(s) and only the winner will get them. crunch-dispatch has
no way to know the exit code "came from" salloc and not crunch-job,
and so marks the job failed.
This patch sets the SLURM_EXIT_IMMEDIATE environment variable to make
salloc use exit code 75 when the allocation fails. crunch-dispatch
already recognizes this exit code as a temporary failure, and will
leave the Arvados job record unchanged. Refer to salloc(1) and the
long comment in Dispatch#reap_children.
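
The idea can be sketched in Python as follows. crunch-dispatch itself is not written in Python, and the helper names below are hypothetical; only the SLURM_EXIT_IMMEDIATE variable and exit code 75 come from this commit.

```python
# Exit code that crunch-dispatch already treats as a temporary failure.
EX_TEMPFAIL = 75


def salloc_env(base_env):
    """Copy an environment and ask salloc to exit with EX_TEMPFAIL
    when the allocation fails."""
    env = dict(base_env)
    env["SLURM_EXIT_IMMEDIATE"] = str(EX_TEMPFAIL)
    return env


def is_temporary_failure(exit_code):
    """True if the child exit code means "try again later", in which
    case the job record should be left unchanged."""
    return exit_code == EX_TEMPFAIL
```

A dispatcher that loses the allocation race then sees exit code 75 and leaves the job for the winner instead of marking it failed.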
Brett Smith [Mon, 6 Apr 2015 17:49:29 +0000 (13:49 -0400)]
Tighten up DNS check in arvdock.
I got arvdock stuck in an infinite loop because I had the right
nameserver line in /etc/resolv.conf, except it was commented out, so
arvdock kept checking for a name that it would never find. Use awk to
find and check the first functional nameserver line instead.
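
The check is done with awk in arvdock; the same logic can be sketched in Python for illustration (the function name is hypothetical, and this is not a translation of the actual awk command):

```python
def first_nameserver(resolv_conf_text):
    """Return the address on the first uncommented nameserver line,
    or None if there is no such line."""
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # A commented-out line has "#" or "#nameserver" as its first
        # field, so it never matches here.
        if len(fields) >= 2 and fields[0] == "nameserver":
            return fields[1]
    return None
```

Comparing only this first functional line against the expected address avoids matching a commented-out entry that the resolver will never use.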
Brett Smith [Sun, 5 Apr 2015 21:39:33 +0000 (17:39 -0400)]
Fix debug log formatting in PySDK.
When multiple arguments are passed to a logger method, the first
argument is expected to be a printf-style format string, with the
remaining arguments expected to fill in the formatters.
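
A small standalone illustration of that calling convention, using only the standard logging module (the logger name and message are made up for the example and are not taken from the PySDK):

```python
import io
import logging


def format_debug_message(fmt, *args):
    """Log one debug message to an in-memory stream and return the
    text that the logging module produced."""
    stream = io.StringIO()
    logger = logging.getLogger("pysdk_logging_example")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        # The first argument is the format string; the rest fill in
        # its %-style placeholders when the record is emitted.
        logger.debug(fmt, *args)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

Calling format_debug_message("read %d bytes from %s", 64, "keep0") yields the interpolated line, whereas passing the values without a format string as the first argument would not.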
Brett Smith [Thu, 2 Apr 2015 21:37:53 +0000 (17:37 -0400)]
4253: Gitolite migration makes a name symlink for arvados repository.
This is necessary for other supporting infrastructure around this
repository in Gitolite installs; e.g., the cron job that keeps them in
sync with upstream. Refs #4253.
Peter Amstutz [Thu, 2 Apr 2015 14:24:56 +0000 (10:24 -0400)]
4752: Websockets work. Fix compute node containers to restart correctly. Fix
git server hostname. Arvdock waits for API/workbench to be ready before telling
the user to go to workbench.
Brett Smith [Wed, 1 Apr 2015 19:50:08 +0000 (15:50 -0400)]
5627: Python file-like objects use SEEK_SET as the default whence.
This is a brown paper bag commit. All that time I spent grumbling
that we had the wrong default was completely incorrect. We had it
right earlier, and I blew it. See
<https://docs.python.org/2/library/stdtypes.html#file.seek>.
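
The standard behavior can be checked with a quick example. It uses io.BytesIO as a stand-in for any file-like object; it does not exercise the Arvados classes themselves.

```python
import io
import os


def seek_demo():
    """Show that seek() without a whence argument is absolute."""
    buf = io.BytesIO(b"abcdefgh")
    buf.seek(3)               # default whence is os.SEEK_SET
    first = buf.tell()
    buf.seek(2)               # absolute again: position 2, not 3 + 2
    second = buf.tell()
    buf.seek(2, os.SEEK_CUR)  # relative seeks must be requested explicitly
    third = buf.tell()
    return first, second, third
```

The three positions are 3, 2 and 4: only the last call, which names os.SEEK_CUR, moves relative to the current offset.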
Brett Smith [Mon, 30 Mar 2015 18:28:40 +0000 (14:28 -0400)]
4253: Let Workbench Manage Account partials render their whole pane.
I'm about to add another "Add" button to the Repositories pane, so I'd
rather do it this way than try to maintain a generic loop inside
_manage_account.html.erb.
Brett Smith [Mon, 30 Mar 2015 15:06:51 +0000 (11:06 -0400)]
4253: Sync up Docker with our production Gitolite setup.
This updates our API server Docker image to store repositories by
UUID, with name aliases available. .gitolite.rc enables the aliasing,
and update-gitolite.rb generates the necessary configuration. This
makes it possible to test the recent repository changes in Docker.
Brett Smith [Tue, 31 Mar 2015 13:23:56 +0000 (09:23 -0400)]
4253: Users can manage their own repositories.
This commit allows users to create their own repositories, as long as
the repository name starts with their own username.
To support this change, we've modified our Gitolite setup to store
repositories primarily by UUID, with a name alias for easier
checkout. fetch_url and push_url become generated attributes
accordingly. This makes it easier to rename the repository later and
allow checkouts to continue to work.
Peter Amstutz [Tue, 31 Mar 2015 21:13:06 +0000 (17:13 -0400)]
5612: Wrap munge with startup script that cleans up /var/run/munge of stale
sockets and pidfiles. Can now run jobs after stopping and restarting
containers.
Brett Smith [Thu, 26 Mar 2015 16:29:38 +0000 (12:29 -0400)]
5502: Adjust id and name of Node Manager cloud object mocks.
Some GCE objects like disk types have predictable names, so it's
helpful to be able to mock objects with the same name. Use the
name_id argument as the literal name, and generate an id from it.
Brett Smith [Wed, 18 Mar 2015 20:00:01 +0000 (16:00 -0400)]
4253: Use new username to set up repository and VM logins.
The usernames added in 4253 have stricter limits than past usernames
generated to set up a repository and VM login. Use the new generated
username to avoid a weird disconnect between that and these related
objects. Do a little cleanup in the tests, including removing some
test parameters that now seem redundant under the new rules.
Brett Smith [Tue, 17 Mar 2015 22:10:14 +0000 (18:10 -0400)]
4253: Add a username attribute to users.
* Add the column, and propagate it based on available VM logins or
e-mail address, if possible.
* Add format validation and tests.
* Set new usernames based on e-mail address, with tests.
Radhika Chippada [Wed, 25 Mar 2015 14:16:27 +0000 (10:16 -0400)]
5556: Added select() to @logs in CollectionsController#show to avoid transporting unneeded log properties.
Also added an Accept-Encoding request header to Workbench requests to the API server.
Radhika Chippada [Tue, 24 Mar 2015 13:49:38 +0000 (09:49 -0400)]
5534: When a pipeline has long-running jobs with hundreds of thousands of log lines, the log display times out
fetching all those lines. Limiting the number of log lines retrieved resolves this issue. The limit is 2000 lines.
The log also displayed with a limit of 10000 lines, but the wait was much longer, and a quicker response should
give a better user experience. The most recent 2000 log lines are fetched, followed by newer log lines from the
event log if the job is still running.
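
The "most recent N lines" behavior can be sketched in Python with a bounded deque. This is only an illustration of the idea: Workbench is not written in Python, and the function name and in-memory approach are assumptions for the example.

```python
from collections import deque


def recent_lines(lines, limit=2000):
    """Keep only the last `limit` items of an iterable of log lines."""
    # A deque with maxlen discards the oldest entries as newer ones
    # arrive, so memory use stays bounded by `limit`.
    return list(deque(lines, maxlen=limit))
```

Newer lines arriving while the job runs can then be appended after this initial window.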
Peter Amstutz [Mon, 23 Mar 2015 21:12:55 +0000 (17:12 -0400)]
5539: Arvados-in-Docker improvements
* arvdock start now restarts existing containers instead of deleting them. Use arvdock reset to delete the containers.
* Uses docker data container for keep.
* Prints a note about adding the nameserver to /etc/resolv.conf