Peter Amstutz [Wed, 31 May 2017 19:37:15 +0000 (15:37 -0400)]
10847: Daemon shutdown now stops most actors, only waits for setup actors.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Peter Amstutz [Wed, 7 Jun 2017 14:57:25 +0000 (10:57 -0400)]
10312: Add example jobs_queue and slurm_queue options to example node manager configurations.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Peter Amstutz [Wed, 7 Jun 2017 14:53:43 +0000 (10:53 -0400)]
10312: Add some comments to node manager integration test.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Peter Amstutz [Tue, 6 Jun 2017 13:31:46 +0000 (09:31 -0400)]
10312: Add services/nodemanager-integration to test list
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Peter Amstutz [Mon, 5 Jun 2017 20:40:30 +0000 (16:40 -0400)]
10312: Fix unit tests.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Peter Amstutz [Fri, 2 Jun 2017 21:35:15 +0000 (17:35 -0400)]
10312: Tests pass for booting single node, multiple nodes, hitting quota, quota
probe. Add node manager integration to run-tests.sh.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Peter Amstutz [Fri, 2 Jun 2017 15:58:55 +0000 (11:58 -0400)]
10312: Integration test framework for node manager, runs full node manager with
fake cloud driver and monitors logging output.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Peter Amstutz [Thu, 1 Jun 2017 21:37:09 +0000 (17:37 -0400)]
10312: Adding ability to substitute fake libcloud driver but run full node manager for integration testing.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Peter Amstutz [Thu, 1 Jun 2017 14:07:49 +0000 (10:07 -0400)]
10312: Identify error messages that look like we are hitting a quota or account limit. Set a soft node quota to stop trying to boot new nodes until the total node count goes down. Probe the node quota upward when at the soft limit and able to boot nodes successfully.
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
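The soft quota behavior described in the commit above could be sketched roughly as follows. This is an illustration only, not node manager's actual code: the class name `SoftQuota`, its methods, and the "raise by one node" probing step are all assumptions.

```python
class SoftQuota(object):
    """Hypothetical soft-quota tracker (names are illustrative only)."""

    def __init__(self, hard_max):
        self.hard_max = hard_max   # configured maximum node count
        self.soft_max = hard_max   # lowered when the cloud reports a quota error

    def can_boot(self, node_count):
        # Only boot while the total node count is below the soft quota.
        return node_count < self.soft_max

    def on_quota_error(self, node_count):
        # The cloud refused a boot with node_count nodes already up:
        # stop trying until the total node count goes down.
        self.soft_max = min(self.soft_max, node_count)

    def on_boot_success(self, node_count):
        # node_count includes the node that just booted. If that puts us at
        # the soft limit, probe upward by one node (assumed step size).
        if node_count >= self.soft_max and self.soft_max < self.hard_max:
            self.soft_max += 1
```

With a hard maximum of 10 and a quota error at 4 nodes, this sketch refuses further boots until the count drops to 3, then raises the soft quota to 5 once a replacement node boots successfully.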
Peter Amstutz [Wed, 31 May 2017 18:26:01 +0000 (14:26 -0400)]
Merge branch '11766-workflow-deadlock' closes #11766
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curoverse.com>
Lucas Di Pentima [Tue, 30 May 2017 21:24:23 +0000 (18:24 -0300)]
Merge branch '11684-unsigned-locator-fix'
Closes #11684
Lucas Di Pentima [Tue, 30 May 2017 20:56:07 +0000 (17:56 -0300)]
11684: ArvadosFile.flush() now checks whether it is the only owner of a bufferblock before deleting it, so the extra argument is no longer required.
On commit_all(), always check whether the owner attribute is an instance of ArvadosFile before calling flush().
Fixed a couple of tests that were mocking bufferblock.owner so that they work with this new behavior.
Peter Amstutz [Tue, 30 May 2017 18:42:34 +0000 (14:42 -0400)]
Merge branch '11767-squeue-reasons' refs #11767
Peter Amstutz [Tue, 30 May 2017 18:34:39 +0000 (14:34 -0400)]
11767: Make squeue format output pipe (|) delimited so that it doesn't get
confused by spaces in the "Reasons" column
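A minimal sketch of why a pipe delimiter helps, assuming a hypothetical column order (cpus, memory, tmp, reason, name, features) and made-up sample output. The real squeue format string used by the commit is not shown in this log.

```python
# Illustrative squeue-like output; the second row has a space in "reason".
SAMPLE = (
    "1|1024M|0|Resources|job one|(null)\n"
    "2|2.5G|0|ReqNodeNotAvail, UnavailableNodes:compute[1-3]|job two|(null)\n"
)

# Assumed column order, for illustration only.
COLUMNS = ("cpus", "memory", "tmp", "reason", "name", "features")


def parse_squeue(output, columns=COLUMNS):
    """Split each non-empty line on '|' and map the fields to column names."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("|")
        if len(fields) != len(columns):
            raise ValueError("unexpected squeue line: %r" % line)
        rows.append(dict(zip(columns, fields)))
    return rows


rows = parse_squeue(SAMPLE)
```

Splitting on whitespace would break the second row's reason into two fields. Splitting on `|` keeps it intact, although a field that itself contained `|` would still confuse this sketch.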
Lucas Di Pentima [Tue, 30 May 2017 17:07:41 +0000 (14:07 -0300)]
11684: Merge branch 'master' into 11684-unsigned-locator-fix
Peter Amstutz [Tue, 30 May 2017 15:18:41 +0000 (11:18 -0400)]
Merge branch '11769-scancel-jobs-only' closes #11769
radhika [Mon, 29 May 2017 19:56:26 +0000 (15:56 -0400)]
closes #11739, #11751
Merge branch '11739-container-requests-in-dashboard'
radhika [Mon, 29 May 2017 17:19:05 +0000 (13:19 -0400)]
11739: preload containers and children of all container_requests in dashboard display.
11751: in /container_requests page, use the column name "Name" and display either the name or the uuid of the object.
Lucas Di Pentima [Mon, 29 May 2017 16:59:14 +0000 (13:59 -0300)]
11684: Instead of fiddling with the ArvadosFile object's internals from the BlockManager
put threads to update the segments' locators when synchronously committing a block
built from smaller blocks, take advantage of the existing ArvadosFile.flush() mechanism
for updating unrealized segments' locators: build a list of bufferblock owners
and call every owner's flush() method on commit_all().
To avoid calling delete_bufferblock() many times on a single bufferblock, added
a flag on flush() and delete the bufferblock after flushing all owners.
Peter Amstutz [Fri, 26 May 2017 17:29:45 +0000 (13:29 -0400)]
11766: Bump cwltool version for deadlock fix.
Peter Amstutz [Fri, 26 May 2017 20:22:10 +0000 (16:22 -0400)]
Fix crunch script to set trash_intermediate and intermediate_output_ttl refs #11100
Peter Amstutz [Fri, 26 May 2017 19:49:32 +0000 (15:49 -0400)]
11769: HasUuid::UUID_REGEX matches anything that looks like an Arvados uuid. As a result, if crunchv1 and crunchv2 dispatchers are on the same cluster, crunch-dispatch.rb will try to scancel containers thinking they are "orphan jobs". Tighten the regex to only match job uuids.
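The difference between the two patterns can be sketched as below. The original fix is in Ruby (crunch-dispatch.rb); this Python version is only an illustration, and it assumes the usual Arvados uuid layout of a 5-character cluster prefix, a 5-character type infix, and a 15-character suffix, with `8i9sb` as the job infix and `dz642` as the container infix.

```python
import re

# Matches anything shaped like an Arvados uuid (assumed layout: 5-5-15).
ANY_UUID_RE = re.compile(r"^[a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15}$")

# Tightened pattern: only uuids whose type infix is the (assumed) job infix.
JOB_UUID_RE = re.compile(r"^[a-z0-9]{5}-8i9sb-[a-z0-9]{15}$")

# Made-up example uuids.
job_uuid = "zzzzz-8i9sb-0123456789abcde"
container_uuid = "zzzzz-dz642-0123456789abcde"

# The loose pattern accepts both, so a container looks like an "orphan job".
loose_matches = [bool(ANY_UUID_RE.match(u)) for u in (job_uuid, container_uuid)]

# The tight pattern accepts only the job uuid.
tight_matches = [bool(JOB_UUID_RE.match(u)) for u in (job_uuid, container_uuid)]
```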
Peter Amstutz [Fri, 26 May 2017 19:32:46 +0000 (15:32 -0400)]
Merge branch '11767-slurm-units' refs #11767
Peter Amstutz [Fri, 26 May 2017 19:31:57 +0000 (15:31 -0400)]
11767: Add test cases for converting fractional values from squeue.
Peter Amstutz [Fri, 26 May 2017 19:25:56 +0000 (15:25 -0400)]
11767: Slurm apparently will print out half values (like 2.5G).
Peter Amstutz [Fri, 26 May 2017 19:14:12 +0000 (15:14 -0400)]
11767: Recognize lowercase suffixes, just in case future versions of slurm
change the format again.
Peter Amstutz [Fri, 26 May 2017 18:48:16 +0000 (14:48 -0400)]
11767: Correctly parse values with unit suffixes printed by squeue.
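Taken together, the 11767 unit-suffix commits above (suffix parsing, lowercase suffixes, fractional values) could look something like the following. The function name `to_mib`, the choice of MiB as the target unit, and treating a bare number as MiB are assumptions made for this sketch, not the actual implementation.

```python
import re

# Multipliers relative to MiB (assumed target unit for this sketch).
_UNITS_IN_MIB = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}

# A whole or fractional number followed by an optional unit suffix.
_VALUE_RE = re.compile(r"^(\d+(?:\.\d+)?)([KMGT]?)$", re.IGNORECASE)


def to_mib(text):
    """Convert a value such as '1024', '512K', '2g' or '2.5G' to MiB."""
    match = _VALUE_RE.match(text.strip())
    if match is None:
        raise ValueError("cannot parse size: %r" % text)
    number, suffix = match.groups()
    # A missing suffix is treated as MiB (assumption).
    return float(number) * _UNITS_IN_MIB.get(suffix.upper(), 1.0)
```

Accepting both cases and an optional fractional part in a single regular expression keeps the three behaviors in one place.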
Peter Amstutz [Fri, 26 May 2017 18:21:34 +0000 (14:21 -0400)]
Merge branch '11100-cwl-set-output-ttl' closes #11100
Peter Amstutz [Fri, 26 May 2017 18:21:01 +0000 (14:21 -0400)]
Merge branch 'master' into 11100-cwl-set-output-ttl
Peter Amstutz [Fri, 26 May 2017 16:13:25 +0000 (12:13 -0400)]
11100: Fix test
Peter Amstutz [Fri, 26 May 2017 15:06:35 +0000 (11:06 -0400)]
11100: Add ciso8601 dependency
Peter Amstutz [Fri, 26 May 2017 14:42:47 +0000 (10:42 -0400)]
11100: Add test for --trash-intermediate. Add log message when intermediate
outputs are scheduled for trash.
Lucas Di Pentima [Fri, 26 May 2017 02:34:59 +0000 (23:34 -0300)]
11684: When packing small blocks into one, save references of the files
included on the block when committing it asynchronously, so that the
segment's locators can be updated at the put thread after the block is
committed and the permission token is returned from the API Server.
Jiayong Li [Fri, 26 May 2017 01:49:26 +0000 (21:49 -0400)]
closes #11362 Merge branch '11362-missing-input-sdk/cwl'
Jiayong Li [Fri, 26 May 2017 01:47:59 +0000 (21:47 -0400)]
Change visit in pathmapper.py to raise OSError if input file is not found, change test_pathmapper.py to test it, and update python client version in setup.py
Jiayong Li [Fri, 26 May 2017 01:38:12 +0000 (21:38 -0400)]
refs #11362 Merge branch '11362-missing-input-sdk/python'
Jiayong Li [Fri, 26 May 2017 01:36:51 +0000 (21:36 -0400)]
Change statfile in run.py to be able to raise OSError, and change the mock in test_pathmapper.py accordingly
Lucas Di Pentima [Wed, 24 May 2017 18:10:53 +0000 (15:10 -0300)]
Merge branch '11501-job-stats-discrepancy'
Closes #11501
Lucas Di Pentima [Wed, 24 May 2017 16:51:47 +0000 (13:51 -0300)]
11501: Simplified helper method call. Changed wording of run time description.
Peter Amstutz [Wed, 24 May 2017 14:56:49 +0000 (10:56 -0400)]
Merge branch '11543-collection-per-tool' closes #11543
Peter Amstutz [Wed, 24 May 2017 14:36:40 +0000 (10:36 -0400)]
11100: Separate "trash intermediate on success" behavior from "output intermediate TTL" option. Update documentation.
Lucas Di Pentima [Wed, 24 May 2017 13:47:56 +0000 (10:47 -0300)]
11501: Don't filter out reused children when calculating running time.
Always use wall time when saying how much time passed after completion/failure.
Lucas Di Pentima [Tue, 23 May 2017 23:11:14 +0000 (20:11 -0300)]
11501: When calculating a work unit's running time, only include 'leaf' children, filtering those that were reused.
Peter Amstutz [Tue, 23 May 2017 19:53:10 +0000 (15:53 -0400)]
11543: Bump version dependency on arvados-python-client.
Peter Amstutz [Wed, 17 May 2017 13:03:18 +0000 (09:03 -0400)]
11543: Upload tool dependencies into single collection. Add test for collection per tool. Fix other tests.
Peter Amstutz [Tue, 23 May 2017 19:52:05 +0000 (15:52 -0400)]
Merge branch '11543-uploadfile-collection' refs #11543
Peter Amstutz [Wed, 17 May 2017 13:03:18 +0000 (09:03 -0400)]
11543: arvados.command.run.uploadfiles takes optional Collection to upload to.
Lucas Di Pentima [Tue, 23 May 2017 14:26:33 +0000 (11:26 -0300)]
11501: Merge branch 'master' into 11501-job-stats-discrepancy
Peter Amstutz [Mon, 22 May 2017 20:08:49 +0000 (16:08 -0400)]
11100: Update/add tests for --intermediate-output-ttl
Lucas Di Pentima [Mon, 22 May 2017 20:05:45 +0000 (17:05 -0300)]
11684: Reverted easy fix to expose the bug: when there's a delay writing a block that's
produced by packing smaller blocks into one, its locator doesn't get updated with the
correct access token, so it will fail when trying to save the collection to the API
server.
Peter Amstutz [Mon, 22 May 2017 18:32:08 +0000 (14:32 -0400)]
11100: Implement & document arv:IntermediateOutput hint.
Peter Amstutz [Mon, 22 May 2017 17:06:39 +0000 (13:06 -0400)]
11100: Propagate through to runner. Use intermediate_output_ttl consistently.
Peter Amstutz [Fri, 31 Mar 2017 21:49:38 +0000 (17:49 -0400)]
11100: a-c-r sets output_ttl and deletes intermediate collections on success.
Lucas Di Pentima [Mon, 22 May 2017 15:41:41 +0000 (12:41 -0300)]
11501: Fix some tests when trying to use the walltime method on show_runtime.
Lucas Di Pentima [Mon, 22 May 2017 15:40:57 +0000 (12:40 -0300)]
11501: Improved test name
Peter Amstutz [Mon, 22 May 2017 15:10:50 +0000 (11:10 -0400)]
Merge branch '11369-crunchv2-notes' refs #11369
Peter Amstutz [Mon, 22 May 2017 14:57:35 +0000 (10:57 -0400)]
11369: Add migration notes about crunchv1-to-crunchv2
Tom Clegg [Sun, 21 May 2017 18:33:08 +0000 (14:33 -0400)]
Merge branch '11720-govendor'
closes #11720
Tom Clegg [Fri, 19 May 2017 23:36:56 +0000 (19:36 -0400)]
Merge branch '11590-log-reuse'
closes #11590
Tom Clegg [Fri, 19 May 2017 23:17:22 +0000 (19:17 -0400)]
Merge branch '9005-disable-keepalive'
refs #9005
refs #11726
refs #11729
Tom Clegg [Fri, 19 May 2017 23:11:05 +0000 (19:11 -0400)]
9005: 11726: 11729: Disable http keepalive.
The previous workaround for #9005 did not account for the prefetch
feature: if a goroutine is using the HTTP client to prefetch data at
the moment the handler exits, CloseIdleConnections() does not close
that connection, so it stays open indefinitely.
Lucas Di Pentima [Fri, 19 May 2017 22:07:08 +0000 (19:07 -0300)]
11501: The running time of a work unit with children was computed from its direct
children only, so if a child work unit had children of its own, the running time could differ.
Tom Clegg [Fri, 19 May 2017 20:23:39 +0000 (16:23 -0400)]
Merge branch '9005-conn-leak'
refs #9005
Tom Clegg [Fri, 19 May 2017 19:55:40 +0000 (15:55 -0400)]
9005: Fix missing error checks.
Tom Clegg [Fri, 19 May 2017 19:55:22 +0000 (15:55 -0400)]
9005: Fix missing Close().
radhika [Fri, 19 May 2017 19:34:32 +0000 (15:34 -0400)]
closes #11710
Merge branch '11710-container-request-show-perf'
radhika [Fri, 19 May 2017 18:30:58 +0000 (14:30 -0400)]
11710: fix typo in finding children
radhika [Thu, 18 May 2017 19:38:52 +0000 (15:38 -0400)]
11710: fetching requesting containers
radhika [Thu, 18 May 2017 15:42:25 +0000 (11:42 -0400)]
11710: preload / batch retrieval of children of a container_work_unit
Tom Clegg [Fri, 19 May 2017 17:47:52 +0000 (13:47 -0400)]
9005: Remove debug printf.
Tom Clegg [Thu, 18 May 2017 21:25:20 +0000 (17:25 -0400)]
11590: Dry up log_reuse_info() calls.
radhika [Thu, 18 May 2017 19:46:15 +0000 (15:46 -0400)]
refs #9587
Merge branch '9587-include-trash-in-group-contents'
radhika [Thu, 18 May 2017 01:47:52 +0000 (21:47 -0400)]
9587: add support for include_trash in groups_controller -> contents method
Tom Clegg [Thu, 18 May 2017 18:53:38 +0000 (14:53 -0400)]
11590: Add container logging tests.
Tom Clegg [Thu, 18 May 2017 18:45:00 +0000 (14:45 -0400)]
Merge branch '11644-mounts-api'
closes #11644
Tom Clegg [Thu, 18 May 2017 17:26:36 +0000 (13:26 -0400)]
11644: Add DeviceID() to Volume interface.
Tom Clegg [Thu, 18 May 2017 17:22:54 +0000 (13:22 -0400)]
11644: Unify block-index handlers. Move prefix arg to query string.
Tom Clegg [Thu, 18 May 2017 16:19:13 +0000 (12:19 -0400)]
11590: Log container reuse decisions.
Tom Clegg [Thu, 18 May 2017 14:26:53 +0000 (10:26 -0400)]
Merge branch '11590-log-reuse'
refs #11590
Tom Clegg [Thu, 18 May 2017 14:26:35 +0000 (10:26 -0400)]
11590: Clarify "job state" condition in log message.
Tom Clegg [Thu, 18 May 2017 14:21:56 +0000 (10:21 -0400)]
11720: Add vendor/.gitignore.
Tom Clegg [Wed, 17 May 2017 21:07:41 +0000 (17:07 -0400)]
11720: Update Go dependencies.
Tom Clegg [Wed, 17 May 2017 20:48:09 +0000 (16:48 -0400)]
11720: Merge branch 'master' into 11720-govendor
Tom Clegg [Wed, 17 May 2017 20:47:20 +0000 (16:47 -0400)]
Merge branch '11546-fast-lock'
closes #11546
Tom Clegg [Wed, 17 May 2017 19:26:26 +0000 (15:26 -0400)]
11720: Control dependencies with govendor.
radhika [Wed, 17 May 2017 18:28:43 +0000 (14:28 -0400)]
closes #11580
Merge branch '11580-container-requests'
Tom Clegg [Wed, 17 May 2017 18:28:33 +0000 (14:28 -0400)]
11546: Wrap lock/unlock in transactions.
radhika [Tue, 16 May 2017 19:33:31 +0000 (15:33 -0400)]
11580: preload containers
radhika [Tue, 16 May 2017 00:09:25 +0000 (20:09 -0400)]
11580: add container_requests index page
Tom Clegg [Wed, 17 May 2017 17:51:57 +0000 (13:51 -0400)]
11546: Avoid loading/saving non-essential fields in /arvados/v1/containers/lock.
Peter Amstutz [Wed, 17 May 2017 17:50:03 +0000 (13:50 -0400)]
Merge branch '11714-crunch-run-cgroup-parent' closes #11714
Peter Amstutz [Wed, 17 May 2017 15:07:46 +0000 (11:07 -0400)]
11714: Set CgroupParent under Resources because setting Cgroup in HostConfig
doesn't work.
Peter Amstutz [Wed, 17 May 2017 17:08:35 +0000 (13:08 -0400)]
Merge branch '11718-crunch-run-docker-wait' closes #11718
Peter Amstutz [Wed, 17 May 2017 15:04:06 +0000 (11:04 -0400)]
11718: Update crunch-run for docker client API change in ContainerWait().
Tom Clegg [Tue, 16 May 2017 17:31:27 +0000 (13:31 -0400)]
11644: Test fields in /mounts response.
Tom Clegg [Tue, 16 May 2017 17:23:02 +0000 (13:23 -0400)]
11644: Ensure generated UUIDs are always 27 chars.
Tom Clegg [Tue, 16 May 2017 17:22:42 +0000 (13:22 -0400)]
11644: Add volume replication level to /mounts response.
Tom Clegg [Tue, 16 May 2017 17:11:45 +0000 (13:11 -0400)]
11644: Replace linear search with map for looking up mounts by UUID.
Tom Clegg [Tue, 16 May 2017 17:03:14 +0000 (13:03 -0400)]
11644: Test non-empty MountUUID in trash list.
Tom Clegg [Tue, 16 May 2017 16:43:39 +0000 (12:43 -0400)]
11644: Add pull-to-mount-UUID test. Tidy up pull worker and tests.