Tom Clegg [Mon, 18 Jan 2021 18:36:37 +0000 (13:36 -0500)]
Add dumpgob command.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 14 Jan 2021 20:36:10 +0000 (15:36 -0500)]
Fix log message.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 18 Dec 2020 16:00:44 +0000 (11:00 -0500)]
Share a single sitefs for multiple arvados files.
Avoid needlessly re-fetching the manifest when reading multiple files
from one collection.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 18 Dec 2020 15:52:57 +0000 (10:52 -0500)]
Bump import memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Dec 2020 15:36:08 +0000 (10:36 -0500)]
Bump exportnumpy memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 16 Dec 2020 19:49:55 +0000 (14:49 -0500)]
Sort rows by label shown in csv, not full file path.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 15 Dec 2020 14:31:30 +0000 (09:31 -0500)]
Bump merge memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 15 Dec 2020 14:31:13 +0000 (09:31 -0500)]
Load library with pgzip.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 14 Dec 2020 21:00:37 +0000 (16:00 -0500)]
Faster merge output.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 8 Dec 2020 15:35:01 +0000 (10:35 -0500)]
Bigger output buffer for annotate.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Tue, 8 Dec 2020 15:34:07 +0000 (10:34 -0500)]
Save git version in lightning binary collection name and properties.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 4 Dec 2020 15:47:13 +0000 (10:47 -0500)]
Include numpy matrix filename in labels csv.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 4 Dec 2020 15:46:42 +0000 (10:46 -0500)]
More memory + direct Keep access for merge and exportnumpy.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 2 Dec 2020 20:49:20 +0000 (15:49 -0500)]
Update example.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 2 Dec 2020 20:48:01 +0000 (15:48 -0500)]
Don't pass -gvcf-type="" to gvcf_regions.py.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 2 Dec 2020 20:47:48 +0000 (15:47 -0500)]
Fix deadlock on error.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 2 Dec 2020 20:47:44 +0000 (15:47 -0500)]
Improve import and vcf2fasta performance.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 29 Nov 2020 17:53:04 +0000 (12:53 -0500)]
Increase concurrency, reduce allocs.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 29 Nov 2020 17:52:29 +0000 (12:52 -0500)]
Reduce lock contention.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 25 Nov 2020 21:25:14 +0000 (16:25 -0500)]
Use smaller machines for small batches.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 25 Nov 2020 20:50:16 +0000 (15:50 -0500)]
Use buffered writer to avoid overwhelming arv-mount.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 25 Nov 2020 14:24:25 +0000 (09:24 -0500)]
Share CR refresh throttle when running multiple containers.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 25 Nov 2020 06:07:46 +0000 (01:07 -0500)]
Export labels.csv with numpy array.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Tue, 24 Nov 2020 20:10:29 +0000 (15:10 -0500)]
Concurrent-batches mode for vcf2fasta and import.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 23 Nov 2020 02:43:41 +0000 (21:43 -0500)]
More memory for pca.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 23 Nov 2020 02:43:15 +0000 (21:43 -0500)]
Propagate -match-chromosome arg to container.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 22 Nov 2020 17:07:14 +0000 (12:07 -0500)]
Pass gvcf-type to gvcf_regions.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 22 Nov 2020 08:21:19 +0000 (03:21 -0500)]
Fix fasta sequence names.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 22 Nov 2020 05:50:40 +0000 (00:50 -0500)]
Accept filter args in pca-go.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 22 Nov 2020 05:48:57 +0000 (00:48 -0500)]
Let caller set container output name.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 19 Nov 2020 05:38:25 +0000 (00:38 -0500)]
Fix divide by zero.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 19 Nov 2020 01:39:46 +0000 (20:39 -0500)]
Configurable chromosome name pattern.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 19 Nov 2020 01:24:15 +0000 (20:24 -0500)]
Fix wrong container name.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 19 Nov 2020 01:22:43 +0000 (20:22 -0500)]
Indicate low quality tile variants with -1 in numpy array.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 13 Nov 2020 07:43:37 +0000 (02:43 -0500)]
Gzip gob files.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 13 Nov 2020 07:19:29 +0000 (02:19 -0500)]
Update example.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 12 Nov 2020 23:56:48 +0000 (18:56 -0500)]
When not saving incomplete tilevars, still save hashes/indexes.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 5 Nov 2020 18:29:49 +0000 (13:29 -0500)]
Add numpy-common-variants sanity check.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 5 Nov 2020 05:30:33 +0000 (00:30 -0500)]
Renumber/prune variants for numpy export.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 2 Nov 2020 19:02:10 +0000 (14:02 -0500)]
Omit refname field in annotation if only one ref exists.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 2 Nov 2020 16:29:35 +0000 (11:29 -0500)]
Adjust speed knobs.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 2 Nov 2020 14:13:52 +0000 (09:13 -0500)]
Don't drop ref tile data when filtering.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 2 Nov 2020 05:55:55 +0000 (00:55 -0500)]
Propagate filters to container.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 2 Nov 2020 05:55:42 +0000 (00:55 -0500)]
Faster annotate.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 1 Nov 2020 07:38:38 +0000 (02:38 -0500)]
Write annotations along with numpy.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sat, 31 Oct 2020 04:38:40 +0000 (00:38 -0400)]
Use tilelib to load for export and pca.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 29 Oct 2020 14:18:48 +0000 (10:18 -0400)]
Log locations of long tiles.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 29 Oct 2020 13:44:20 +0000 (09:44 -0400)]
Add max-tile-size option.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 29 Oct 2020 13:40:08 +0000 (09:40 -0400)]
Less memory and cpu for annotate.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 29 Oct 2020 13:31:24 +0000 (09:31 -0400)]
Fix example.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 29 Oct 2020 07:34:07 +0000 (03:34 -0400)]
Faster annotate.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 29 Oct 2020 06:16:58 +0000 (02:16 -0400)]
Update example.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 28 Oct 2020 18:11:33 +0000 (14:11 -0400)]
Log path length and number of skipped out-of-order tiles per chromosome.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 28 Oct 2020 07:16:06 +0000 (03:16 -0400)]
Export tile annotations.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 26 Oct 2020 19:37:15 +0000 (15:37 -0400)]
Continue loop a bit earlier.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 26 Oct 2020 00:14:25 +0000 (20:14 -0400)]
Better series palette.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 25 Oct 2020 23:01:52 +0000 (19:01 -0400)]
Fix up example.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 25 Oct 2020 23:00:49 +0000 (19:00 -0400)]
Plot import stats.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 25 Oct 2020 01:18:33 +0000 (21:18 -0400)]
Write import stats to stats.json.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Sun, 25 Oct 2020 00:54:29 +0000 (20:54 -0400)]
Turn off timestamps in logs when not logging to a tty.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 23 Oct 2020 21:11:31 +0000 (17:11 -0400)]
Log input seq/chrom size, input coverage, tile coverage.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 23 Oct 2020 15:24:45 +0000 (11:24 -0400)]
Add CalledBases to stats.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 20:59:24 +0000 (16:59 -0400)]
Fix coverage score.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 18:45:23 +0000 (14:45 -0400)]
Update example.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 15:25:14 +0000 (11:25 -0400)]
More RAM for export.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 15:25:12 +0000 (11:25 -0400)]
Split big func.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 13:43:04 +0000 (09:43 -0400)]
Assemble sequences concurrently.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 06:58:04 +0000 (02:58 -0400)]
Update test.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 06:57:46 +0000 (02:57 -0400)]
Fix divide by zero.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 03:45:06 +0000 (23:45 -0400)]
Update test.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 03:43:08 +0000 (23:43 -0400)]
Fix chr pos after chr1.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 22 Oct 2020 03:41:31 +0000 (23:41 -0400)]
Fix var reused in goroutine.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 21 Oct 2020 21:16:37 +0000 (17:16 -0400)]
Use only first field of fasta comment as sequence label.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 21 Oct 2020 21:15:37 +0000 (17:15 -0400)]
Handle N in ref.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 21 Oct 2020 21:14:46 +0000 (17:14 -0400)]
Merge more than two consecutive del/ins.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 21 Oct 2020 20:22:55 +0000 (16:22 -0400)]
Export bed file.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Tue, 20 Oct 2020 14:13:35 +0000 (10:13 -0400)]
Export VCF-ish.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 19 Oct 2020 15:01:58 +0000 (11:01 -0400)]
Handle variant 0 ("none") in CompactSequences.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 19 Oct 2020 14:31:11 +0000 (10:31 -0400)]
Add PadLeft func to help avoid empty ref/new fields.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 16 Oct 2020 14:10:05 +0000 (10:10 -0400)]
Add merge cmd.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 14 Oct 2020 18:33:01 +0000 (14:33 -0400)]
Export HGVS.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 12 Oct 2020 19:53:14 +0000 (15:53 -0400)]
Refactor merge.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 12 Oct 2020 13:50:00 +0000 (09:50 -0400)]
Delete dead code.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 12 Oct 2020 13:48:51 +0000 (09:48 -0400)]
Improve error message.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 12 Oct 2020 13:48:41 +0000 (09:48 -0400)]
Fix missed error check.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 12 Oct 2020 13:48:33 +0000 (09:48 -0400)]
Fix log message.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Mon, 12 Oct 2020 13:48:18 +0000 (09:48 -0400)]
Add merge command.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 8 Oct 2020 21:05:03 +0000 (17:05 -0400)]
Include tile variant IDs in library.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 8 Oct 2020 15:45:49 +0000 (11:45 -0400)]
Option to output full list of unplaced tags.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 8 Oct 2020 05:57:37 +0000 (01:57 -0400)]
Allow importing all-hom (reference) data from single fasta file.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 7 Oct 2020 21:01:34 +0000 (17:01 -0400)]
Add TagsPlacedNTimes stat.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 7 Oct 2020 17:59:36 +0000 (13:59 -0400)]
Increase library read buffer size.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 7 Oct 2020 13:22:50 +0000 (09:22 -0400)]
When writing library, write tags too.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Wed, 7 Oct 2020 13:22:46 +0000 (09:22 -0400)]
Add stats command.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 25 Sep 2020 20:29:23 +0000 (16:29 -0400)]
Option to treat tiles with no-calls as regular tiles.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 25 Sep 2020 19:56:42 +0000 (15:56 -0400)]
Option to output tile library when importing.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 25 Sep 2020 19:53:43 +0000 (15:53 -0400)]
Less memory for pca.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 18 Sep 2020 13:47:27 +0000 (09:47 -0400)]
Fix -max-coverage=1.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Fri, 18 Sep 2020 13:47:21 +0000 (09:47 -0400)]
More logs.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>
Tom Clegg [Thu, 17 Sep 2020 18:18:55 +0000 (14:18 -0400)]
Log dimensions.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomclegg.ca>