Tom Clegg [Wed, 24 Nov 2021 20:19:27 +0000 (15:19 -0500)]
Use non-preemptible instances.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 24 Nov 2021 20:19:19 +0000 (15:19 -0500)]
Option to merge matrices and annotations.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 22 Nov 2021 15:20:52 +0000 (10:20 -0500)]
Implement -regions and -expand-regions for slice-numpy.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 19 Nov 2021 18:23:16 +0000 (13:23 -0500)]
Adjust slice memory.
refs #18414
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 4 Nov 2021 18:11:42 +0000 (14:11 -0400)]
Write deletions as "TAA T" instead of "AA .".
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 4 Nov 2021 18:10:14 +0000 (14:10 -0400)]
Fix variant numbering.
Renumbering code was incorrectly reserving ranking spots for no-calls.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 1 Nov 2021 14:41:01 +0000 (10:41 -0400)]
Update slice-numpy test.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 1 Nov 2021 14:27:04 +0000 (10:27 -0400)]
Fix duplicate entries in slice-numpy annotations.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 1 Nov 2021 14:00:38 +0000 (10:00 -0400)]
Avoid empty "ref" field in anno2vcf output.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 1 Nov 2021 13:31:04 +0000 (09:31 -0400)]
Split anno2vcf output by chromosome.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 29 Oct 2021 18:39:42 +0000 (14:39 -0400)]
Fix sort func.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 28 Oct 2021 15:28:32 +0000 (11:28 -0400)]
Bump slice-numpy memory.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 28 Oct 2021 15:28:04 +0000 (11:28 -0400)]
Sort input files.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 28 Oct 2021 14:33:43 +0000 (10:33 -0400)]
Add anno2vcf command.
refs #17763
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 8 Oct 2021 02:12:06 +0000 (22:12 -0400)]
Accept PDH on command line.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 6 Oct 2021 20:07:40 +0000 (16:07 -0400)]
Use arvados client for /any/path/$id/, not just /mnt/$id/.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 29 Sep 2021 18:47:34 +0000 (14:47 -0400)]
Implement -match-genome filter.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 29 Sep 2021 18:46:10 +0000 (14:46 -0400)]
Fix namespace types.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sun, 26 Sep 2021 02:47:00 +0000 (22:47 -0400)]
Save all ref tile data in slice 0.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sun, 26 Sep 2021 01:56:08 +0000 (21:56 -0400)]
Clean up var name.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sat, 25 Sep 2021 13:44:22 +0000 (09:44 -0400)]
Scale keep cache to 2*(openfiles+1).
refs #17966
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 24 Sep 2021 18:09:16 +0000 (14:09 -0400)]
Improve logging.
refs #17966
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 24 Sep 2021 18:08:59 +0000 (14:08 -0400)]
Tweak cpu/mem usage.
refs #17966
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 23 Sep 2021 19:52:42 +0000 (15:52 -0400)]
Accept multiple input libraries for slice→slicenumpy.
refs #17966
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 21 Sep 2021 13:59:03 +0000 (09:59 -0400)]
Log total variant/genome/ref counts.
refs #17966
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 20 Sep 2021 13:20:30 +0000 (09:20 -0400)]
Fix array index out of bounds.
refs #17966
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 17 Sep 2021 01:50:45 +0000 (21:50 -0400)]
Renumber variants by allele count.
refs #17966
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 17 Sep 2021 00:33:38 +0000 (20:33 -0400)]
Fix reference assembly.
refs #17966
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 14 Sep 2021 20:15:35 +0000 (16:15 -0400)]
Fix mem usage, improve logging.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 13 Sep 2021 15:23:23 +0000 (11:23 -0400)]
Generate annotations for slices.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 13 Sep 2021 14:01:49 +0000 (10:01 -0400)]
Generate numpy matrices from slices.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sun, 12 Sep 2021 19:24:45 +0000 (15:24 -0400)]
Always log version at startup.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sun, 12 Sep 2021 19:24:20 +0000 (15:24 -0400)]
Slice imported data by tag#.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 9 Sep 2021 15:28:08 +0000 (11:28 -0400)]
Use callbacks in struct instead of args to Load*().
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 1 Sep 2021 17:44:57 +0000 (13:44 -0400)]
Fix missed TranslatePaths and unreported error.
refs #17562
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 31 Aug 2021 18:56:04 +0000 (14:56 -0400)]
Add -p-value and -cases options for exporting hgvs numpy.
refs #17562
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 31 Aug 2021 18:55:56 +0000 (14:55 -0400)]
Add chi square func.
refs #17562
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 12 Aug 2021 13:56:50 +0000 (09:56 -0400)]
Fix unchecked error.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 12 Aug 2021 03:16:36 +0000 (23:16 -0400)]
Fix array index out of bounds.
refs #17562
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 11 Aug 2021 22:10:26 +0000 (18:10 -0400)]
Add -match-genome=regexp filter.
refs #17939
refs #17922
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 11 Aug 2021 21:31:23 +0000 (17:31 -0400)]
Export hgvs one-hot numpy: -1 for missing / low quality tiles.
refs #17562
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 10 Aug 2021 19:26:37 +0000 (15:26 -0400)]
Add filter options to export cmd.
refs #17562
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 10 Aug 2021 13:18:10 +0000 (09:18 -0400)]
Add .licenseignore.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 10 Aug 2021 13:16:58 +0000 (09:16 -0400)]
Memory tweaks.
refs #17562
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 6 Aug 2021 16:49:21 +0000 (12:49 -0400)]
Add copyright headers.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 6 Aug 2021 16:11:32 +0000 (12:11 -0400)]
Reduce memory use, limit goroutines when exporting numpy.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 3 Aug 2021 17:18:14 +0000 (13:18 -0400)]
Reduce lock contention.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 3 Aug 2021 15:27:08 +0000 (11:27 -0400)]
Write annotations through to conserve memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 2 Aug 2021 19:09:51 +0000 (15:09 -0400)]
Export hgvs one-hot numpy.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 30 Jul 2021 13:20:00 +0000 (09:20 -0400)]
Use tsv filename when using tab separator.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 29 Jul 2021 20:49:24 +0000 (16:49 -0400)]
Track which tile variants each hgvs.Variant appeared in.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 15 Jul 2021 15:07:08 +0000 (11:07 -0400)]
Skip diffs on long ref seqs.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 13 Jul 2021 19:37:38 +0000 (15:37 -0400)]
Fix some entries skipped by WriteDir.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 13 Jul 2021 15:25:00 +0000 (11:25 -0400)]
Compress export output.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 13 Jul 2021 14:09:16 +0000 (10:09 -0400)]
Fix up VCF format.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 12 Jul 2021 16:07:31 +0000 (12:07 -0400)]
Fix recomputing diffs N times.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 12 Jul 2021 14:20:11 +0000 (10:20 -0400)]
Separate pvcf/vcf output.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 9 Jul 2021 17:04:42 +0000 (13:04 -0400)]
Progress indicator for exportSeq.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 9 Jul 2021 14:32:46 +0000 (10:32 -0400)]
Improve concurrency in export-diff.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 9 Jul 2021 13:13:03 +0000 (09:13 -0400)]
Include tagset in reference-genome files.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 8 Jul 2021 23:06:07 +0000 (19:06 -0400)]
Fix missing output.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 8 Jul 2021 20:48:14 +0000 (16:48 -0400)]
Write ref seqs to their own files.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 8 Jul 2021 20:48:04 +0000 (16:48 -0400)]
Use more cpus.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 8 Jul 2021 20:47:35 +0000 (16:47 -0400)]
Fix use of nil writer.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 8 Jul 2021 15:25:43 +0000 (11:25 -0400)]
Enable network access to get port forwarding.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 8 Jul 2021 14:24:44 +0000 (10:24 -0400)]
Option to export one vcf/csv file per chromosome.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 18 Jun 2021 13:26:31 +0000 (09:26 -0400)]
Fix race.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 21:17:26 +0000 (17:17 -0400)]
Improve concurrency more.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 20:14:48 +0000 (16:14 -0400)]
Improve loading concurrency.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 19:23:37 +0000 (15:23 -0400)]
Split sequence data into 64K independently locked partitions.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 18:37:57 +0000 (14:37 -0400)]
Fix exportnumpy test.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 15:25:56 +0000 (11:25 -0400)]
Add concurrent load to exportnumpy.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 15:25:44 +0000 (11:25 -0400)]
Skip expensive Tracef.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 15:15:32 +0000 (11:15 -0400)]
Up load throttle to gomaxprocs.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 15:13:13 +0000 (11:13 -0400)]
Move alloc/copy out of lock.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 15:03:47 +0000 (11:03 -0400)]
Limit load goroutines to gomaxprocs/2.
Otherwise we bottleneck early on lots of IO that we can't process that
quickly.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 14:22:15 +0000 (10:22 -0400)]
Distribute genomes across output files.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Jun 2021 14:02:25 +0000 (10:02 -0400)]
Fix lock held during getRef.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 15 Jun 2021 15:02:33 +0000 (11:02 -0400)]
Fix up multiple-file reading.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 15 Jun 2021 13:03:23 +0000 (09:03 -0400)]
Fix "gob: encoder: message too big"
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 15 Jun 2021 04:47:35 +0000 (00:47 -0400)]
Add "flake" command (read, tidy, write).
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 14 Jun 2021 15:12:58 +0000 (11:12 -0400)]
Add concurrency in Tidy remap phase.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 14 Jun 2021 15:10:54 +0000 (11:10 -0400)]
Adjust memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 14 Jun 2021 03:09:54 +0000 (23:09 -0400)]
Write hgvs-based numpy matrix.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 3 Jun 2021 20:27:44 +0000 (16:27 -0400)]
Fix another category of misspelled hgvs diff.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 25 May 2021 13:40:21 +0000 (09:40 -0400)]
Bring memory back down.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 24 May 2021 14:54:42 +0000 (10:54 -0400)]
Limit tile size in export.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 21 May 2021 14:30:28 +0000 (10:30 -0400)]
Save runtime profile data periodically.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 20 May 2021 06:01:14 +0000 (02:01 -0400)]
Don't try to buffer/sort export.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 19 May 2021 00:45:35 +0000 (20:45 -0400)]
More memory, but release buffers.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 18 May 2021 20:51:07 +0000 (16:51 -0400)]
Manage export memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 18 May 2021 20:50:46 +0000 (16:50 -0400)]
Add test.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 11 May 2021 05:41:09 +0000 (01:41 -0400)]
Avoid buffering entire output in memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 7 May 2021 13:47:18 +0000 (09:47 -0400)]
Export hgvs-onehot.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 26 Apr 2021 15:32:13 +0000 (11:32 -0400)]
Bump exportnumpy memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 23 Apr 2021 17:05:32 +0000 (13:05 -0400)]
Bump merge memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 23 Apr 2021 17:05:24 +0000 (13:05 -0400)]
Error out early.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 12 Apr 2021 20:33:21 +0000 (16:33 -0400)]
Bump exportnumpy memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 12 Apr 2021 13:19:45 +0000 (09:19 -0400)]
Bump merge memory.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 8 Apr 2021 20:04:05 +0000 (16:04 -0400)]
Fix HGVS diff: GGAA>AAAA is GG>AA, not delGG,=AA,insAA.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
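
Note on the HGVS diff fix above: the subject line says GGAA>AAAA should be reported as GG>AA, which suggests the shared trailing "AA" is dropped before the change is described. The following is a minimal sketch of that idea only, not the project's code. The function name `trim_common` and its return shape are hypothetical, and stripping the suffix before the prefix is an assumption made for illustration.

```python
def trim_common(ref, alt):
    """Strip bases shared by ref and alt, returning (offset, ref, alt).

    offset is the number of leading bases removed, i.e. how far the
    reported change starts from the original position. This is an
    illustrative helper, not an API from the repository.
    """
    # Drop the shared suffix first: GGAA / AAAA -> GG / AA.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]

    # Then drop any shared prefix, tracking how many bases were removed.
    offset = 0
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        offset += 1

    return offset, ref, alt


if __name__ == "__main__":
    offset, ref, alt = trim_common("GGAA", "AAAA")
    print(f"offset={offset} {ref}>{alt}")
```

With the inputs from the commit subject, the trailing "AA" is removed from both sides and nothing is removed from the front, so the remaining change is GG to AA at offset 0. An empty `ref` or `alt` after trimming would correspond to a pure insertion or deletion.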