Tom Clegg [Thu, 10 Nov 2022 19:40:45 +0000 (14:40 -0500)]
19527: Fix crash on tag skipped for min-coverage.
refs #19527
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 10 Nov 2022 16:16:59 +0000 (11:16 -0500)]
19527: Option to exclude non-case/control samples.
refs #19527
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 10 Nov 2022 15:24:53 +0000 (10:24 -0500)]
Merge branch '19527-training-set'
refs #19527
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 9 Nov 2022 23:29:58 +0000 (18:29 -0500)]
19527: Accommodate header row in samples csv.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 9 Nov 2022 23:08:57 +0000 (18:08 -0500)]
19527: Ignore empty line at EOF.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 9 Nov 2022 20:12:33 +0000 (15:12 -0500)]
Merge branch '19524-pca'
refs #19524
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 9 Nov 2022 20:11:48 +0000 (15:11 -0500)]
19527: Fix p-value calculation.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 9 Nov 2022 19:39:31 +0000 (14:39 -0500)]
19527: Load training-set flag from samples.csv.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 7 Nov 2022 14:29:47 +0000 (09:29 -0500)]
19527: choose-samples: training/validation set.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 9 Nov 2022 19:24:49 +0000 (14:24 -0500)]
19527: Load training-set flag from samples.csv.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 7 Nov 2022 14:29:47 +0000 (09:29 -0500)]
choose-samples: training/validation set.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 2 Nov 2022 14:49:09 +0000 (10:49 -0400)]
19524: Fit PCA to specified training set.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 31 Oct 2022 15:53:26 +0000 (11:53 -0400)]
Merge branch '19524-pca'
refs #19524
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 21 Oct 2022 13:23:12 +0000 (09:23 -0400)]
19524: Fix matrix alloc size.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 20 Oct 2022 17:06:35 +0000 (13:06 -0400)]
19524: Flags choose which PCA components to plot.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 20 Oct 2022 15:23:08 +0000 (11:23 -0400)]
19524: Update colors, plot unknown-phenotype behind known.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 20 Oct 2022 14:07:11 +0000 (10:07 -0400)]
19524: Limit size of PCA input matrix.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 19 Oct 2022 20:17:36 +0000 (16:17 -0400)]
19524: Limit size of PCA input matrix.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 19 Oct 2022 19:55:56 +0000 (15:55 -0400)]
19524: configurable vcpus/ram
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 14 Oct 2022 17:34:23 +0000 (13:34 -0400)]
Merge branch '19524-pca'
refs #19524
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 13 Oct 2022 18:46:46 +0000 (14:46 -0400)]
19524: Use marker shape to indicate second category variable.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 13 Oct 2022 15:44:05 +0000 (11:44 -0400)]
19524: Remove obsolete pca cmds.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 13 Oct 2022 14:47:51 +0000 (10:47 -0400)]
19524: Fix deprecated scipy.load.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 13 Oct 2022 14:47:02 +0000 (10:47 -0400)]
19524: Read multiple phenotype files.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 13 Oct 2022 14:43:36 +0000 (10:43 -0400)]
19524: Generalize plot colors a little.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 13 Oct 2022 13:57:37 +0000 (09:57 -0400)]
Fail if inadvertently using randomness.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 12 Oct 2022 18:36:33 +0000 (14:36 -0400)]
19524: Fix colormap.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 12 Oct 2022 05:11:26 +0000 (01:11 -0400)]
19524: propagate pca-components arg.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 11 Oct 2022 18:40:03 +0000 (14:40 -0400)]
19524: plot: get sample list from csv instead of fasta filenames.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 11 Oct 2022 14:07:14 +0000 (10:07 -0400)]
19524: Output PCA.
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 7 Oct 2022 19:18:39 +0000 (15:18 -0400)]
Update deps, improve error reporting
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 7 Oct 2022 18:11:31 +0000 (14:11 -0400)]
Use min-coverage filter
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 5 Aug 2022 19:45:52 +0000 (15:45 -0400)]
Fix diff case
refs #19236 #note-20
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 3 Aug 2022 20:14:27 +0000 (16:14 -0400)]
Fix diff case
refs #19236 #note-15.7
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 27 Jul 2022 20:56:09 +0000 (16:56 -0400)]
Fix diff case
refs #19236 #note-15.6
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 27 Jul 2022 20:02:40 +0000 (16:02 -0400)]
Fix diff case
refs #19236 #note-15.4, #note-15.5
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 27 Jul 2022 18:48:05 +0000 (14:48 -0400)]
Fix diff case
refs #19236 #note-15.2, #note-15.3
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 26 Jul 2022 17:16:15 +0000 (13:16 -0400)]
Fix diff case
refs #19236 #note-15.1
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 20 Jul 2022 16:21:32 +0000 (12:21 -0400)]
Fix crash when ref tile is dropped due to duplicate tag.
refs #19236
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 15 Jul 2022 19:21:01 +0000 (15:21 -0400)]
Add test for variant at right end of spanning tile.
refs #19271
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 15 Jul 2022 17:51:56 +0000 (13:51 -0400)]
Generate annotations for spanning tiles.
refs #19271
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 14 Jul 2022 14:43:48 +0000 (10:43 -0400)]
Update tests.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 7 Jul 2022 18:28:36 +0000 (14:28 -0400)]
Fix wrong index in chunk>0 case.
refs #19168
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 1 Jul 2022 20:28:23 +0000 (16:28 -0400)]
Fix low-coverage tiles counting toward min coverage threshold.
refs #19168
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 3 Jun 2022 04:05:42 +0000 (00:05 -0400)]
Fix loss of precision in p-value calculation.
refs #19014
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 4 May 2022 05:16:51 +0000 (01:16 -0400)]
19073: Fix dup tag detection.
refs #19073
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 4 May 2022 03:22:31 +0000 (23:22 -0400)]
19073: Fix dup tag detection.
refs #19073
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 3 May 2022 17:57:01 +0000 (13:57 -0400)]
19073: Remove dup tags (>1 ref placement) from tilestats bed file.
refs #19073
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 29 Apr 2022 18:57:44 +0000 (14:57 -0400)]
Add -log10(pvalue) row to onehot-columns.npy output from slicenumpy.
closes #19014
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 16 Mar 2022 22:08:31 +0000 (18:08 -0400)]
Add tilestats cmd.
refs #18582
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 16 Mar 2022 17:16:22 +0000 (13:16 -0400)]
Write chunk-tag-offset.csv with chunked tilevariant# matrix.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 4 Mar 2022 19:48:35 +0000 (14:48 -0500)]
Change zygosity column info from het=1 to hom=1.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 24 Feb 2022 15:50:13 +0000 (10:50 -0500)]
Fix left-most diff cases.
refs #18721
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 22 Feb 2022 19:04:00 +0000 (14:04 -0500)]
Encode spanning tile as 0 in tile variant matrix.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sun, 20 Feb 2022 03:08:38 +0000 (22:08 -0500)]
More debugging info for -debug-tag.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sun, 20 Feb 2022 03:07:14 +0000 (22:07 -0500)]
Skip input files that aren't needed because -max-tag.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 18 Feb 2022 20:11:19 +0000 (15:11 -0500)]
Fix left-most diff cases.
refs #18721
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 18 Feb 2022 20:11:17 +0000 (15:11 -0500)]
Add -debug-tag flag.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 18 Feb 2022 14:57:01 +0000 (09:57 -0500)]
Fix -include-variant-1
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Feb 2022 15:07:02 +0000 (10:07 -0500)]
Update tests (don't include both het+hom if only one passes filter).
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Feb 2022 15:06:55 +0000 (10:06 -0500)]
Fix -max-tag filter.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Feb 2022 14:23:20 +0000 (09:23 -0500)]
Fix sparse one-hot coordinates for chunk n>0.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 17 Feb 2022 14:22:47 +0000 (09:22 -0500)]
Fix -max-tag filter.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 15 Feb 2022 17:54:27 +0000 (12:54 -0500)]
Option to include variant 1 in one-hot matrix.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 11 Feb 2022 20:01:57 +0000 (15:01 -0500)]
Include tiles in one-hot matrix even if there is no ref tile.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 9 Feb 2022 21:17:10 +0000 (16:17 -0500)]
Support -max-tag flag for debugging.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 4 Feb 2022 21:17:23 +0000 (16:17 -0500)]
Don't include het just because corresponding hom passed.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 4 Feb 2022 06:11:58 +0000 (01:11 -0500)]
Update logged stats.
refs #18664
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 4 Feb 2022 05:41:40 +0000 (00:41 -0500)]
Don't use tags that appear more than once per sequence.
refs #18664
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 3 Feb 2022 19:25:47 +0000 (14:25 -0500)]
Fix log message.
refs #18664
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 3 Feb 2022 02:37:19 +0000 (21:37 -0500)]
Skip tags that appear twice in the same chromosome.
refs #18664
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 31 Jan 2022 18:56:56 +0000 (13:56 -0500)]
Add dump command.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 27 Jan 2022 14:16:35 +0000 (09:16 -0500)]
Update memory-size log message.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 27 Jan 2022 05:31:02 +0000 (00:31 -0500)]
Output -single-onehot as coordinates of non-zero values.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 21 Jan 2022 19:05:34 +0000 (14:05 -0500)]
Fix Χ² calculation.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 17 Jan 2022 18:03:44 +0000 (13:03 -0500)]
Use native client to read annotations.csv.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 17 Jan 2022 16:04:43 +0000 (11:04 -0500)]
Add -single-onehot output option.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 14 Jan 2022 20:24:11 +0000 (15:24 -0500)]
Load case/control/neither from csv column, fix Χ² filter.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 13 Jan 2022 19:47:46 +0000 (14:47 -0500)]
Fix deadlock at container finish.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 13 Jan 2022 19:47:40 +0000 (14:47 -0500)]
Add -chunked-onehot option.
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 7 Jan 2022 16:20:06 +0000 (11:20 -0500)]
Handle fasta with no line breaks.
fixes #18619
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Fri, 7 Jan 2022 14:20:29 +0000 (09:20 -0500)]
Filter HGVS columns by coverage & p-value threshold.
refs #18438
refs #18495
refs #18581
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 30 Dec 2021 16:24:44 +0000 (11:24 -0500)]
Fix coordinates in hgvs annotations.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 30 Dec 2021 15:20:30 +0000 (10:20 -0500)]
Container-watching fixes.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 30 Dec 2021 00:19:28 +0000 (19:19 -0500)]
Fix slice padding.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 30 Dec 2021 00:19:26 +0000 (19:19 -0500)]
Fix blocking on gob encode.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 29 Dec 2021 18:40:15 +0000 (13:40 -0500)]
More aggressive GC.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Wed, 29 Dec 2021 14:33:09 +0000 (09:33 -0500)]
Fix index out of bounds error.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 28 Dec 2021 21:45:39 +0000 (16:45 -0500)]
Refactor chunked-hgvs to use less memory.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 27 Dec 2021 18:01:24 +0000 (13:01 -0500)]
Separate options to output single/chunked hgvs matrices.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 21 Dec 2021 15:05:27 +0000 (10:05 -0500)]
Bump slice-numpy memory.
refs #17996
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Mon, 13 Dec 2021 15:29:56 +0000 (10:29 -0500)]
Include HGVS IDs in anno2vcf.
refs #18579
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 7 Dec 2021 16:01:19 +0000 (11:01 -0500)]
Use (throttle)Go() convenience.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sat, 4 Dec 2021 06:40:19 +0000 (01:40 -0500)]
Call 2-base deletion-insertion as two adjacent SNPs.
refs #18496
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Thu, 2 Dec 2021 18:39:36 +0000 (13:39 -0500)]
Add null csv rows for ref/undiffed. Ensure 0 means ref in matrix.
refs #18496
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 30 Nov 2021 21:17:09 +0000 (16:17 -0500)]
Fix handling of TAG->CA (spell as T>C, =A, delG).
fixes #18496
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 30 Nov 2021 21:01:44 +0000 (16:01 -0500)]
Not a bug when ref is the only variant loaded.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 30 Nov 2021 21:01:19 +0000 (16:01 -0500)]
Fix deadlock in skip-on-error case.
No issue #
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Tue, 30 Nov 2021 19:59:45 +0000 (14:59 -0500)]
Call SNPs separately when called within 1bp of start/end of indels.
fixes #18496
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
Tom Clegg [Sat, 27 Nov 2021 03:15:56 +0000 (22:15 -0500)]
Mention undiffable variants in annotations, write -2 in hgvs matrix.
refs #18438
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>