20461: Updating resource requirements

author Alex Coleman <alex.coleman@curii.com>

Wed, 30 Aug 2023 16:53:31 +0000 (10:53 -0600)

committer Alex Coleman <alex.coleman@curii.com>

Thu, 14 Sep 2023 18:00:08 +0000 (12:00 -0600)
diff --git a/cwl/lightning/README.md b/cwl/lightning/README.md

index 4ca131faaf5a4287e352d9137dca3d531282bf73..ad484202b8ef8446aeefb5a3a2a494dda65be504 100644 (file)
--- a/cwl/lightning/README.md
+++ b/cwl/lightning/README.md
@@ -2,10 +2,10 @@
  [comment]: # ()
  [comment]: # (SPDX-License-Identifier: AGPL-3.0)
  # Running tiling workflow
-===
+Tiling is an efficient representation for genomic data that enables fast queries and machine learning. It abstracts a called genome by partitioning it into overlapping variable length shorter sequences, known as tiles. This tiling workflow tiles an input file, and peforms some statistical analysis on it. 
  
  ## Running the actual workflow
----
+To run on Arvados:
  `arvados-cwl-runner --submit --no-wait --project-uuid <project_uuid> fasta2numpy-wf.cwl <input_yml>`
  
  The main workflow, `fasta2numpy-wf.cwl`, has the following workflow:
@@ -20,12 +20,14 @@ The main workflow, `fasta2numpy-wf.cwl`, has the following workflow:
  For examples of input yml files, see `yml/fasta2numpy-wf-100test.yml` and `yml/fasta2numpy-wf-0831_0315.yml`
  
  ## Input parameters
----
+
+The tiling workflow has many different inputs, some (like **fastadirs**) vary depending on your run, while others remain more constant (like **dbsnp**)
+
  - **fastadirs** - an array of fasta directories, in our implementation, each directory consists of around 100 fasta pairs.
  - **refdir** - cirectory containing reference FASTAs.
  
  The list of tags is needed to perform tiling
-- **tagset** - List of tags. Found here.
+- **tagset** - List of tags. Found here: c37923fd267415556962d5c535e9b075+110/tagset.fa.gz
  
  Some parameters are used to determine how many processes, and how much each process is processing at a time:
  
@@ -46,8 +48,8 @@ Some parameters are used to determine which portions of the genome the tiling wo
  Some int/float parameters are needed for setting up random generation, output of statistical tests, etc:
  
  - **randomseed** - Random seed for random number generation.
-- **pcacomponents** - Top N PCA components to extract from PCA
-- **trainingsetsize**: a float between 0 and 1 to determine the training set size..
+- **pcacomponents** - Top N PCA components to extract from PCA. 
+- **trainingsetsize**: a float between 0 and 1 to determine the training set size.
    
  Phenotypes are used as sample metadata for lightning:
  
@@ -56,6 +58,6 @@ Phenotypes are used as sample metadata for lightning:
  
  Some publicily accessible data is needed to run the workflows:
  
-- **snpeffdatadir** - 
-- **dbsnp** - 
-- **gnomaddir** - gnomAD data. 
-\ No newline at end of file
+- **snpeffdatadir** - Directory of SNP data download. Current data download can be found here: 66c966928931de252274772c76f73025+52054
+- **dbsnp** - SNP database. A single file. Current database can be found here: a088b297d614e4c63cbb23f8ad404438+12313/00-All.vcf.gz_renamed.bcf
+- **gnomaddir** - gnomAD data. Current data can be found here: c6a8fc877e85d73ac5b165e2d7367e26+675135
+\ No newline at end of file
diff --git a/cwl/lightning/lightning-anno2vcf.cwl b/cwl/lightning/lightning-anno2vcf.cwl

index d91aa795b99e2da0bc6fc9fe77f307b304eb2b89..ae8568fdb23f2ee0c0d322f3b64251e36d786e8c 100644 (file)
--- a/cwl/lightning/lightning-anno2vcf.cwl
+++ b/cwl/lightning/lightning-anno2vcf.cwl
@@ -14,7 +14,7 @@ hints:
      dockerPull: lightning
    ResourceRequirement:
      coresMin: 64
-    ramMin: 100000 #500000
+    ramMin: 200000 #500000
    arv:RuntimeConstraints:
      keep_cache: 83000
      outputDirType: keep_output_dir
diff --git a/cwl/lightning/lightning-choose-samples.cwl b/cwl/lightning/lightning-choose-samples.cwl

index f03c585aba96ae6f953137eb5490eaae2e9b053d..92bef11336d1936794ed8318fd4acbb2376a1f3a 100644 (file)
--- a/cwl/lightning/lightning-choose-samples.cwl
+++ b/cwl/lightning/lightning-choose-samples.cwl
@@ -44,7 +44,7 @@ arguments:
    - prefix: "-case-control-file="
      valueFrom: $(inputs.phenotypesdir)
      separate: false
-  - "-case-control-column=AD"
+  - "-case-control-column=DISEASE"
    - prefix: "-training-set-size="
      valueFrom: $(inputs.trainingsetsize)
      separate: false
diff --git a/cwl/lightning/lightning-slice-numpy-onehot.cwl b/cwl/lightning/lightning-slice-numpy-onehot.cwl

index 7bd02101f751df9639bb3e43a9da27f97f8d40c6..2d58232bdb869d0db75ce13b4b52d15d9e9f7447 100644 (file)
--- a/cwl/lightning/lightning-slice-numpy-onehot.cwl
+++ b/cwl/lightning/lightning-slice-numpy-onehot.cwl
@@ -14,7 +14,7 @@ hints:
      dockerPull: lightning
    ResourceRequirement:
      coresMin: 64
-    ramMin: 100000 #660000
+    ramMin: 200000 #660000
    arv:RuntimeConstraints:
      keep_cache: 83000
      outputDirType: keep_output_dir
diff --git a/cwl/lightning/lightning-slice-numpy-pca.cwl b/cwl/lightning/lightning-slice-numpy-pca.cwl

index 689b3b7bc11fa73d6b33d306ab183481c4f9e48b..b4818eb6acff00e421bf7d92908b59f490d9d17f 100644 (file)
--- a/cwl/lightning/lightning-slice-numpy-pca.cwl
+++ b/cwl/lightning/lightning-slice-numpy-pca.cwl
@@ -14,7 +14,7 @@ hints:
      dockerPull: lightning
    ResourceRequirement:
      coresMin: 64
-    ramMin: 100000 #1500000
+    ramMin: 200000 #1500000
    arv:RuntimeConstraints:
      keep_cache: 83000
      outputDirType: keep_output_dir
diff --git a/cwl/lightning/lightning-slice-numpy.cwl b/cwl/lightning/lightning-slice-numpy.cwl

index 8e61d1af1fddde77466f1734d867e4e9b5212af0..6c48ff3804861e79c204240fcf94ac4550234a67 100644 (file)
--- a/cwl/lightning/lightning-slice-numpy.cwl
+++ b/cwl/lightning/lightning-slice-numpy.cwl
@@ -14,7 +14,7 @@ hints:
      dockerPull: lightning
    ResourceRequirement:
      coresMin: 64
-    ramMin: 100000 #660000
+    ramMin: 200000 #660000
    arv:RuntimeConstraints:
      keep_cache: 83000
      outputDirType: keep_output_dir
diff --git a/cwl/lightning/lightning-slice.cwl b/cwl/lightning/lightning-slice.cwl

index 3fed33f9708b554a0e5ad43be68ca288eafddb45..f416e55d4866fd57020da6a169f0037b4ec2930a 100644 (file)
--- a/cwl/lightning/lightning-slice.cwl
+++ b/cwl/lightning/lightning-slice.cwl
@@ -14,7 +14,7 @@ hints:
      dockerPull: lightning
    ResourceRequirement:
      coresMin: 64 #96
-    ramMin: 100000 #660000
+    ramMin: 200000 #660000
    arv:RuntimeConstraints:
      keep_cache: 6200
      outputDirType: keep_output_dir
diff --git a/docker/lightning/Dockerfile b/docker/lightning/Dockerfile

index 061d9d45a7a09f8bd3b7f87c28bd75f30ff6063b..d11a30b43d827e53c31d0203fc379ae42d688c0c 100644 (file)
--- a/docker/lightning/Dockerfile
+++ b/docker/lightning/Dockerfile
@@ -5,9 +5,7 @@
  # build instruction:
  # docker build -t dockername --file=/path/to/lightning/docker/lightning/Dockerfile /path/to/lightning
  
-FROM ubuntu:latest
-MAINTAINER Jiayong Li <jli@curii.com>s
-USER root
+FROM python:3.11-buster
  ARG DEBIAN_FRONTEND=noninteractive
  
  # Install necessary dependencies
@@ -24,7 +22,6 @@ RUN apt-get install -qy --no-install-recommends wget \
    libncursesw5-dev \
    gcc \
    make \
-  python3.8 \
    python3-pip \
    python3-numpy \
    python3-pandas \
@@ -34,6 +31,7 @@ RUN apt-get install -qy --no-install-recommends wget \
  
  RUN pip3 install sklearn
  RUN pip3 install --upgrade scipy
+RUN pip3 install matplotlib
  
  # Installing go 1.19
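
The input parameters documented in the README hunk of this commit could be collected into an input YAML file along the following lines. This is only a sketch: the parameter names and Keep locators are taken from the README text, but the `File`/`Directory` types, the numeric values, and the placeholder locators in angle brackets are assumptions, and inputs not visible in the diff are omitted. The example files named in the README (`yml/fasta2numpy-wf-100test.yml` and `yml/fasta2numpy-wf-0831_0315.yml`) remain the authoritative reference.

```yaml
# Hypothetical input file for fasta2numpy-wf.cwl (illustrative only).
# Parameter names come from the README; types and example values are assumptions.
fastadirs:
  - class: Directory
    location: keep:<fasta_collection_pdh>      # placeholder, not a real locator
refdir:
  class: Directory
  location: keep:<reference_collection_pdh>    # placeholder, not a real locator
tagset:
  class: File
  location: keep:c37923fd267415556962d5c535e9b075+110/tagset.fa.gz
snpeffdatadir:
  class: Directory
  location: keep:66c966928931de252274772c76f73025+52054
dbsnp:
  class: File
  location: keep:a088b297d614e4c63cbb23f8ad404438+12313/00-All.vcf.gz_renamed.bcf
gnomaddir:
  class: Directory
  location: keep:c6a8fc877e85d73ac5b165e2d7367e26+675135
randomseed: 1          # example value
pcacomponents: 10      # example value
trainingsetsize: 0.8   # README: a float between 0 and 1
```

A file like this would be passed as `<input_yml>` in the `arvados-cwl-runner` command shown in the README.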