--- /dev/null
+[comment]: # (Copyright (C) The Lightning Authors. All rights reserved.)
+[comment]: # ()
+[comment]: # (SPDX-License-Identifier: AGPL-3.0)
+# Running tiling workflow
+
+## Running the actual workflow
+---
+`arvados-cwl-runner --submit --no-wait --project-uuid <project_uuid> fasta2numpy-wf.cwl <input_yml>`
+
+The main workflow, `fasta2numpy-wf.cwl`, runs the following steps:
+
+1) Tile the input FASTA file
+2) Generate PCA values
+3) Perform logistic regression
+4) Perform chi^2 p-value tests
+5) Plot these values
+6) Output
+
+For examples of input yml files, see `yml/fasta2numpy-wf-100test.yml` and `yml/fasta2numpy-wf-0831_0315.yml`
+
+## Input parameters
+---
+- **fastadirs** - an array of FASTA directories. In our implementation, each directory contains around 100 FASTA pairs.
+- **refdir** - directory containing reference FASTAs.
+
+The list of tags is needed to perform tiling:
+- **tagset** - List of tags. Found here.
+
+Some parameters determine how many processes run and how much each process handles at a time:
+
+- **batchsize** - an integer determining the batch size for the lightning-import step. For example, with batchsize 12, lightning-import runs on 12 fasta directories together as a batch; the resulting libraries are then merged by lightning-slice.
+- **threads** - number of parallel processes to run. This is necessary to avoid running out of memory.
+
+Some parameters are used as values passed to lightning on the command line as flags:
+
+- **mergeoutput** - option to slice numpy. Possible values are `True` and `False`.
+- **expandregions** - command-line value needed to run `lightning`. Default value is `0`.
+
+Some parameters are used to determine which portions of the genome the tiling workflow is run on:
+
+- **chrs** - chromosomes to run on.
+- **regions** - specific regions of the chromosomes to run on.
+- **matchgenome** - a string pattern used to select a subset of the cohort. For example, matchgenome "ADNI|WCAP" runs tiling for all samples with "ADNI" or "WCAP" in their name, while matchgenome "" runs for the entire cohort.
+
+Some int/float parameters are needed for setting up random number generation, output of statistical tests, etc.:
+
+- **randomseed** - Random seed for random number generation.
+- **pcacomponents** - number of top PCA components to extract.
+- **trainingsetsize** - a float between 0 and 1 that determines the training set size.
+
+Phenotypes are used as sample metadata for lightning:
+
+- **phenotypesnofamilydir** - phenotype information for samples with *no* family members.
+- **phenotypesdir** - phenotype information for samples *with* family members.
+
+Some publicly accessible data is needed to run the workflows:
+
+- **snpeffdatadir** -
+- **dbsnp** -
+- **gnomaddir** - gnomAD data.
\ No newline at end of file
+++ /dev/null
-Running tiling workflow
-===
-
-Command
----
-
-arvados-cwl-runner --submit --no-wait --project-uuid <project_uuid> fasta2numpy-wf.cwl <input_yml>
-
-For examples of input yml files, see yml/fasta2numpy-wf-100test.yml and yml/fasta2numpy-wf-0831_0315.yml
-
-Notable parameters for input yml
----
-
-fastadirs: an array of fasta directories, in our implementation, each directory consists of around 100 fasta pairs
-
-batchsize: an integer determining the batch size when running lighting-import step, e.g., for batchsize 12, we run lightning-import for 12 fasta directories together as a batch, the resulting libraries then get merged by lightning-slice
-
-matchgenome: a string pattern used for obtaining a subset of the cohort, e.g, matchgenome "ADNI|WCAP" runs tiling for all samples with "ADNI" or "WCAP" in their name, matchgenome "" runs for the entire cohort
-
-trainingsetsize: a float between 0 and 1 to determine the training set size
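
To make the `batchsize` and `matchgenome` descriptions in this README more concrete, here is a minimal Python sketch of the behavior they describe. It is not taken from the workflow or from lightning itself. The helper names `select_samples` and `make_batches`, the sample names, and the directory names are all hypothetical, and it assumes `matchgenome` behaves like a regular expression searched anywhere in the sample name.

```python
import re


def select_samples(names, matchgenome):
    """Keep the names containing a match for the pattern.

    Assumption: the pattern is a regular expression; an empty
    pattern matches every name, i.e. the entire cohort.
    """
    pattern = re.compile(matchgenome)
    return [name for name in names if pattern.search(name)]


def make_batches(fastadirs, batchsize):
    """Group fasta directories into consecutive batches of at most batchsize."""
    return [fastadirs[i:i + batchsize] for i in range(0, len(fastadirs), batchsize)]


# Hypothetical sample and directory names, for illustration only.
names = ["ADNI_0001", "WCAP_0002", "HG00096"]
selected = select_samples(names, "ADNI|WCAP")
everything = select_samples(names, "")
batches = make_batches([f"fasta_dir_{i}" for i in range(30)], 12)
```

Under these assumptions, 30 directories with batchsize 12 give two full batches and one partial batch, each of which would be imported together before the resulting libraries are merged.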