Use simpler prior by default

gaow · gaow · commit 36688ea3fa59 · 2024-02-14T08:49:43.000-05:00
diff --git a/code/cis_analysis/cis_workhorse.ipynb b/code/cis_analysis/cis_workhorse.ipynb
@@ -1014,7 +1014,7 @@
     "parameter: pip_cutoff = 0.05\n",
     "parameter: coverage = [0.95, 0.7, 0.5]\n",
     "# prior can be either of [\"mixture_normal\", \"mixture_normal_per_scale\"]\n",
-    "parameter: prior = \"mixture_normal_per_scale\"\n",
+    "parameter: prior = \"mixture_normal\"\n",
     "parameter: max_SNP_EM = 100\n",
     "# Max scale is such that 2^max_scale being the number of phenotypes in the transformed space. Default to 2^10  = 1024. Don't change it unless you know what you are doing. Max_scale should be at least larger than 5.\n",
     "parameter:  max_scale = 10\n",
diff --git a/pipeline/command_spliter.ipynb b/pipeline/command_spliter.ipynb
diff --git a/website/nature_protocol/output_markdown.md b/website/nature_protocol/output_markdown.md
@@ -99,8 +99,6 @@ Quality control and normalization are performed on output from the leafcutter an
 We use a gene coordinate annotation pipeline based on [`pyqtl`, as demonstrated here](https://github.com/broadinstitute/gtex-pipeline/blob/master/qtl/src/eqtl_prepare_expression.py). This adds genomic coordinate annotations to gene-level molecular phenotype files generated in `gct` format and converts them to `bed` format for downstreams analysis.
 
 
-A collection of methods for the imputation of missing omics data values are included in our pipelinle. Imputation is optional of eQTL analysis, but necessary for other QTLs. We use `flashier`, a Empirical Bayes Matrix Factorization model, to impute missing values. Other imputation methods include missForest, XGBoost, k-nearest neighbors, soft impute, mean imputation, and last observed data.
-
 We include a collection of workflows to format molecular phenotype data. These include workflows to separate phenotypes by chromosome, by user-provided regions, a workflow to subset bam files and a workflow to extract samples from phenotype files.
 
 ##### B.  Covariate Data Preprocessing
@@ -306,12 +304,11 @@ Timing <1 min
 ```
 
 
-##### ii. Phenotype Imputation
-Timing X min
 
 ```
-sos run xqtl-pipeline/pipeline/phenotype_imputation.ipynb flash \
-    --container oras://ghcr.io/cumc/omics_imputation_apptainer:latest
+sos run phenotype_imputation.ipynb EBMF \
+    --container .containers/factor_analysis.sif \
+
 ```
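
Note on the `prior` and `max_scale` parameters touched in `cis_workhorse.ipynb`: the comments in the hunk state that `prior` must be one of two strings and that `max_scale` should be larger than 5, with `2^max_scale` giving the number of phenotypes in the transformed space. A minimal sketch of a check for those two constraints follows. The helper `check_params` is hypothetical and is not part of the pipeline; only the parameter names, allowed values, and defaults are taken from the diff.

```python
# Hypothetical helper, not part of the pipeline: it only mirrors the
# constraints stated in the notebook comments shown in the diff.
ALLOWED_PRIORS = ("mixture_normal", "mixture_normal_per_scale")


def check_params(prior: str = "mixture_normal", max_scale: int = 10) -> int:
    """Validate `prior` and `max_scale`; return 2**max_scale."""
    if prior not in ALLOWED_PRIORS:
        raise ValueError(f"prior must be one of {ALLOWED_PRIORS}, got {prior!r}")
    # The notebook comment says max_scale should be larger than 5.
    if max_scale <= 5:
        raise ValueError(f"max_scale must be larger than 5, got {max_scale}")
    # Number of phenotypes in the transformed space.
    return 2 ** max_scale


if __name__ == "__main__":
    # Defaults after this commit: prior="mixture_normal", max_scale=10.
    print(check_params())  # 1024
```

With the new default, callers who still want the per-scale prior would have to pass `prior="mixture_normal_per_scale"` explicitly.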