Skip to content

FINEMAP credible sets not included in the results #7

@jdblischak

Description

@jdblischak

While exploring the results in the echolocatoR Shiny app, I noticed that FINEMAP never returns more than 1 credible set per locus. This was confusing since SuSiE often returns more than 1 credible set.

I started diving into the code, and I think the issue is that it only looks for data.cred, when FINEMAP returns data.cred# where # is the number of credible sets. PolyFun obtains the credible sets by searching backwards from the maximum number of causal SNPs until it finds a .cred file (source):

        #add causal set info
        df_finemap['CREDIBLE_SET'] = 0
        cred_file = None
        for m in range(num_causal_snps, 0, -1):
            if os.path.exists(cred_filename+str(m)):
                cred_file = cred_filename+str(m)
                break
        if cred_file is None:
            raise IOError('cred file not found')
        df_cred = pd.read_table(cred_file, sep=' ', usecols=(lambda c: c.startswith('cred')), comment='#')
        df_finemap.set_index('SNP', inplace=True, drop=False)
        for c_i, c in enumerate(df_cred.columns):
            df_finemap.loc[df_cred[c].dropna(), 'CREDIBLE_SET'] = c_i+1
        df_finemap.reset_index(inplace=True, drop=True)

echolocatoR only checks for data.cred:

https://github.com/RajLabMSSM/echolocatoR/blob/b055ac0fb74c914d7600e3afc650a2ffd7149396/R/FINEMAP.R#L336

And then when it is extracting the PIPs from the .snp file, it assigns any SNP that meets the PIP threshold to CS 1.

https://github.com/RajLabMSSM/echolocatoR/blob/b055ac0fb74c914d7600e3afc650a2ffd7149396/R/FINEMAP.R#L363-L364

I attempted to put together a minimal, reproducible example to demonstrate this behavior. However, using the latest version on master, FINEMAP is currently returning NA for both the .CS and the .PP columns.

See here for minimal, reproducible example
library(echolocatoR)
stopifnot(packageVersion("echolocatoR") == "0.1.2")

data("Nalls_top_SNPs")
top_SNPs <- import_topSNPs(
  topSS = Nalls_top_SNPs,
  position_col = "BP",
  pval_col = "P, all studies",
  effect_col = "Beta, all studies",
  gene_col = "Nearest Gene",
  locus_col = "Nearest Gene",
  remove_variants = "rs34637584"
)
fullSS_path <- example_fullSS()

Nalls23andMe_2019.results <- finemap_loci(
  top_SNPs = top_SNPs,
  results_dir = file.path(getwd(), "results"),
  loci = "BST1",
  dataset_name = "Nalls23andMe_2019",
  remove_tmps = FALSE,
  fullSS_path = fullSS_path,
  query_by = "tabix",
  snp_col = "RSID",
  pval_col = "p",
  effect_col = "beta",
  stderr_col = "se",
  freq_col = "freq",
  MAF_col = "calculate",
  bp_distance = 10000,
  min_MAF = 0.001,
  finemap_methods = c("FINEMAP", "SUSIE"),
  LD_reference = "UKB",
  download_method = "axel",
  plot.types = c()
)

Nalls23andMe_2019.results[SUSIE.CS > 0, list(SNP, FINEMAP.CS, FINEMAP.PP, SUSIE.CS, SUSIE.PP)]
readLines("results/GWAS/Nalls23andMe_2019/BST1/FINEMAP/data.snp", n = 3)
readLines("results/GWAS/Nalls23andMe_2019/BST1/FINEMAP/data.cred5")

sessionInfo()
See here for the results
> source("reprex.R", echo = TRUE)

> library(echolocatoR)
Registered S3 method overwritten by 'GGally':
  method from
  +.gg   ggplot2
Possible Ensembl SSL connectivity problems detected.
Please see the 'Connection Troubleshooting' section of the biomaRt vignette
vignette('accessing_ensembl', package = 'biomaRt')Error in curl::curl_fetch_memory(url, handle = handle) :
  SSL certificate problem: self signed certificate in certificate chain

Bioconductor version 3.12 (BiocManager 1.30.12), ?BiocManager::install for help

> stopifnot(packageVersion("echolocatoR") == "0.1.2")

> data("Nalls_top_SNPs")

> top_SNPs <- import_topSNPs(
+   topSS = Nalls_top_SNPs,
+   position_col = "BP",
+   pval_col = "P, all studies",
+   effect_col = "Beta, all studie ..." ... [TRUNCATED]
[1] "+ Assigning gene_col and locus_col independently"

> fullSS_path <- example_fullSS()

> Nalls23andMe_2019.results <- finemap_loci(
+   top_SNPs = top_SNPs,
+   results_dir = file.path(getwd(), "results"),
+   loci = "BST1",
+   dataset_ .... [TRUNCATED]
[1] "+ CONDA:: Activating conda env 'echoR'"
[1] "Checking for tabix installation..."
[1] "Checking for bcftools installation..."

)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
BST1 (1 / 1)
)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
[1] "+ Extracting relevant variants from fullSS..."
[1] "+ Query Method: tabix"
[1] "+ QUERY: Chromosome = 4 ; Min position = 15727348 ; Max position = 15747348"
[1] "TABIX:: Converting full summary stats file to tabix format for fast querying..."
[1] "+ CONDA:: Identified bgzip executable in echoR env."
[1] "( grep 'CHR' ./Nalls23andMe_2019.fullSS_subset.tsv; grep -v ^'CHR' ./Nalls23andMe_2019.fullSS_subset.tsv | sort -k1,1 -k2,2n ) | ~/mambaforge/envs/echoR/bin/bgzip -f > ~/echolocatoR/results/GWAS/Nalls23andMe_2019/Nalls23andMe_2019.fullSS_subset.tsv.gz"
[1] "+ CONDA:: Identified tabix executable in echoR env."
[1] "TABIX:: Indexing"
[1] "~/mambaforge/envs/echoR/bin/tabix -f -S 1 -s 1 -b 2 -e 2 ~/echolocatoR/results/GWAS/Nalls23andMe_2019/Nalls23andMe_2019.fullSS_subset.tsv.gz"
[1] "Determining chrom type from file header"
[1] "Chromosome format = 1"
[1] "+ CONDA:: Identified tabix executable in echoR env."
[1] "TABIX:: Extracting subset of sum stats"
[1] "+ TABIX:: ~/mambaforge/envs/echoR/bin/tabix -h ~/echolocatoR/results/GWAS/Nalls23andMe_2019/Nalls23andMe_2019.fullSS_subset.tsv.gz 4:15727348-15747348"
[1] "+ TABIX:: Returning 115 x 11 data.table"
[1] "++ Saving query ==> ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/BST1_Nalls23andMe_2019_subset.tsv.gz"
[1] "LD:: Standardizing summary statistics subset."
[1] "++ Preparing Gene col"
[1] "++ Preparing A1,A1 cols"
[1] "++ Preparing MAF,Freq cols"
[1] "++ Inferring MAF from frequency column..."
[1] "++ Preparing N_cases,N_controls cols"
[1] "++ Preparing `proportion_cases` col"
[1] "++ Calculating `proportion_cases`."
[1] "++ Preparing N col"
[1] "+ Preparing sample_size (N) column"
[1] "++ Computing effective sample size."
[1] "++ Preparing t-stat col"
[1] "+ Calculating t-statistic from Effect and StdErr..."
[1] "++ Assigning lead SNP"
[1] "++ Ensuring Effect, StdErr, P are numeric"
[1] "++ Ensuring 1 SNP per row"
[1] "++ Removing extra whitespace"
[1] "++ Saving subset ==> ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/BST1_Nalls23andMe_2019_subset.tsv.gz"
[1] "+ Extraction completed in 6.72 seconds"
[1] "+ 115 SNPs x  16 columns"
[1] "LD:: Using UK Biobank LD reference panel."
[1] "+ UKB LD file name: chr4_15000001_18000001"
[1] "+ LD:: Downloading full .gz/.npz UKB files and saving to disk."
[1] "+ CONDA:: Identified axel executable in echoR env."
[1] "+ CONDA:: Identified axel executable in echoR env."
[1] "+ LD:: load_ld() python function input: ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_15000001_18000001"
[1] "+ LD:: Reading LD matrix into memory. This could take some time..."
~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_15000001_18000001.gz
~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_15000001_18000001.npz
Processed URL: ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_15000001_18000001
Some other message at the end
[1] "+ Full UKB LD matrix: 20815 x 20815"
[1] "+ Full UKB LD SNP data.table: 20815 x 5"
[1] "+ LD:: Saving LD matrix ==> ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/BST1.UKB_LD.RDS"
[1] "115 x 115 LD_matrix (sparse)"
[1] "+ FILTER:: Filtering by LD features."
[1] "FILTER:: Filtering by SNP features."
[1] "+ FILTER:: Removing SNPs with MAF < 0.001"
[1] "+ FILTER:: Post-filtered data: 115 x 16"
vvvvv-- FINEMAP --vvvvvAll required columns present.All suggested columns present.
vvvvv-- SUSIE --vvvvvAll required columns present.All suggested columns present.
[1] "++ Fine-mapping using multiple tools: FINEMAP, SUSIE"

+++ Multi-finemap:: FINEMAP +++
[1] "++ FINEMAP:: Constructing master file."
[1] "++ FINEMAP:: Constructing data.z file."
[1] "++ FINEMAP:: Constructing data.ld file."
[1] "+ Using FINEMAP v1.4"

|--------------------------------------|
| Welcome to FINEMAP v1.4              |
|                                      |
| (c) 2015-2020 University of Helsinki |
|                                      |
| Help :                               |
| - ./finemap --help                   |
| - www.finemap.me                     |
| - www.christianbenner.com            |
|                                      |
| Contact :                            |
| - christian.benner@helsinki.fi       |
| - matti.pirinen@helsinki.fi          |
|--------------------------------------|

--------
SETTINGS
--------
- dataset            : all
- corr-config        : 0.95
- n-causal-snps      : 5
- n-configs-top      : 50000
- n-conv-sss         : 100
- n-iter             : 100000
- n-threads          : 1
- prior-k0           : 0
- prior-std          : 0.05
- prob-conv-sss-tol  : 0.001
- prob-cred-set      : 0.95

------------
FINE-MAPPING (1/1)
------------
- GWAS summary stats               : FINEMAP/data.z
- SNP correlations                 : FINEMAP/data.ld
- Causal SNP stats                 : FINEMAP/data.snp
- Causal configurations            : FINEMAP/data.config
- Credible sets                    : FINEMAP/data.cred
- Log file                         : FINEMAP/data.log_sss
- Reading input                    : done!

- Number of GWAS samples           : 216621
- Number of SNPs                   : 115
- Prior-Pr(# of causal SNPs is k)  :
  (0 -> 0)
   1 -> 0.584
   2 -> 0.292
   3 -> 0.0964
   4 -> 0.0237
   5 -> 0.00461
- 1616 configurations evaluated (0.104/100%) : converged after 104 iterations
- Computing causal SNP statistics  : done!
- Regional SNP heritability        : 0.00763 (SD: 0.000422 ; 95% CI: [0.00679,0.0085])
- Log10-BF of >= one causal SNP    : 2.28e+04
- Post-expected # of causal SNPs   : 5
- Post-Pr(# of causal SNPs is k)   :
  (0 -> 0)
   1 -> 0
   2 -> 0
   3 -> 0
   4 -> 2.87e-262
   5 -> 1
- Computing credible sets          : done!
- Writing output                   : done!
- Run time                         : 0 hours, 0 minutes, 0 seconds
[1] ".cred not detected. Using .snp instead."
[1] "+ FINEMAP:: Importing prob (.snp)..."
Error in `[.data.table`(x, r, vars, with = FALSE) :
  column(s) not found: prob
[1] "++ Credible Set SNPs identified = 0"
[1] "++ Merging FINEMAP results with multi-finemap data."

+++ Multi-finemap:: SUSIE +++
[1] "+ Preparing sample_size (N) column"
[1] "+ `N` column already present in data."
[1] "+ SUSIE:: sample_size= 216621"
[1] "+ SUSIE:: max_causal = 5"
[1] "+ Subsetting LD matrix and finemap_dat to common SNPs..."
[1] "+ LD:: Removing unnamed rows/cols"
[1] "+ LD:: Replacing NAs with 0"
[1] "+ LD_matrix = 115 SNPs."
[1] "+ finemap_dat = 115 SNPs."
[1] "+ 115 SNPs in common."
[1] "+ SUSIE:: Using `susie_suff_stat()` from susieR v0.10.1"
[1] "+ SUSIE:: Extracting Credible Sets..."
[1] "++ Credible Set SNPs identified = 3"
[1] "++ Merging SUSIE results with multi-finemap data."
[1] "+ Identifying Consensus SNPs..."
[1] "++ support_thresh = 2"
[1] "++ top_CS_only=FALSE"
[1] "+ Calculating mean Posterior Probability (mean.PP)..."
[1] "+ Replacing PP==NA with 0"
[1] "++ 2 fine-mapping methods used."
[1] "++ 3 Credible Set SNPs identified."
[1] "++ 0 Consensus SNPs identified."
[1] "+ Fine-mapping with ' FINEMAP, SUSIE ' completed:"
Time difference of 3.1 secs

Fine-mapping complete in:
Time difference of 1.7 mins
[1] "+ Identifying Consensus SNPs..."
[1] "++ support_thresh = 2"
[1] "++ top_CS_only=FALSE"
[1] "+ Calculating mean Posterior Probability (mean.PP)..."
[1] "+ Replacing PP==NA with 0"
[1] "++ 2 fine-mapping methods used."
[1] "++ 3 Credible Set SNPs identified."
[1] "++ 0 Consensus SNPs identified."

> Nalls23andMe_2019.results[SUSIE.CS > 0, list(SNP, FINEMAP.CS, FINEMAP.PP, SUSIE.CS, SUSIE.PP)]
          SNP FINEMAP.CS FINEMAP.PP SUSIE.CS  SUSIE.PP
1: rs34559912         NA         NA        3 1.0000000
2:  rs4389574         NA         NA        1 1.0000000
3:  rs6852450         NA         NA        2 0.9999992

> readLines("results/GWAS/Nalls23andMe_2019/BST1/FINEMAP/data.snp", n = 3)
[1] "index rsid chromosome position allele1 allele2 maf beta se z prob log10bf mean sd mean_incl sd_incl"
[2] "1 rs6828144 4 15727389 T C 0.0263 0.1765 0.0309 5.71197 1 11.8522 1.40649 0.0355444 1.40649 0.0355444"
[3] "111 rs11947310 4 15744576 A C 0.1909 0.0823 0.012 6.85833 1 11.8522 1.40649 0.0355444 1.40649 0.0355444"

> readLines("results/GWAS/Nalls23andMe_2019/BST1/FINEMAP/data.cred5")
[1] "# Post-Pr(# of causal SNPs is 5) = 1"
[2] "#log10bf 22448.9 NA 22432.1 NA 20843.6 NA 556.379 NA 264.033 NA"
[3] "#min(|ld|) 1 NA 1 NA 1 NA 1 NA 1 NA"
[4] "#mean(|ld|) 1 NA 1 NA 1 NA 1 NA 1 NA"
[5] "#median(|ld|) 1 NA 1 NA 1 NA 1 NA 1 NA"
[6] "index cred1 prob1 cred2 prob2 cred3 prob3 cred4 prob4 cred5 prob5"
[7] "1 rs11947310 1 rs11933202 1 rs1807250 1 rs6828144 1 rs73123615 0.999999"

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: ~/mambaforge/envs/echoR/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] echolocatoR_0.1.2

loaded via a namespace (and not attached):
  [1] colorspace_2.0-0            ellipsis_0.3.1
  [3] class_7.3-18                biovizBase_1.38.0
  [5] htmlTable_2.1.0             XVector_0.30.0
  [7] GenomicRanges_1.42.0        base64enc_0.1-3
  [9] gld_2.6.2                   dichromat_2.0-0
 [11] rstudioapi_0.13             proxy_0.4-25
 [13] DT_0.17                     bit64_4.0.5
 [15] AnnotationDbi_1.52.0        fansi_0.4.2
 [17] mvtnorm_1.1-1               xml2_1.3.2
 [19] R.methodsS3_1.8.1           splines_4.0.3
 [21] ggbio_1.38.0                cachem_1.0.4
 [23] rootSolve_1.8.2.1           knitr_1.31
 [25] jsonlite_1.7.2              Formula_1.2-4
 [27] Rsamtools_2.6.0             dbplyr_2.1.1
 [29] cluster_2.1.1               R.oo_1.24.0
 [31] png_0.1-7                   graph_1.68.0
 [33] BiocManager_1.30.12         compiler_4.0.3
 [35] httr_1.4.2                  backports_1.2.1
 [37] assertthat_0.2.1            Matrix_1.3-2
 [39] fastmap_1.1.0               lazyeval_0.2.2
 [41] htmltools_0.5.1.1           prettyunits_1.1.1
 [43] tools_4.0.3                 gtable_0.3.0
 [45] glue_1.4.2                  lmom_2.8
 [47] GenomeInfoDbData_1.2.4      reshape2_1.4.4
 [49] dplyr_1.0.5                 rappdirs_0.3.3
 [51] Rcpp_1.0.6                  Biobase_2.50.0
 [53] vctrs_0.3.7                 Biostrings_2.58.0
 [55] rtracklayer_1.50.0          crosstalk_1.1.1
 [57] xfun_0.20                   stringr_1.4.0
 [59] lifecycle_1.0.0             ensembldb_2.14.0
 [61] XML_3.99-0.6                zlibbioc_1.36.0
 [63] MASS_7.3-53.1               scales_1.1.1
 [65] BSgenome_1.58.0             VariantAnnotation_1.36.0
 [67] ProtGenerics_1.22.0         hms_1.0.0
 [69] MatrixGenerics_1.2.1        RBGL_1.66.0
 [71] parallel_4.0.3              SummarizedExperiment_1.20.0
 [73] expm_0.999-6                susieR_0.10.1
 [75] AnnotationFilter_1.14.0     RColorBrewer_1.1-2
 [77] curl_4.3                    Exact_2.1
 [79] reticulate_1.18             memoise_2.0.0
 [81] gridExtra_2.3               ggplot2_3.3.3
 [83] biomaRt_2.46.3              rpart_4.1-15
 [85] reshape_0.8.8               latticeExtra_0.6-29
 [87] stringi_1.5.3               RSQLite_2.2.5
 [89] S4Vectors_0.28.1            e1071_1.7-6
 [91] checkmate_2.0.0             GenomicFeatures_1.42.2
 [93] BiocGenerics_0.36.0         boot_1.3-27
 [95] BiocParallel_1.24.1         GenomeInfoDb_1.26.4
 [97] rlang_0.4.10                pkgconfig_2.0.3
 [99] matrixStats_0.58.0          bitops_1.0-6
[101] lattice_0.20-41             purrr_0.3.4
[103] GenomicAlignments_1.26.0    htmlwidgets_1.5.3
[105] bit_4.0.4                   tidyselect_1.1.0
[107] GGally_2.1.1                plyr_1.8.6
[109] magrittr_2.0.1              R6_2.5.0
[111] IRanges_2.24.1              DescTools_0.99.40
[113] generics_0.1.0              Hmisc_4.5-0
[115] DelayedArray_0.16.3         DBI_1.1.1
[117] pillar_1.5.1                foreign_0.8-81
[119] survival_3.2-10             RCurl_1.98-1.3
[121] nnet_7.3-15                 tibble_3.1.0
[123] crayon_1.4.1                utf8_1.2.1
[125] OrganismDbi_1.32.0          BiocFileCache_1.14.0
[127] jpeg_0.1-8.1                progress_1.2.2
[129] grid_4.0.3                  data.table_1.14.0
[131] blob_1.2.1                  digest_0.6.27
[133] R.utils_2.10.1              openssl_1.4.3
[135] stats4_4.0.3                munsell_0.5.0
[137] askpass_1.1
Warning message:
In susie_func(bhat = subset_DT$Effect, shat = subset_DT$StdErr,  :
  IBSS algorithm did not converge in 100 iterations!

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions