diff --git a/DESCRIPTION b/DESCRIPTION index 11e9dc5..e1c91b8 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: Qploidy2 Title: Estimation of Large-Scale Copy Number Variation Using Genotyping Data -Version: 1.9.0 +Version: 1.9.1 Authors@R: c(person(given='Cristiane', family='Taniguti', email = 'ctaniguti@ufl.edu', diff --git a/NEWS.md b/NEWS.md index 8440650..4e7c1cb 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,7 @@ +# Qploidy2 1.9.1 + +* Add `GenoBrew` links + # Qploidy2 1.9.0 * Fork from Cristianetaniguti/Qploidy@development diff --git a/README.md b/README.md index 4ab98e0..497997a 100644 --- a/README.md +++ b/README.md @@ -29,6 +29,18 @@ devtools::install_github("Breeding-Insight/Qploidy2") * [Tutorial](https://Breeding-Insight.github.io/Qploidy2/Qploidy_alfalfa_tutorial.html) - large-scale copy number estimation in alfalfa mapping population using DArTag sequencing data +## Interactive visualization and curation of results + +The resulted files from Qploidy2 `write_hmm_CN` function can be uploaded on `GenoBrew` Shiny interface. +The tab `CNV profiles` allow you to interactively inspect the BAF, z-score and CN calls plots, +re-run the HMM adjusting parameters for specific samples and update the results. + +Visit `GenoBrew` GitHub [repository](https://github.com/Breeding-Insight/GenoBrew) +and [tutorial](https://scribehow.com/viewer/GenoBrew_Interactive_Marker_Panel_Evaluation_CNV_Visualization_and_Curation__4uWloBuPT1WlnCvW2UWiTg) +for more information. + + + ## Contributing Contributions are welcome! If you'd like to contribute, please fork the repository and submit a pull request to the development branch. For major changes, open an issue first to discuss your ideas. diff --git a/app.R b/app.R deleted file mode 100644 index 8a24b60..0000000 --- a/app.R +++ /dev/null @@ -1,7 +0,0 @@ -# Launch the ShinyApp (Do not remove this comment) -# To deploy, run: rsconnect::deployApp() -# Or use the blue button on top of this file - -pkgload::load_all(export_all = FALSE,helpers = FALSE,attach_testthat = FALSE) -options( "golem.app.prod" = TRUE, shiny.autoload.r = FALSE) -Qploidy2::run_app() # add parameters here (if any) diff --git a/docs/Qploidy_alfalfa_tutorial.Rmd b/docs/Qploidy_alfalfa_tutorial.Rmd index 40b3158..c8b6d93 100644 --- a/docs/Qploidy_alfalfa_tutorial.Rmd +++ b/docs/Qploidy_alfalfa_tutorial.Rmd @@ -410,14 +410,26 @@ p$cn p$baf_dist ``` +See the results in a `data.frame` format: + +```{r} +head(res$by_window) +head(res$by_marker) +head(res$params) +``` + Summarize results: ```{r} +# By chromosome summ_hmm_chr <- summarize_cn_mode(res, level = "chromosome") +summ_hmm_chr + +# By sample summ_hmm_sample <- summarize_cn_mode(res, level = "sample") +summ_hmm_sample ``` - # HMM Copy Number Estimation (Multiple Samples) Run HMM for all samples in parallel. The model will be selected internally using `select_best_baf_model`; you just need to define the grids (`bw`, `add_uniform_grid`, and `uniform_weight_grid`) according to what you observed when running it on a single sample. @@ -428,12 +440,12 @@ Run HMM for all samples in parallel. The model will be selected internally using ```{r} multi_esti_raw <- hmm_estimate_CN_multi(data = data, - geno.pos = geno.pos, - use_values = c("ratio","R"), - sample_ids = "all", - n_cores = 2, - cn_grid = c(2,3,4,5,6), - bw = c(0.02, 0.04,0.1)) + geno.pos = geno.pos, + use_values = c("ratio","R"), + sample_ids = "all", + n_cores = 2, + cn_grid = c(2,3,4,5,6), + bw = c(0.02, 0.04,0.1)) ``` Visualize CNV tracks for multiple samples: @@ -569,6 +581,8 @@ p$arranged Explore more parameters consulting `?hmm_estimate_CN`. +# Refine results for a single sample + Once you get to a conclusion with results better explain the patterns, you can update your multi-sample HMM object by using the functions: @@ -581,6 +595,48 @@ final_res <- update_hmm_CN_multi(multi_esti, If you decide that the results for this sample is inconclusive, you can set the `rm_sample` parameter to `TRUE` to remove it from your final results. +# Export HMM results + +The resulted objects from function `update_hmm_CN_multi`, +`hmm_estimate_CN`, and `hmm_estimate_CN_multi` are all classified as + +```{r} +class(res_tweak) # result from hmm_estimate_CN +class(multi_esti) # result from hmm_estimate_CN_multi +class(re_multi_esti) # result from hmm_estimate_CN_multi after re-standardization +class(final_res) # result from update_hmm_CN_multi +``` + +Therefore, all of them can be exported by the function: + +```{r} +write_hmm_CN(final_res, prefix = "alfalfa_example") +``` + +You will see three generated files: + +- "[prefix]_by_marker.csv.gz" +- "[prefix]_by_window.csv.gz" +- "[prefix]_params.rds" + +You can read them back to R using the `read_hmm_CN` function: + +```{r, eval=FALSE} +final_res <- read_hmm_CN(by_window_file = "alfalfa_example_by_window.csv.gz", + by_marker_file = "alfalfa_example_by_marker.csv.gz", + params_file = "alfalfa_example_params.rds") +``` + +# Interactive curation of results + +The resulted files from `write_hmm_CN` can be uploaded on `GenoBrew` Shiny interface. +The tab `CNV profiles` allow you to interactively inspect the BAF, z-score and CN calls plots, +re-run the HMM adjusting parameters for specific samples and update the results. + +Visit `GenoBrew` GitHub [repository](https://github.com/Breeding-Insight/GenoBrew) +and [tutorial](https://scribehow.com/viewer/GenoBrew_Interactive_Marker_Panel_Evaluation_CNV_Visualization_and_Curation__4uWloBuPT1WlnCvW2UWiTg) +for more information. + # Call dosages **Warning**: Beta version - Dosage call requires more testing and review. @@ -658,11 +714,11 @@ Klein, P. E., & Riera-Lizarazu, O. (2025). Exploring chromosomal variations in g from high-density SNP array data and a new tool, Qploidy. The Plant Genome, e70044. https://doi.org/10.1002/tpg2.70044 -### Qploidy2 - HMM and BAF model selection +### Qploidy2 - HMM and grid approach BAF model selection Manuscript in preparation. Please contact the author for more information. -### nQuack - BAF model selection +### nQuack - EM approach BAF model selection Gaynor, M., Landis, J., O'Connor, T., Laport, R., Doyle, J., Soltis, D., Ponciano, J., & Soltis, P. (2024). "nQuack: An R package for predicting ploidal level from sequence data using site-based heterozygosity." diff --git a/docs/Qploidy_alfalfa_tutorial.html b/docs/Qploidy_alfalfa_tutorial.html index dafb57a..4edd756 100644 --- a/docs/Qploidy_alfalfa_tutorial.html +++ b/docs/Qploidy_alfalfa_tutorial.html @@ -5527,7 +5527,7 @@
This is an interactive graphic; hover your mouse over the cells to get information on sample name, heterozygosity percentage, and the number of markers used for this estimation. Heterozygosity is an @@ -5818,9 +5818,59 @@
See the results in a data.frame format:
## $cn_grid
+## [1] 2 3 4 6
+##
+## $distribution
+## [1] "beta"
+##
+## $mu
+## 2 3 4 6
+## 0.1731491 0.2595389 0.2793871 0.3083124
+##
+## $sigma
+## [1] 0.1772968
+##
+## $A
+## [,1] [,2] [,3] [,4]
+## [1,] 4.022641e-04 1.036262e-04 0.9932692 6.224879e-03
+## [2,] 9.576942e-07 1.777481e-01 0.8189540 3.296986e-03
+## [3,] 1.018569e-09 1.850962e-07 0.9999986 1.226122e-06
+## [4,] 6.925351e-07 2.223240e-04 0.9379386 6.183840e-02
+##
+## $pi0
+## [1] 6.530653e-11 2.101968e-09 9.999999e-01 8.815585e-08
Summarize results:
-summ_hmm_chr <- summarize_cn_mode(res, level = "chromosome")
-summ_hmm_sample <- summarize_cn_mode(res, level = "sample")multi_esti_raw <- hmm_estimate_CN_multi(data = data,
- geno.pos = geno.pos,
- use_values = c("ratio","R"),
- sample_ids = "all",
- n_cores = 2,
- cn_grid = c(2,3,4,5,6),
- bw = c(0.02, 0.04,0.1))multi_esti_raw <- hmm_estimate_CN_multi(data = data,
+ geno.pos = geno.pos,
+ use_values = c("ratio","R"),
+ sample_ids = "all",
+ n_cores = 2,
+ cn_grid = c(2,3,4,5,6),
+ bw = c(0.02, 0.04,0.1))Visualize CNV tracks for multiple samples:
- +Again, not the best results for this dataset. Standardization is required.
@@ -5855,24 +5905,24 @@multi_esti <- hmm_estimate_CN_multi(qploidy_standarize_result = qploidy_standardization,
- sample_ids = "all",
- n_cores = 2,
- cn_grid = c(2,3,4,5,6),
- bw = c(0.02, 0.04,0.1))multi_esti <- hmm_estimate_CN_multi(qploidy_standarize_result = qploidy_standardization,
+ sample_ids = "all",
+ n_cores = 2,
+ cn_grid = c(2,3,4,5,6),
+ bw = c(0.02, 0.04,0.1))Visualize CNV tracks for multiple samples:
- +# For a specific sample
-p <- plot_cn_track(hmm_CN = multi_esti,
- qploidy_standarize_result = qploidy_standardization,
- sample_id = "85-182",
- show_window_lines = TRUE)
-p$arranged# For a specific sample
+p <- plot_cn_track(hmm_CN = multi_esti,
+ qploidy_standarize_result = qploidy_standardization,
+ sample_id = "85-182",
+ show_window_lines = TRUE)
+p$arrangedIf you want to improve BAF resolution (e.g., after identifying a subset of samples with known ploidy via HMM), re-standardize:
-re_qploidy_standardization <- re_standardize(
- data = data,
- geno.pos = geno.pos,
- hmm_CN_multi = multi_esti,
- selected_model = selected_model,
- threshold.geno.prob = 0.9,
- ploidy.standardization = 4,
- threshold.n.clusters = 4,
- n.cores = 2,
- out_filename = "re_standardization.tsv.gz",
- verbose = TRUE
-)
-
-re_qploidy_standardizationre_qploidy_standardization <- re_standardize(
+ data = data,
+ geno.pos = geno.pos,
+ hmm_CN_multi = multi_esti,
+ selected_model = selected_model,
+ threshold.geno.prob = 0.9,
+ ploidy.standardization = 4,
+ threshold.n.clusters = 4,
+ n.cores = 2,
+ out_filename = "re_standardization.tsv.gz",
+ verbose = TRUE
+)
+
+re_qploidy_standardization## This is an object of class 'qploidy_standardization'
## --------------------------------------------------------------------
## Parameters
@@ -5913,51 +5963,51 @@ 11 Re-standardization
## 4 # markers filtered by min number of clusters: 23 (0.83 %)
## 5 # markers filtered by lack of genomic information: 0 (0 %)
## 6 # markers remaining with estimated BAF: 889 (32.14 %)
-# Compare before and after
-plot_qploidy_standardization(x = qploidy_standardization,
- sample = "85-295",
- type = c("Ratio_hist_overall", "BAF_hist_overall"))# Compare before and after
+plot_qploidy_standardization(x = qploidy_standardization,
+ sample = "85-295",
+ type = c("Ratio_hist_overall", "BAF_hist_overall"))plot_qploidy_standardization(x = re_qploidy_standardization,
- sample = "85-295",
- type = c("Ratio_hist_overall", "BAF_hist_overall"))plot_qploidy_standardization(x = re_qploidy_standardization,
+ sample = "85-295",
+ type = c("Ratio_hist_overall", "BAF_hist_overall"))re_multi_esti <- hmm_estimate_CN_multi(
- qploidy_standarize_result = re_qploidy_standardization,
- sample_ids = "all",
- n_cores = 2,
- cn_grid = c(2,3,4,5,6),
- bw = c(0.02, 0.04,0.1),
- add_uniform_grid = TRUE,
- uniform_weight_grid = c(0.2, 0.5,0.95))
-
-# CNV profiles before
-compare_cn_track(hmm_CN = multi_esti,
- samples_to_plot = samples[91:110],
- facet_ncol = 8)re_multi_esti <- hmm_estimate_CN_multi(
+ qploidy_standarize_result = re_qploidy_standardization,
+ sample_ids = "all",
+ n_cores = 2,
+ cn_grid = c(2,3,4,5,6),
+ bw = c(0.02, 0.04,0.1),
+ add_uniform_grid = TRUE,
+ uniform_weight_grid = c(0.2, 0.5,0.95))
+
+# CNV profiles before
+compare_cn_track(hmm_CN = multi_esti,
+ samples_to_plot = samples[91:110],
+ facet_ncol = 8)# CNV profiles after
-compare_cn_track(hmm_CN = re_multi_esti,
- samples_to_plot = samples[91:110],
- facet_ncol = 8)# CNV profiles after
+compare_cn_track(hmm_CN = re_multi_esti,
+ samples_to_plot = samples[91:110],
+ facet_ncol = 8)# Same plots can be generated
-p_re <- plot_cn_track(hmm_CN = re_multi_esti,
- qploidy_standarize_result = re_qploidy_standardization,
- sample_id = "85-295",
- show_window_lines = TRUE)
-
-p <- plot_cn_track(hmm_CN = multi_esti,
- qploidy_standarize_result = qploidy_standardization,
- sample_id = "85-295",
- show_window_lines = TRUE)
-
-p$arranged #before# Same plots can be generated
+p_re <- plot_cn_track(hmm_CN = re_multi_esti,
+ qploidy_standarize_result = re_qploidy_standardization,
+ sample_id = "85-295",
+ show_window_lines = TRUE)
+
+p <- plot_cn_track(hmm_CN = multi_esti,
+ qploidy_standarize_result = qploidy_standardization,
+ sample_id = "85-295",
+ show_window_lines = TRUE)
+
+p$arranged #beforeFor this dataset, there are a few changes for some samples like the one showed above. Chromosome 7 is pointed with higher number (5) of @@ -5967,72 +6017,116 @@
res_tweak <- hmm_estimate_CN(qploidy_standarize_result = re_qploidy_standardization,
- selected_model = selected_model,
- sample_id = "85-295",
- min_snps_per_window = 60,
- cn_grid = c(2,3,4,5,6))
-
-p <- plot_cn_track(hmm_CN = res_tweak,
- qploidy_standarize_result = re_qploidy_standardization,
- sample_id = "85-295",
- show_window_lines = TRUE)
-
-p$arrangedres_tweak <- hmm_estimate_CN(qploidy_standarize_result = re_qploidy_standardization,
+ selected_model = selected_model,
+ sample_id = "85-295",
+ min_snps_per_window = 60,
+ cn_grid = c(2,3,4,5,6))
+
+p <- plot_cn_track(hmm_CN = res_tweak,
+ qploidy_standarize_result = re_qploidy_standardization,
+ sample_id = "85-295",
+ show_window_lines = TRUE)
+
+p$arrangedExplore more parameters consulting ?hmm_estimate_CN.
Once you get to a conclusion with results better explain the patterns, you can update your multi-sample HMM object by using the functions:
- +If you decide that the results for this sample is inconclusive, you
can set the rm_sample parameter to TRUE to
remove it from your final results.
The resulted objects from function update_hmm_CN_multi,
+hmm_estimate_CN, and hmm_estimate_CN_multi are
+all classified as
## [1] "hmm_CN"
+
+## [1] "hmm_CN"
+
+## [1] "hmm_CN"
+
+## [1] "hmm_CN"
+Therefore, all of them can be exported by the function:
+ +You will see three generated files:
+You can read them back to R using the read_hmm_CN
+function:
The resulted files from write_hmm_CN can be uploaded on
+GenoBrew Shiny interface. The tab CNV profiles
+allow you to interactively inspect the BAF, z-score and CN calls plots,
+re-run the HMM adjusting parameters for specific samples and update the
+results.
Visit GenoBrew GitHub repository and
+tutorial
+for more information.
Warning: Beta version - Dosage call requires more testing and review.
Qploidy2 uses the BAF distributions to call dosages. See an example:
-# res was generated by hmm_estimate_CN above, it contains only one sample
-sample_name <- "85-17"
-baf <- final_res$by_marker$baf[final_res$by_marker$SampleName == sample_name]
-
-selected_model <- select_best_baf_model(baf_vec = baf,
- sample = sample_name,
- cn_grid= c(2,3,4),
- bw = c(0.02, 0.04,0.1),
- plot = TRUE)
-
-dosages_one_sample <- call_BAF_dosages(baf, selected_model = selected_model, plot=TRUE)
-head(dosages_one_sample$data)# res was generated by hmm_estimate_CN above, it contains only one sample
+sample_name <- "85-17"
+baf <- final_res$by_marker$baf[final_res$by_marker$SampleName == sample_name]
+
+selected_model <- select_best_baf_model(baf_vec = baf,
+ sample = sample_name,
+ cn_grid= c(2,3,4),
+ bw = c(0.02, 0.04,0.1),
+ plot = TRUE)
+
+dosages_one_sample <- call_BAF_dosages(baf, selected_model = selected_model, plot=TRUE)
+head(dosages_one_sample$data)Call dosages for all samples considering the copy number estimated:
- +You can read the VCF with vcfR and extract fields:
- +## Scanning file to determine attributes.
## File attributes:
## meta lines: 9
@@ -6050,7 +6144,7 @@ 13 Export VCF
## row_num: 0
## Processed variant 1000Processed variant 2000Processed variant: 2764
## All variants processed
-
+
## FORMAT 85-1
## [1,] "GT:CN:AD:BAF:Z:PMC:PMD" "./.:4:1,339:.: 1.20:1.000:."
## [2,] "GT:CN:AD:BAF:Z:PMC:PMD" "0/0/1/1:4:3,2:0.42: 0.04:1.000:0.900"
@@ -6075,24 +6169,24 @@ 13 Export VCF
## [3,] "./.:4:14,0:.:-1.36:1.000:."
## [4,] "0/0/1/1:4:33,56:0.61: 0.36:1.000:0.688"
## [5,] "./.:4:132,48:.:-0.17:1.000:."
-
+
## 85-1 85-100 85-102 85-103 85-104
## chr1.1_000194324 "4" "4" "4" "4" "4"
## chr1.1_000452961 "4" "4" "4" "4" "4"
## chr1.1_000532584 "4" "4" "4" "4" "4"
## chr1.1_000735393 "4" "4" "4" "4" "4"
## chr1.1_000837330 "4" "4" "4" "4" "4"
-
+
## 85-1 85-100 85-102 85-103 85-104
## chr1.1_000194324 NA NA NA NA NA
## chr1.1_000452961 "0/0/1/1" "0/0/0/0" "0/0/0/0" "0/1/1/1" "0/0/1/1"
## chr1.1_000532584 NA NA NA NA NA
## chr1.1_000735393 "0/1/1/1" "0/0/1/1" "0/0/1/1" "0/0/1/1" "0/1/1/1"
## chr1.1_000837330 NA NA NA NA NA
-
+
## 85-1 85-100 85-102 85-103 85-104
## chr1.1_000194324 NA NA NA NA NA
## chr1.1_000452961 "0.900" "1.000" "1.000" "0.999" "0.997"
@@ -6100,22 +6194,22 @@ 13 Export VCF
## chr1.1_000735393 "1.000" "0.981" "0.997" "0.688" "0.999"
## chr1.1_000837330 NA NA NA NA NA
Generate CNV plots for all samples in batches:
-n_plots_by_page <- c(seq(1, length(samples), by = 40), length(samples))
-for(i in 1:(length(n_plots_by_page) -1)){
- comp_p <- compare_cn_track(hmm_CN = final_res,
- samples_to_plot = samples[n_plots_by_page[[i]]:n_plots_by_page[i+1]],
- facet_ncol = 12)
- ggsave(comp_p, filename = paste0("Samples_", n_plots_by_page[i], "_", n_plots_by_page[i +1], ".png"), height = 10, width = 11)
-}n_plots_by_page <- c(seq(1, length(samples), by = 40), length(samples))
+for(i in 1:(length(n_plots_by_page) -1)){
+ comp_p <- compare_cn_track(hmm_CN = final_res,
+ samples_to_plot = samples[n_plots_by_page[[i]]:n_plots_by_page[i+1]],
+ facet_ncol = 12)
+ ggsave(comp_p, filename = paste0("Samples_", n_plots_by_page[i], "_", n_plots_by_page[i +1], ".png"), height = 10, width = 11)
+}Taniguti, C. H., Lau, J., Hochhaus, T., Arias, D. C. L., Hokanson, S. C., Zlesak, D. C., Byrne, D. H., Klein, P. E., & Riera-Lizarazu, O. @@ -6123,17 +6217,25 @@
Manuscript in preparation. Please contact the author for more information.
Gaynor, M., Landis, J., O’Connor, T., Laport, R., Doyle, J., Soltis, +D., Ponciano, J., & Soltis, P. (2024). “nQuack: An R package for +predicting ploidal level from sequence data using site-based +heterozygosity.” Applications in Plant Sciences, 12(4), e11606. doi:10.1002/aps3.11606
+Funded in part by the Robert E. Basye Endowment in Rose Genetics, Dept. of Horticultural Sciences, Texas A&M University, and USDA’s @@ -6145,8 +6247,8 @@
Supported by Breeding Insight.
diff --git a/inst/app/www/GenoBrew_demo.gif b/inst/app/www/GenoBrew_demo.gif new file mode 100644 index 0000000..db88b37 Binary files /dev/null and b/inst/app/www/GenoBrew_demo.gif differ diff --git a/inst/app/www/Qploidy2_logo.png b/inst/app/www/Qploidy2_logo.png index 9607a83..321c334 100644 Binary files a/inst/app/www/Qploidy2_logo.png and b/inst/app/www/Qploidy2_logo.png differ diff --git a/inst/logo.svg b/inst/logo.svg index 9e5494b..3cc82c5 100644 --- a/inst/logo.svg +++ b/inst/logo.svg @@ -22,6 +22,18 @@ xmlns:dc="http://purl.org/dc/elements/1.1/">