CRAN 7.2 submission #61
Merged
…t/BIGr into ped_indels_update
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Madc2vcf updates
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Validate Ped
- Trios with low markers will still be flagged, but recommendations are now shown.
- When no parent pair passes the error threshold, the pairs will still be shown in the final report.
Find Parents
- Fixed formatting of the final output when ties on the best pair were found.
- Implemented vectorization and efficiency improvements.
- When two recommendations are tied on error %, the tiebreaker is the number of markers tested: the option with the highest number of markers tested takes priority.
Madc2vcf and Pedigree functions updates
Dosage2vcf update
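The tiebreaker rule described in the commit notes above (lowest error %, then highest number of markers tested) can be sketched in base R. This is only an illustration under stated assumptions: the `candidates` table, its column names, and `pick_best_pair()` are hypothetical and are not taken from the BIGr source.

```r
# Hypothetical candidate table for one progeny; column names are illustrative only.
candidates <- data.frame(
  pair           = c("P1 x P2", "P1 x P3", "P2 x P3"),
  error_pct      = c(1.5, 1.5, 4.0),
  markers_tested = c(180L, 240L, 300L),
  stringsAsFactors = FALSE
)

# Sort by error % ascending, then by markers tested descending, and keep the top row.
pick_best_pair <- function(df) {
  ord <- order(df$error_pct, -df$markers_tested)
  df[ord[1], , drop = FALSE]
}

best <- pick_best_pair(candidates)
print(best$pair)
```

Here the first two pairs tie on error %, so the pair tested on more markers is selected. A single `order()` call keeps the rule vectorized and easy to extend with further sort keys.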
Pull request overview
This PR prepares BIGr 0.7.2 for CRAN with new MADC validation/conversion workflows, new pedigree/parentage utilities, expanded DArT report-to-VCF support, updated documentation, tests, and CI setup.
Changes:
- Adds `madc2vcf_multi`, `check_madc_sanity`, `find_parentage`, and `validate_pedigree`.
- Expands `dosage2vcf`, `check_ped`, `get_countsMADC`, and `imputation_concordance`.
- Updates package metadata, generated Rd docs, tests, NEWS, NAMESPACE, and GitHub Actions.
Reviewed changes
Copilot reviewed 32 out of 42 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| .github/workflows/R-CMD-check.yaml | Updates CI dependencies and CRAN-like behavior. |
| .gitignore | Adds macOS artifact ignore. |
| BIGr.Rproj | Adds RStudio project ID. |
| CRAN-SUBMISSION | Updates CRAN submission metadata. |
| DESCRIPTION | Bumps version and dependencies. |
| NAMESPACE | Exports/imports new functions and dependencies. |
| NEWS.md | Documents release changes. |
| R/check_madc_sanity.R | Adds MADC sanity checks and botloci remapping. |
| R/check_ped.R | Refactors pedigree checking output. |
| R/dosage2vcf.R | Adds SNP/INDEL report parsing and alignment validation. |
| R/find_parentage.R | Adds parentage assignment utility. |
| R/get_countsMADC.R | Adds object input and match-count handling. |
| R/imputation_concordance.R | Adds plotting/printing controls and doc updates. |
| R/madc2vcf_multi.R | Adds polyRAD-based MADC-to-VCF conversion. |
| R/thinSNP.R | Narrows roxygen imports. |
| R/utils.R | Adds verbose/url helpers and global variables. |
| R/validate_pedigree.R | Adds Mendelian trio validation utility. |
| cran-comments.md | Updates CRAN notes. |
| dev/dev_history.R | Comments dependency-amend command. |
| man/*.Rd | Updates/generated documentation for changed APIs. |
| tests/testthat/* | Adds and updates test coverage for new/changed behavior. |
Files not reviewed (9)
- man/check_madc_sanity.Rd: Language not supported
- man/check_ped.Rd: Language not supported
- man/dosage2vcf.Rd: Language not supported
- man/find_parentage.Rd: Language not supported
- man/get_countsMADC.Rd: Language not supported
- man/imputation_concordance.Rd: Language not supported
- man/madc2vcf_all.Rd: Language not supported
- man/madc2vcf_multi.Rd: Language not supported
- man/madc2vcf_targets.Rd: Language not supported

    # Print mean concordance
    # Summary statistics
    summary_concordance <- summary(percentage_match, na.rm = TRUE) * 100
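One thing worth checking in the line above: base R's `summary()` for numeric vectors does not take an `na.rm` argument (it drops `NA`s itself and appends an `NA's` count), so scaling the whole summary object would also scale that count. A minimal sketch of an alternative follows, assuming `percentage_match` holds proportions between 0 and 1; the example values and the `stats_pct` and `n_missing` names are made up for illustration.

```r
# Example proportions with one missing value (illustrative data only).
percentage_match <- c(0.90, 0.80, NA, 1.00)

# summary() handles NAs on its own and reports them under the name "NA's".
s <- summary(percentage_match)

# Scale only the statistics so the NA count is not multiplied as well.
stats_pct <- c(
  mean   = mean(percentage_match, na.rm = TRUE),
  median = median(percentage_match, na.rm = TRUE),
  min    = min(percentage_match, na.rm = TRUE),
  max    = max(percentage_match, na.rm = TRUE)
) * 100

# Report the number of missing comparisons separately.
n_missing <- sum(is.na(percentage_match))
```

Keeping the missing-value count apart from the percentages avoids printing a misleading `NA's` figure in the concordance summary.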

    if (comparisons == 0) return(NA_real_)
    (base::sum(cand_hom != prog_hom, na.rm = TRUE) / comparisons) * 100
    })
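The fragment above guards against division by zero before computing a mismatch percentage. A self-contained sketch of that pattern is shown below. It assumes diploid dosage coding (0, 1, 2) and assumes that only loci where both individuals are homozygous are compared; both assumptions, and the `mismatch_pct()` name, are illustrative and not confirmed by the PR.

```r
# Genotypes coded as dosage (0, 1, 2) for a diploid; NA marks a missing call.
mismatch_pct <- function(cand, prog) {
  # Compare only loci where both individuals are homozygous and non-missing.
  both_hom <- !is.na(cand) & !is.na(prog) & cand %in% c(0, 2) & prog %in% c(0, 2)
  comparisons <- sum(both_hom)
  # No usable loci: return NA rather than dividing by zero.
  if (comparisons == 0) return(NA_real_)
  sum(cand[both_hom] != prog[both_hom]) / comparisons * 100
}

cand <- c(0, 2, 2, 1, NA, 0)
prog <- c(0, 0, 2, 2, 2, 2)
mismatch_pct(cand, prog)
```

In this toy input four loci are comparable and two of them disagree, so the function returns 50. Returning `NA_real_` for the empty case keeps the result numeric, which matters when the values are later collected with `sapply()` or `vapply()`.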

    if(is.null(mi_df$Chr) | is.null(mi_df$Pos)) stop("When MADC CloneID don't follow the format Chr_Pos, Chr and Pos columns must be provided in the markers_info file.")
    }

    if(!any(botloci$V1 %in% report$CloneID)) { # First check if any botloci markers are found in MADC file. If not, check for padding mismatch.

    Progeny = progeny_ids,
    Male_Parent = NA_character_,
    Female_Parent = NA_character_,
    Mendelian_Error_Pct = NA_character_,

    if (!(file.exists(madc_file) | url_exists(madc_file))) stop("MADC file not found. Please provide a valid path or URL.")
    if (!(file.exists(botloci_file) | url_exists(botloci_file))) stop("Botloci file not found. Please provide a valid path or URL.")
    if (!is.null(markers_info) && !(file.exists(markers_info) | url_exists(markers_info))) stop("markers_info file not found. Please provide a valid path or URL.")
    if (!is.numeric(ploidy) || ploidy < 1) stop("ploidy must be a positive integer.")
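The input checks above combine `file.exists()` and a URL helper with the element-wise `|` operator. In base R, `|` evaluates both operands, whereas `||` short-circuits on scalars. The sketch below illustrates the difference with a stand-in helper; `url_exists_stub()` is hypothetical and only counts how often it is called, on the assumption that the package's real helper makes a network request.

```r
# Stand-in for the package's URL helper; it only counts its own calls.
calls <- 0L
url_exists_stub <- function(x) {
  calls <<- calls + 1L
  FALSE
}

# Create a local file so that file.exists() is TRUE.
path <- tempfile(fileext = ".csv")
file.create(path)

# `|` evaluates both sides, so the URL check runs even though the file exists.
ok_vectorized <- file.exists(path) | url_exists_stub(path)
calls_after_vectorized <- calls

# `||` stops at the first TRUE, so the URL check is skipped.
ok_short <- file.exists(path) || url_exists_stub(path)
calls_after_short <- calls
```

Both expressions yield `TRUE`, but the helper is invoked only by the `|` form. For local files, `||` would therefore skip an unnecessary URL lookup.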

    if(!any(is.na(report$CloneID))) {
    pos <- strsplit(report$CloneID, "_")
    format <- all(sapply(pos, length) == 2)
    first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1)))
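The fragment above tests the `Chr_Pos` format by splitting each CloneID on underscores and requiring exactly two parts. A regex anchored on the last underscore is one alternative that also tolerates chromosome or scaffold names containing underscores. Whether such IDs should be accepted is an assumption here, and the example IDs are invented for illustration.

```r
# Illustrative CloneIDs: two plain Chr_Pos, one scaffold name with an underscore, one reversed.
clone_ids <- c("Chr01_12345", "Chr02_000987", "scaffold_12_4500", "12345_Chr01")

# Anchor on the last underscore: everything before it is Chr, the digits after it are Pos.
pattern <- "^([A-Za-z].*)_([0-9]+)$"
follows_chr_pos <- grepl(pattern, clone_ids)

# Extract the two components for the IDs that match.
chr <- sub(pattern, "\\1", clone_ids[follows_chr_pos])
pos <- as.integer(sub(pattern, "\\2", clone_ids[follows_chr_pos]))
```

With this pattern, `scaffold_12_4500` is parsed as chromosome `scaffold_12` at position 4500, while the two-part `strsplit()` test would reject it.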

Comment on lines +1 to +4

    ID Male_Parent Female_Parent
    IND_C IND_A IND_B
    IND_D 0 IND_A
    GHOST IND_A IND_B

    potato_markers_info_ChromPos <- paste0(github_path, "test_madcs/potato_marker_info_chrompos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
    potato_microhapDB <- paste0(github_path, "potato/potato_allele_db_v001.fa")

    skip_if_offline("raw.githubusercontent.com")

Comment on lines +2 to +4

    skip_if_offline("raw.githubusercontent.com")

    github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/test_madcs/"

Comment on lines +181 to +187

    input_ped_report <- list(
    exact_duplicates = exact_duplicates,
    repeated_ids_diff = repeated_ids_report,
    messy_parents = messy_parents,
    missing_parents = missing_parents,
    dependencies = data.frame(Dependency = unique(unlist(errors))),
    corrected_pedigree = data
Codecov Report❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | main | #61 | +/- |
|---|---|---|---|
| Coverage | 83.35% | 82.02% | -1.34% |
| Files | 19 | 23 | +4 |
| Lines | 1358 | 2876 | +1518 |
| Hits | 1132 | 2359 | +1227 |
| Misses | 226 | 517 | +291 |
This pull request introduces BIGr version 0.7.2, bringing significant feature enhancements, dependency updates, and workflow improvements. The most notable changes are the addition of the new `madc2vcf_multi` function for polyRAD-based multiallelic genotyping, expanded support and validation in VCF conversion utilities, and updates to package dependencies and GitHub Actions workflows to support these features.

Major feature additions and improvements:
- Added `madc2vcf_multi` for converting DArTag MADC files to VCF using the polyRAD pipeline, with comprehensive input validation and support for multiallelic genotyping. The function handles CloneID mapping, overdispersion estimation, and robust error messaging, and introduces the optional `markers_info` argument for flexible marker annotation. (`NEWS.md`, `NAMESPACE`, `DESCRIPTION`)
- Updated `dosage2vcf` to support DArT SNP/INDEL 1-row and 2-row formats, improved marker/sample alignment, and refined missing genotype handling. (`NEWS.md`)
- Updated `madc2vcf_all` and related functions: added arguments for controlling "Other" allele processing, improved error and debug reporting, and fixed bugs related to allele handling and VCF field corruption. (`NEWS.md`)

Dependency and workflow updates:
- Added `polyRAD` and `data.table` to package dependencies (`Suggests` and `Imports`), and updated the GitHub Actions workflow to install `polyRAD` and `VariantAnnotation` for CI. (`DESCRIPTION`, `.github/workflows/R-CMD-check.yaml`)
- Updated imports from `data.table`, `dplyr`, and other packages to support new features. (`NAMESPACE`)

Documentation and metadata:
- Updated the `NEWS.md` file with detailed descriptions of new features, bug fixes, and usage notes for all recent versions. (`NEWS.md`)
- Updated `DESCRIPTION`, `CRAN-SUBMISSION`, and `BIGr.Rproj`. Also updated author affiliations and Roxygen version. (`DESCRIPTION`, `CRAN-SUBMISSION`, `BIGr.Rproj`)

Continuous integration improvements:
- Set `NOT_CRAN: true` in the CI environment to enable tests that are skipped on CRAN, ensuring more thorough checks on GitHub Actions. (`.github/workflows/R-CMD-check.yaml`)

These updates collectively improve the robustness, flexibility, and usability of the BIGr package for polyploid and diploid genomics workflows.