CRAN 7.2 submission #61

Merged
alex-sandercock merged 88 commits into main from development
May 18, 2026

@alex-sandercock
Collaborator

This pull request introduces BIGr version 0.7.2, bringing significant feature enhancements, dependency updates, and workflow improvements. The most notable changes are the addition of the new madc2vcf_multi function for polyRAD-based multiallelic genotyping, expanded support and validation in VCF conversion utilities, and updates to package dependencies and GitHub Actions workflows to support these features.

Major feature additions and improvements:

  • Added new function madc2vcf_multi for converting DArTag MADC files to VCF using the polyRAD pipeline, with comprehensive input validation and support for multiallelic genotyping. The function handles CloneID mapping, overdispersion estimation, and robust error messaging, and introduces the optional markers_info argument for flexible marker annotation. (NEWS.md, NAMESPACE, DESCRIPTION) [1] [2] [3]
  • Enhanced dosage2vcf to support DArT SNP/INDEL 1-row and 2-row formats, improved marker/sample alignment, and refined missing genotype handling. (NEWS.md)
  • Improved madc2vcf_all and related functions: added arguments for controlling "Other" allele processing, improved error and debug reporting, and fixed bugs related to allele handling and VCF field corruption. (NEWS.md)

Dependency and workflow updates:

  • Added polyRAD and data.table to package dependencies (Suggests and Imports), and updated the GitHub Actions workflow to install polyRAD and VariantAnnotation for CI. (DESCRIPTION, .github/workflows/R-CMD-check.yaml) [1] [2]
  • Updated NAMESPACE to export new functions and import additional functions from data.table, dplyr, and other packages to support new features. (NAMESPACE) [1] [2]

Documentation and metadata:

  • Updated the NEWS.md file with detailed descriptions of new features, bug fixes, and usage notes for all recent versions. (NEWS.md) [1] [2]
  • Updated version numbers and metadata in DESCRIPTION, CRAN-SUBMISSION, and BIGr.Rproj. Also updated author affiliations and Roxygen version. (DESCRIPTION, CRAN-SUBMISSION, BIGr.Rproj) [1] [2] [3] [4] [5]

Continuous integration improvements:

  • Set NOT_CRAN: true in the CI environment to enable tests that are skipped on CRAN, ensuring more thorough checks on GitHub Actions. (.github/workflows/R-CMD-check.yaml)
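The effect of this setting can be illustrated with a minimal base R sketch. In practice `testthat::skip_on_cran()` is the usual way to gate such tests; the helpers `not_cran()` and `run_if_not_cran()` below are hypothetical, written only for illustration, and are not taken from the BIGr test suite.

```r
# Hypothetical helper: TRUE only when NOT_CRAN is set to "true"
# (case-insensitive), as the CI workflow does.
not_cran <- function() {
  identical(tolower(Sys.getenv("NOT_CRAN", unset = "")), "true")
}

# Hypothetical gate: evaluate `expr` only when NOT_CRAN is true.
# R evaluates arguments lazily, so `expr` is never run when skipped.
run_if_not_cran <- function(expr) {
  if (!not_cran()) {
    message("Skipping: NOT_CRAN is not set to true")
    return(invisible(NULL))
  }
  expr
}

Sys.setenv(NOT_CRAN = "true")
gated_result <- run_if_not_cran(1 + 1)    # evaluated

Sys.setenv(NOT_CRAN = "false")
skipped_result <- run_if_not_cran(1 + 1)  # skipped, returns NULL
```

With the variable set to `true` on GitHub Actions, the gated code runs there while a CRAN-like environment without the variable would skip it.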

These updates collectively improve the robustness, flexibility, and usability of the BIGr package for polyploid and diploid genomics workflows.

Cristianetaniguti and others added 30 commits October 3, 2025 15:11
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
alex-sandercock and others added 23 commits April 20, 2026 11:29
Validate Ped
Trios with few markers are still flagged, but recommendations are now shown.

When no parent pair passes the error threshold, the pairs are still shown in the final report.

Find Parents

Fixed formatting of the final output when ties for the best pair were found.
Implemented vectorization and efficiency improvements.
When two recommendations are tied on error %, the tiebreaker is the number of markers tested: the option with the most markers tested takes priority.
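The tiebreaker rule can be sketched in base R as follows. The data frame and its column names (`pair`, `error_pct`, `markers_tested`) are hypothetical and chosen only to illustrate the ordering; the actual implementation in the package may differ.

```r
# Hypothetical candidate table; values and column names are illustrative only.
candidates <- data.frame(
  pair = c("A x B", "A x C", "B x C"),
  error_pct = c(1.5, 1.5, 4.0),
  markers_tested = c(120, 300, 250),
  stringsAsFactors = FALSE
)

# Sort by lowest error % first; among ties, most markers tested first.
ranked <- candidates[order(candidates$error_pct, -candidates$markers_tested), ]
best_pair <- ranked$pair[1]
```

Here "A x B" and "A x C" are tied at 1.5% error, so "A x C" is preferred because it was tested on more markers.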
Madc2vcf and Pedigree functions updates
@alex-sandercock alex-sandercock requested a review from Copilot May 18, 2026 11:58
@alex-sandercock alex-sandercock self-assigned this May 18, 2026
@alex-sandercock alex-sandercock added the enhancement New feature or request label May 18, 2026
@alex-sandercock alex-sandercock merged commit 12798c2 into main May 18, 2026
10 checks passed
Contributor

Copilot AI left a comment

Pull request overview

This PR prepares BIGr 0.7.2 for CRAN with new MADC validation/conversion workflows, new pedigree/parentage utilities, expanded DArT report-to-VCF support, updated documentation, tests, and CI setup.

Changes:

  • Adds madc2vcf_multi, check_madc_sanity, find_parentage, and validate_pedigree.
  • Expands dosage2vcf, check_ped, get_countsMADC, and imputation_concordance.
  • Updates package metadata, generated Rd docs, tests, NEWS, NAMESPACE, and GitHub Actions.

Reviewed changes

Copilot reviewed 32 out of 42 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| .github/workflows/R-CMD-check.yaml | Updates CI dependencies and CRAN-like behavior. |
| .gitignore | Adds macOS artifact ignore. |
| BIGr.Rproj | Adds RStudio project ID. |
| CRAN-SUBMISSION | Updates CRAN submission metadata. |
| DESCRIPTION | Bumps version and dependencies. |
| NAMESPACE | Exports/imports new functions and dependencies. |
| NEWS.md | Documents release changes. |
| R/check_madc_sanity.R | Adds MADC sanity checks and botloci remapping. |
| R/check_ped.R | Refactors pedigree checking output. |
| R/dosage2vcf.R | Adds SNP/INDEL report parsing and alignment validation. |
| R/find_parentage.R | Adds parentage assignment utility. |
| R/get_countsMADC.R | Adds object input and match-count handling. |
| R/imputation_concordance.R | Adds plotting/printing controls and doc updates. |
| R/madc2vcf_multi.R | Adds polyRAD-based MADC-to-VCF conversion. |
| R/thinSNP.R | Narrows roxygen imports. |
| R/utils.R | Adds verbose/url helpers and global variables. |
| R/validate_pedigree.R | Adds Mendelian trio validation utility. |
| cran-comments.md | Updates CRAN notes. |
| dev/dev_history.R | Comments dependency-amend command. |
| man/*.Rd | Updates/generated documentation for changed APIs. |
| tests/testthat/* | Adds and updates test coverage for new/changed behavior. |
Files not reviewed (9)
  • man/check_madc_sanity.Rd: Language not supported
  • man/check_ped.Rd: Language not supported
  • man/dosage2vcf.Rd: Language not supported
  • man/find_parentage.Rd: Language not supported
  • man/get_countsMADC.Rd: Language not supported
  • man/imputation_concordance.Rd: Language not supported
  • man/madc2vcf_all.Rd: Language not supported
  • man/madc2vcf_multi.Rd: Language not supported
  • man/madc2vcf_targets.Rd: Language not supported

# Print mean concordance
# Summary statistics
summary_concordance <- summary(percentage_match, na.rm = TRUE) * 100
Comment thread R/validate_pedigree.R
if (comparisons == 0) return(NA_real_)
(base::sum(cand_hom != prog_hom, na.rm = TRUE) / comparisons) * 100
})
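For context, the quoted fragment computes an error percentage and returns `NA` when nothing can be compared. A self-contained sketch of that idea is shown below, assuming dosage-coded genotypes where `0` and `ploidy` are the two homozygous classes; the function name `hom_mismatch_pct()` and this coding are assumptions for illustration, not the package's code.

```r
# Hypothetical sketch: percentage of markers where candidate parent and
# progeny are both homozygous but carry different homozygous genotypes.
hom_mismatch_pct <- function(candidate, progeny, ploidy = 2) {
  # Keep only markers where both genotypes are homozygous; NA never matches.
  both_hom <- candidate %in% c(0, ploidy) & progeny %in% c(0, ploidy)
  comparisons <- sum(both_hom)
  # Without comparable markers the percentage is undefined.
  if (comparisons == 0) return(NA_real_)
  sum(candidate[both_hom] != progeny[both_hom]) / comparisons * 100
}

cand <- c(0, 2, 2, 1, NA, 0)
prog <- c(0, 0, 2, 2, 2, 2)
pct <- hom_mismatch_pct(cand, prog)          # 4 comparisons, 2 mismatches
no_data <- hom_mismatch_pct(c(1, NA), c(1, 2))  # no comparable markers
```

Returning `NA_real_` rather than `0` keeps "no evidence" distinct from "no errors" in downstream summaries.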

Comment thread R/check_madc_sanity.R
if(is.null(mi_df$Chr) | is.null(mi_df$Pos)) stop("When MADC CloneID don't follow the format Chr_Pos, Chr and Pos columns must be provided in the markers_info file.")
}

if(!any(botloci$V1 %in% report$CloneID)) { # First check if any botloci markers are found in MADC file. If not, check for padding mismatch.
Comment thread R/find_parentage.R
Progeny = progeny_ids,
Male_Parent = NA_character_,
Female_Parent = NA_character_,
Mendelian_Error_Pct = NA_character_,
Comment thread R/madc2vcf_multi.R
if (!(file.exists(madc_file) | url_exists(madc_file))) stop("MADC file not found. Please provide a valid path or URL.")
if (!(file.exists(botloci_file) | url_exists(botloci_file))) stop("Botloci file not found. Please provide a valid path or URL.")
if (!is.null(markers_info) && !(file.exists(markers_info) | url_exists(markers_info))) stop("markers_info file not found. Please provide a valid path or URL.")
if (!is.numeric(ploidy) || ploidy < 1) stop("ploidy must be a positive integer.")
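One way such argument checks could be tightened is sketched below. It uses the short-circuit scalar operators `||` and `&&` instead of the vectorized `|`, and verifies that `ploidy` is a single whole number, which the error message implies. `validate_inputs()` is a hypothetical stand-alone function; it checks only local paths, whereas the quoted code also accepts URLs through a package helper that is not reproduced here.

```r
# Hypothetical validator, local paths only (no URL support in this sketch).
validate_inputs <- function(madc_file, botloci_file, markers_info = NULL, ploidy) {
  if (!file.exists(madc_file)) stop("MADC file not found.")
  if (!file.exists(botloci_file)) stop("Botloci file not found.")
  if (!is.null(markers_info) && !file.exists(markers_info)) {
    stop("markers_info file not found.")
  }
  # && short-circuits, so later tests only run on a valid scalar number.
  ok_ploidy <- is.numeric(ploidy) && length(ploidy) == 1 && !is.na(ploidy) &&
    ploidy >= 1 && ploidy == round(ploidy)
  if (!ok_ploidy) stop("ploidy must be a single positive integer.")
  invisible(TRUE)
}

# Temporary empty files stand in for real MADC and botloci inputs.
madc <- tempfile(fileext = ".csv"); file.create(madc)
bot <- tempfile(fileext = ".csv"); file.create(bot)

ok <- validate_inputs(madc, bot, ploidy = 4)
bad <- tryCatch(
  validate_inputs(madc, bot, ploidy = 2.5),
  error = function(e) conditionMessage(e)
)
```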
Comment thread R/check_madc_sanity.R
if(!any(is.na(report$CloneID))) {
pos <- strsplit(report$CloneID, "_")
format <- all(sapply(pos, length) == 2)
first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1)))
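An alternative to splitting on every underscore is a single regular expression, sketched below under the assumption that a valid CloneID is a chromosome name starting with a letter, followed by an underscore and an all-digit position. The helper `is_chr_pos()` is hypothetical. Unlike the quoted check, which requires exactly two underscore-separated parts, this sketch anchors on the last underscore, so chromosome names that themselves contain underscores are accepted.

```r
# Hypothetical helper: TRUE when every ID looks like <Chr>_<Pos>.
is_chr_pos <- function(clone_id) {
  # Any missing ID fails the check before the regex is applied.
  !anyNA(clone_id) && all(grepl("^[A-Za-z].*_[0-9]+$", clone_id))
}

valid <- is_chr_pos(c("Chr01_000123", "scaffold_12_4567"))
invalid_start <- is_chr_pos(c("Chr01_000123", "1_200"))  # starts with a digit
invalid_na <- is_chr_pos(c("Chr01_1", NA))               # contains NA
```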
Comment on lines +1 to +4
ID Male_Parent Female_Parent
IND_C IND_A IND_B
IND_D 0 IND_A
GHOST IND_A IND_B
potato_markers_info_ChromPos <- paste0(github_path, "test_madcs/potato_marker_info_chrompos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
potato_microhapDB <- paste0(github_path, "potato/potato_allele_db_v001.fa")

skip_if_offline("raw.githubusercontent.com")
Comment on lines +2 to +4
skip_if_offline("raw.githubusercontent.com")

github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/test_madcs/"
Comment thread R/check_ped.R
Comment on lines +181 to +187
input_ped_report <- list(
exact_duplicates = exact_duplicates,
repeated_ids_diff = repeated_ids_report,
messy_parents = messy_parents,
missing_parents = missing_parents,
dependencies = data.frame(Dependency = unique(unlist(errors))),
corrected_pedigree = data
@codecov

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 79.18015% with 386 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.02%. Comparing base (a248e93) to head (f2c847b).
⚠️ Report is 93 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| R/madc2vcf_all.R | 64.73% | 122 Missing ⚠️ |
| R/madc2vcf_targets.R | 74.89% | 61 Missing ⚠️ |
| R/dosage2vcf.R | 85.18% | 56 Missing ⚠️ |
| R/validate_pedigree.R | 85.76% | 37 Missing ⚠️ |
| R/check_ped.R | 58.33% | 30 Missing ⚠️ |
| R/madc2vcf_multi.R | 71.73% | 26 Missing ⚠️ |
| R/imputation_concordance.R | 46.15% | 21 Missing ⚠️ |
| R/check_madc_sanity.R | 92.15% | 12 Missing ⚠️ |
| R/find_parentage.R | 93.90% | 12 Missing ⚠️ |
| R/get_countsMADC.R | 83.01% | 9 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #61      +/-   ##
==========================================
- Coverage   83.35%   82.02%   -1.34%     
==========================================
  Files          19       23       +4     
  Lines        1358     2876    +1518     
==========================================
+ Hits         1132     2359    +1227     
- Misses        226      517     +291     
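The figures in the report can be cross-checked with a few lines of R. This is only an arithmetic sketch using the numbers quoted above; it assumes project coverage is hits divided by lines, and the variable names are illustrative.

```r
# Per-file missing lines as listed in the Codecov report.
missing_lines <- c(
  "R/madc2vcf_all.R" = 122,
  "R/madc2vcf_targets.R" = 61,
  "R/dosage2vcf.R" = 56,
  "R/validate_pedigree.R" = 37,
  "R/check_ped.R" = 30,
  "R/madc2vcf_multi.R" = 26,
  "R/imputation_concordance.R" = 21,
  "R/check_madc_sanity.R" = 12,
  "R/find_parentage.R" = 12,
  "R/get_countsMADC.R" = 9
)

# Should equal the 386 uncovered patch lines reported in the summary.
total_missing <- sum(missing_lines)

# Head coverage from the diff table: hits / lines, as a percentage.
head_coverage <- round(2359 / 2876 * 100, 2)
```

The per-file counts add up to the 386 missing lines in the summary, and 2359 hits over 2876 lines reproduces the reported 82.02% project coverage.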


Labels

enhancement New feature or request


4 participants