Merged
88 commits
778aefa
indels support for madc2vcf_targets
Cristianetaniguti Oct 3, 2025
1b761b9
updated check_ped to save corrected dataframe and report
Nov 4, 2025
743043a
reorganized report and fixed language
Nov 4, 2025
0b97b46
bugfix - if hapDB padding is not matching report
Cristianetaniguti Nov 14, 2025
bccf0ac
add indel exception
Cristianetaniguti Jan 22, 2026
25caf0a
bugfix
Cristianetaniguti Jan 22, 2026
4f30e52
up version
Cristianetaniguti Jan 22, 2026
82279af
added option to print plot or list to imputation_concordance
Feb 26, 2026
6b81982
ignore .DS_Store
Mar 3, 2026
8205e4e
added option to print pre-filtering depth and genotyping rate
Mar 3, 2026
31248e3
added calculation for Ho
Mar 4, 2026
757b01c
up version
Cristianetaniguti Mar 13, 2026
0934210
Merge branch 'check_ped_update' of https://github.com/Breeding-Insigh…
Cristianetaniguti Mar 13, 2026
e18b2c6
merge dev branches
Cristianetaniguti Mar 13, 2026
768ab93
Merge branch 'development' into ped_indels_update
Cristianetaniguti Mar 13, 2026
5c0b590
opt messages
Cristianetaniguti Mar 13, 2026
9afb265
messages ok
Cristianetaniguti Mar 14, 2026
c31118d
targets okay
Cristianetaniguti Mar 25, 2026
5d54f0d
targets ok
Cristianetaniguti Mar 27, 2026
ee50981
Potential fix for pull request finding
alex-sandercock Mar 27, 2026
d3a4061
Potential fix for pull request finding
alex-sandercock Mar 27, 2026
f765c7c
Potential fix for pull request finding
alex-sandercock Mar 27, 2026
87bb1fc
Potential fix for pull request finding
alex-sandercock Mar 27, 2026
7c12d49
Merge branch 'ped_indels_update' into madc2vcf_all_updates
Cristianetaniguti Mar 27, 2026
df6fe92
Merge pull request #53 from Breeding-Insight/madc2vcf_all_updates
Cristianetaniguti Mar 27, 2026
6059c10
Update R/madc2vcf_targets.R
Cristianetaniguti Mar 27, 2026
b09b0c1
Update R/check_madc_sanity.R
Cristianetaniguti Mar 27, 2026
409dbd3
Update R/get_countsMADC.R
Cristianetaniguti Mar 27, 2026
e6fce19
Update R/get_countsMADC.R
Cristianetaniguti Mar 27, 2026
669ac4e
Update R/check_madc_sanity.R
Cristianetaniguti Mar 27, 2026
bbfbee2
fix tests
Cristianetaniguti Mar 27, 2026
38c3564
Merge branch 'ped_indels_update' of https://github.com/Breeding-Insig…
Cristianetaniguti Mar 27, 2026
55ee61a
madc2vcf_all indels support okay
Cristianetaniguti Mar 27, 2026
bf5ff4c
madc2vcf_all support indel
Cristianetaniguti Mar 31, 2026
291ae8e
add support for Others
Cristianetaniguti Apr 1, 2026
84852da
up version
Cristianetaniguti Apr 1, 2026
96a4ed1
add madc2vcf_multi
Cristianetaniguti Apr 1, 2026
cec168d
fix checks
Cristianetaniguti Apr 1, 2026
0be2e0f
fix checks 2
Cristianetaniguti Apr 1, 2026
33fc87c
add VariantAnnotation to test env
Cristianetaniguti Apr 1, 2026
77107ba
ignore madc2vcf_multi tests in actions
Cristianetaniguti Apr 1, 2026
ccf9e77
more messages and tests
Cristianetaniguti Apr 2, 2026
8a00c9e
bugfix
Cristianetaniguti Apr 2, 2026
f2013e3
update man
Cristianetaniguti Apr 2, 2026
b01c12b
minor version up
Cristianetaniguti Apr 2, 2026
d0e02e2
added v1 of parentage function
Apr 2, 2026
5baae79
modified code to use data.table for increased efficiency
Apr 2, 2026
a252f22
improved assign_parentage and validate_parentage
Apr 3, 2026
bf0a468
finalized parentage functions for diploids and test files
Apr 6, 2026
01e943d
Added parentage functions and updated associated files
Apr 6, 2026
8ee65a3
updated headers and importFrom for functions along with namespace
Apr 6, 2026
e5b2004
deleted cra check files
Apr 6, 2026
4ec471b
Fix formatting of RefAltSeqs documentation
alex-sandercock Apr 8, 2026
cb76880
updated docs
alex-sandercock Apr 8, 2026
9824e29
Updated parentage functions to include package::function
Apr 9, 2026
5eb356a
updated functions, test files and man files for parentage (except for…
Apr 10, 2026
ab944e8
revert filterVCF
alex-sandercock Apr 17, 2026
4ac0c55
support LUT Marker_ID
alex-sandercock Apr 17, 2026
5234572
Apply suggestion from @Copilot
alex-sandercock Apr 17, 2026
8c9dcda
Apply suggestion from @Copilot
alex-sandercock Apr 20, 2026
b4d5534
covered error case
alex-sandercock Apr 17, 2026
57bbc89
add example and suggest ggplot
alex-sandercock Apr 17, 2026
4bb67fd
make get_counts internal
alex-sandercock Apr 17, 2026
ff1ef84
update test
alex-sandercock Apr 17, 2026
86b4fef
added marker_id support
alex-sandercock Apr 20, 2026
c10d134
fixed AD generation bug
alex-sandercock Apr 20, 2026
1ae386f
improved truth check
alex-sandercock Apr 20, 2026
49cb0a4
support Marker_ID
alex-sandercock Apr 20, 2026
22fc6e4
Update documentation for verbose message utility
alex-sandercock Apr 20, 2026
38dd609
fix exports
alex-sandercock Apr 20, 2026
56336da
skipping if offline
alex-sandercock Apr 20, 2026
089e8fd
madc2vcf_multi better function description
Cristianetaniguti Apr 21, 2026
8ee0b81
roxygenise
Cristianetaniguti Apr 21, 2026
555dda7
Updated parentage functions based on Meng's feedback:
Apr 22, 2026
c2cba9d
Merge pull request #54 from Breeding-Insight/ped_indels_update
alex-sandercock Apr 22, 2026
523049b
Updated parentage functions based on Meng's feedback:
Apr 22, 2026
ccc0952
Merge branch 'add_parentage_functions' into development
josuechinchilla Apr 22, 2026
7023a25
improve support for dosage2vcf
alex-sandercock May 17, 2026
7ae6e3d
edit comments
alex-sandercock May 17, 2026
099da55
Merge pull request #60 from Breeding-Insight/dosage2vcf_update
alex-sandercock May 17, 2026
1f5a3f3
updated news
alex-sandercock May 17, 2026
e63410e
reduce cran testing time
alex-sandercock May 17, 2026
54e1064
Added global items
alex-sandercock May 17, 2026
2480ef7
updated check_ped for CRAN
alex-sandercock May 17, 2026
44fe4f7
minor version update
alex-sandercock May 17, 2026
cd23487
update CRAN comments
alex-sandercock May 17, 2026
0fd7d9c
CRAN fix
alex-sandercock May 17, 2026
f2c847b
cran submit
alex-sandercock May 18, 2026
7 changes: 6 additions & 1 deletion .github/workflows/R-CMD-check.yaml
@@ -26,6 +26,7 @@ jobs:
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
R_KEEP_PKG_SOURCE: yes
NOT_CRAN: true

steps:
- uses: actions/checkout@v3
@@ -43,7 +44,12 @@ jobs:
extra-packages: |
any::rcmdcheck
any::covr
any::polyRAD
needs: check

- name: Install VariantAnnotation (no Suggests)
run: pak::pkg_install("bioc::VariantAnnotation", dependencies = c("Depends", "Imports", "LinkingTo"))
shell: Rscript {0}
- uses: r-lib/actions/check-r-package@v2

- name: Generate test coverage report
@@ -57,4 +63,3 @@ jobs:
token: ${{ secrets.CODECOV_TOKEN }}
slug: Breeding-Insight/BIGr
files: coverage.xml

1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@
.RData
.Ruserdata
revdep/
+.DS_Store
1 change: 1 addition & 0 deletions BIGr.Rproj
@@ -1,4 +1,5 @@
Version: 1.0
ProjectId: 0eeaab63-2615-4da7-b10a-927160fc78a3

RestoreWorkspace: No
SaveWorkspace: No
6 changes: 3 additions & 3 deletions CRAN-SUBMISSION
@@ -1,3 +1,3 @@
-Version: 0.6.2
-Date: 2025-09-18 12:16:11 UTC
-SHA: 142dc9524d88b47db88ddca2aa39cd729a8d5a0d
+Version: 0.7.2
+Date: 2026-05-17 23:05:53 UTC
+SHA: 0fd7d9c081cfb341c56dc58de4d77d283d7ce726
11 changes: 7 additions & 4 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: BIGr
Title: Breeding Insight Genomics Functions for Polyploid and Diploid Species
-Version: 0.6.2
+Version: 0.7.2
Authors@R: c(person(given='Alexander M.',
family='Sandercock',
email='sandercock.alex@gmail.com',
@@ -23,7 +23,7 @@ Authors@R: c(person(given='Alexander M.',
person(given='Dongyan',
family='Zhao',
role='ctb'),
-person('Cornell', 'University',
+person('University', "of Florida",
role=c('cph'),
comment = "Breeding Insight"))
Maintainer: Alexander M. Sandercock <sandercock.alex@gmail.com>
@@ -44,7 +44,7 @@ URL: https://github.com/Breeding-Insight/BIGr
BugReports: https://github.com/Breeding-Insight/BIGr/issues
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.2
+RoxygenNote: 7.3.3
Depends: R (>= 4.4.0)
biocViews:
Imports:
@@ -62,12 +62,15 @@ Imports:
janitor,
quadprog,
tibble,
-stringr
+stringr,
+data.table
Suggests:
covr,
+ggplot2,
spelling,
rmdformats,
knitr (>= 1.10),
rmarkdown,
+polyRAD,
testthat (>= 3.0.0)
RdMacros: Rdpack
28 changes: 27 additions & 1 deletion NAMESPACE
@@ -4,27 +4,30 @@ export(allele_freq_poly)
export(calculate_Het)
export(calculate_MAF)
export(check_homozygous_trios)
export(check_madc_sanity)
export(check_ped)
export(check_replicates)
export(dosage2vcf)
export(dosage_ratios)
export(filterMADC)
export(filterVCF)
export(find_parentage)
export(flip_dosage)
export(get_countsMADC)
export(imputation_concordance)
export(madc2gmat)
export(madc2vcf_all)
export(madc2vcf_multi)
export(madc2vcf_targets)
export(merge_MADCs)
export(solve_composition_poly)
export(thinSNP)
export(updog2vcf)
export(validate_pedigree)
import(dplyr)
import(janitor)
import(parallel)
import(quadprog)
import(rlang)
import(stringr)
import(tibble)
import(tidyr)
@@ -33,13 +36,36 @@ importFrom(Biostrings,DNAString)
importFrom(Biostrings,reverseComplement)
importFrom(Rdpack,reprompt)
importFrom(Rsamtools,bgzip)
importFrom(data.table,CJ)
importFrom(data.table,as.data.table)
importFrom(data.table,copy)
importFrom(data.table,data.table)
importFrom(data.table,fread)
importFrom(data.table,fwrite)
importFrom(data.table,rbindlist)
importFrom(data.table,set)
importFrom(dplyr,"%>%")
importFrom(dplyr,across)
importFrom(dplyr,arrange)
importFrom(dplyr,case_when)
importFrom(dplyr,filter)
importFrom(dplyr,group_by)
importFrom(dplyr,group_modify)
importFrom(dplyr,mutate)
importFrom(dplyr,select)
importFrom(dplyr,summarise)
importFrom(dplyr,ungroup)
importFrom(dplyr,where)
importFrom(pwalign,nucleotideSubstitutionMatrix)
importFrom(pwalign,pairwiseAlignment)
importFrom(readr,read_csv)
importFrom(reshape2,dcast)
importFrom(reshape2,melt)
importFrom(rlang,sym)
importFrom(stats,cor)
importFrom(stats,reorder)
importFrom(stats,setNames)
importFrom(tibble,as_tibble)
importFrom(utils,packageVersion)
importFrom(utils,read.csv)
importFrom(utils,read.table)
105 changes: 104 additions & 1 deletion NEWS.md
@@ -1,3 +1,107 @@
# BIGr 0.7.2

- Fixed manual text errors

# BIGr 0.7.1

- Updated `check_ped()` to return corrected pedigree data in the result list instead of assigning objects to the global environment
- Skipped long remote `madc2vcf_all` integration tests on CRAN while keeping them enabled in GitHub Actions
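
The `check_ped()` change above follows a common R pattern: return every result in one list instead of writing objects into the caller's global environment. The base R sketch below only illustrates that pattern. `check_ped_sketch`, its duplicate-row check, and the list fields `corrected_ped` and `report` are hypothetical and are not BIGr's actual implementation or return structure.

```r
# Hypothetical sketch: return the corrected data and a report in one list,
# instead of calling assign(..., envir = .GlobalEnv).
check_ped_sketch <- function(ped) {
  # Flag exact duplicate rows and drop them from the corrected copy
  dup <- duplicated(ped)
  corrected <- ped[!dup, , drop = FALSE]
  rownames(corrected) <- NULL
  list(
    corrected_ped = corrected,
    report = data.frame(check = "exact_duplicates", n = sum(dup))
  )
}

# Toy pedigree with one duplicated row (made-up data)
ped <- data.frame(
  id   = c("A", "B", "B", "C"),
  sire = c(NA, "A", "A", "A"),
  dam  = c(NA, NA, NA, "B")
)

res <- check_ped_sketch(ped)
res$corrected_ped  # the caller decides what to keep and under which name
res$report
```

Returning a list keeps the function free of side effects, which is also what CRAN policy expects of package code.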

# BIGr 0.7.0

## Updates on `dosage2vcf`

- Added support for DArT SNP/INDEL 1-row and 2-row report formats
- `dosage2vcf` now validates marker and sample sets between report and counts files, then aligns counts to the report order before writing VCF genotypes
- VCF `CHROM` and `POS` are derived from `Chrom`/`ChromPos` when present, otherwise from `MarkerName`; `MarkerName` is retained in the VCF `ID` field
- Missing SNP/INDEL genotype calls (`-`/`NA`) are written as diploid missing genotypes (`./.`)
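
Two of the `dosage2vcf` behaviours above, aligning counts to the report order and writing missing calls as `./.`, can be sketched in base R. This is a minimal sketch with made-up data. The helper `dosage_to_gt` is hypothetical, and it assumes the dosage counts copies of the ALT allele, which may not match the coding of a given DArT report.

```r
# Hypothetical helper: map diploid dosages (0, 1, 2) to VCF GT strings.
# Assumption: dosage = number of ALT allele copies.
dosage_to_gt <- function(dosage) {
  gt <- c("0/0", "0/1", "1/1")[as.integer(dosage) + 1L]
  gt[is.na(gt)] <- "./."  # missing calls become diploid missing genotypes
  gt
}

calls <- c("0", "1", "2", "-", NA)
calls[calls %in% "-"] <- NA  # treat "-" as missing before converting
gt <- dosage_to_gt(calls)
gt

# Validate that both files hold the same markers, then reorder the counts
# so that row i of the counts matches marker i of the report.
report_markers <- c("chr1_100", "chr1_250", "chr2_40")
counts <- data.frame(
  MarkerName = c("chr2_40", "chr1_100", "chr1_250"),
  S1 = c(7, 3, 5)
)
stopifnot(setequal(report_markers, counts$MarkerName))
counts <- counts[match(report_markers, counts$MarkerName), ]
counts
```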

## New function `madc2vcf_multi`

- New function `madc2vcf_multi` to convert a DArTag MADC file to a VCF using the polyRAD pipeline for multiallelic genotyping
- Runs `check_madc_sanity` before loading the data and stops with informative errors if:
  - Required columns are missing
  - IUPAC (non-ATCG) codes are present in AlleleSequence
  - Ref/Alt sequences are unpaired (`RefAltSeqs = FALSE`)
  - Allele IDs have not been fixed by HapApp (`FixAlleleIDs = FALSE`)
  - CloneIDs do not follow `Chr_Pos` format and no `markers_info` is provided
- New argument `markers_info`: optional path or URL to a CSV with `CloneID`/`BI_markerID`, `Chr`, and `Pos` columns; required when CloneIDs do not follow the `Chr_Pos` format
- Runs `check_botloci` to validate and reconcile CloneIDs between the MADC and botloci file, automatically fixing padding mismatches
- A corrected temp file is written and passed to `readDArTag` only when needed (all-NA rows/columns detected, CloneIDs remapped by `check_botloci`, or botloci IDs remapped)
- Accepts paths or URLs for `madc_file`, `botloci_file`, and `markers_info`
- Estimates overdispersion with `polyRAD::TestOverdispersion`, iterates priors with `polyRAD::IterateHWE`, and exports the result with `polyRAD::RADdata2VCF`
- `polyRAD` is a soft dependency (listed under `Suggests`); an informative error is raised if it is not installed
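
The soft-dependency behaviour described in the last bullet is usually implemented with `requireNamespace()`. The sketch below shows that general pattern only. `require_suggested` is a hypothetical helper name and the error wording is invented, so neither should be read as BIGr's own code.

```r
# Hypothetical helper: stop early with an informative error when a package
# listed under Suggests is not installed.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for this function. ",
      "Please install it and try again.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this passes silently
require_suggested("stats")

# A package that is not installed produces the informative error
result <- tryCatch(
  require_suggested("notARealPackage12345"),
  error = function(e) conditionMessage(e)
)
result
```

Checking at the top of the function means users without the optional package can still install and use the rest of the package.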

# BIGr 0.6.6

## Updates on `madc2vcf_all`

- New arguments for controlling processing of `Other` alleles:
  - `add_others`: if `TRUE` (default), alleles labeled "Other" in the MADC are included in off-target SNP extraction
  - `others_max_snps`: discards Other alleles with more than this many SNP differences relative to the Ref sequence (default: 5)
  - `others_rm_with_indels`: discards Other alleles containing insertions or deletions relative to the Ref sequence (default: `TRUE`)
- Other alleles that carry a different base at the target SNP position are now reported as a 3rd allele in the VCF instead of being silently dropped
- Target position is now correctly removed from Other allele alignments, preventing duplicate VCF positions and marker IDs
- Fixed a bug where Other alleles with "Ref_" or "Alt_" in their AlleleID would corrupt the target SNP REF/ALT fields and read depth counts in `merge_counts`
- Improved verbose messages throughout: counts of Other alleles found, kept, and discarded (by indel filter and by max SNP filter) are now reported; multiallelic target SNPs with a 3rd allele from Others are counted and reported
- Debug-level message (level 3) listing each Other allele added and its genomic position
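
The two Other-allele filters can be illustrated with a simplified base R sketch. It is not the package's method: `keep_other` is a hypothetical helper, the sequences are made up, and a length difference from the Ref sequence is used here as a stand-in for indel detection, whereas the package may align the sequences instead.

```r
# Hypothetical sketch of the filters controlled by others_rm_with_indels
# (fixed to TRUE here) and others_max_snps.
keep_other <- function(ref, other, others_max_snps = 5) {
  # Assumption: a length mismatch implies an insertion or deletion, so discard
  if (nchar(ref) != nchar(other)) {
    return(FALSE)
  }
  ref_bases <- strsplit(ref, "")[[1]]
  other_bases <- strsplit(other, "")[[1]]
  # Keep the allele only if it differs from Ref at no more than the limit
  sum(ref_bases != other_bases) <= others_max_snps
}

ref <- "ACGTACGTAC"
kept_one_snp <- keep_other(ref, "ACGTACGTAA")  # 1 mismatch
kept_many    <- keep_other(ref, "TGCATGCAAC")  # 8 mismatches
kept_indel   <- keep_other(ref, "ACGTACGTA")   # one base shorter

c(one_snp = kept_one_snp, many_snps = kept_many, indel = kept_indel)
```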

# BIGr 0.6.5

## Updates on madc2vcf functions
Details:

- Both `madc2vcf_targets` and `madc2vcf_all` (targets + off-targets) now run `check_madc_sanity`. It tests:
  - [Columns] Whether the MADC has the expected columns
  - [allNArow | allNAcol] Presence of rows and columns that are entirely NA (this often happens when the MADC is opened in Excel before being loaded in R)
  - [IUPACcodes] Presence of IUPAC codes in AlleleSequence
  - [LowerCase] Presence of lowercase bases in AlleleSequence
  - [Indels] Presence of indels
  - [ChromPos] Whether CloneID follows the `Chr_Pos` format
  - [RefAltSeqs] Whether every Ref allele has a corresponding Alt allele and vice versa
  - [OtherAlleles] Whether "Other" exists in the MADC AlleleID

- Better messages if `verbose = TRUE` in `madc2vcf_all`
- `madc2vcf_all` now supports indels: `markers_info` with the indel positions is required; only the target indel is extracted, and off-targets are ignored for that tag
- `madc2vcf_targets` doesn’t run if:
  - MADC column names are not correct
- `madc2vcf_targets` ignores Other alleles, but informs the user whether they exist and directs them to `madc2vcf_all` in case they want to extract them as well
- See the table for `madc2vcf_targets` requirements according to MADC content:

| Check | check status | get_REF_ALT | Requires |
| -- | -- | -- | -- |
| IUPAC | TRUE | TRUE | markers_info REF/ALT |
|   | TRUE | FALSE | - |
|   | FALSE | TRUE | botloci or markers_info REF/ALT |
|   | FALSE | FALSE | - |
| Indels | TRUE | TRUE | markers_info REF/ALT |
|   | TRUE | FALSE | - |
|   | FALSE | TRUE | botloci or markers_info REF/ALT |
|   | FALSE | FALSE | - |
| ChromPos | TRUE | TRUE | botloci or markers_info REF/ALT |
|   | TRUE | FALSE | - |
|   | FALSE | TRUE | markers_info CHR/POS/REF/ALT or markers_info CHR/POS + botloci |
|   | FALSE | FALSE | markers_info CHR/POS |
| FixAlleleIDs | TRUE | TRUE | botloci or markers_info REF/ALT |
|   | TRUE | FALSE | - |
|   | FALSE | TRUE | markers_info REF/ALT |
|   | FALSE | FALSE | - |
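
A few of the `check_madc_sanity` checks listed in this release can be sketched in base R. This is a minimal sketch, not the package's code: the sequences and CloneIDs are made up, and the `Chr_Pos` regular expression is an assumption about what counts as a valid ID.

```r
# Made-up example values standing in for MADC columns
allele_seq <- c("ACGTTGCA", "ACGTRGCA", "acgtTGCA")
clone_id   <- c("Chr01_000123", "markerX", "Chr02_004567")

# [IUPACcodes] any IUPAC ambiguity code, i.e. a nucleotide letter other than A, C, G, T
has_iupac <- grepl("[RYSWKMBDHVN]", toupper(allele_seq))

# [LowerCase] any lowercase base
has_lower <- grepl("[a-z]", allele_seq)

# [ChromPos] assumed format: anything, an underscore, then digits only
chr_pos_ok <- grepl("^.+_[0-9]+$", clone_id)

# Summary flags, named after the check labels used above
checks <- c(
  IUPACcodes = any(has_iupac),
  LowerCase  = any(has_lower),
  ChromPos   = all(chr_pos_ok)
)
checks

# When the format holds, chromosome and position can be split from the ID
chrom <- sub("_[0-9]+$", "", clone_id[chr_pos_ok])
pos   <- as.integer(sub("^.*_", "", clone_id[chr_pos_ok]))
```

With these flags, the table above reads as a lookup: for example, `ChromPos = FALSE` means CHR/POS must come from `markers_info`.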

# BIGr 0.6.4

- Add function `vmsg` to organize messages printed on the console
- Add metadata to VCF header from madc2vcf_targets
- Add argument `madc_object` to `get_countsMADC` to avoid reading the MADC file twice and to use the padding-corrected MADC output from `check_botloci` directly
- Organize messages from `madc2vcf_targets` checks
- Add arguments `collapse_matches_counts` and `verbose` to `madc2vcf_targets`
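
A verbosity-aware message helper of the kind `vmsg` provides might look like the sketch below. The real `vmsg` signature is not shown in this changelog, so `vmsg_sketch`, its arguments, and the indentation scheme are all hypothetical.

```r
# Hypothetical sketch: print a message only when verbose is TRUE and the
# message level does not exceed max_level; indent by level for readability.
vmsg_sketch <- function(text, verbose = TRUE, level = 1, max_level = 3) {
  if (!verbose || level > max_level) {
    return(invisible(NULL))
  }
  indent <- strrep("  ", level - 1)
  message(indent, text)
  invisible(text)
}

vmsg_sketch("Running sanity checks")
vmsg_sketch("Columns: OK", level = 2)
vmsg_sketch("not shown", verbose = FALSE)
```

Routing all console output through one helper keeps `verbose = FALSE` runs silent without scattering `if (verbose)` checks through the code.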

# BIGr 0.6.3

- New function to check MADC files: `check_madc_sanity`. Currently, it checks for the presence of required columns, whether fixed allele IDs were assigned, the presence of IUPAC codes, lowercase sequence bases, indels, and chromosome and position information.
- Added new argument `markers_info`, which allows users to provide a CSV file with marker information such as CHROM, POS, marker type, and position of indels. For BI species, this information is available from [PanelHub](https://github.com/Breeding-Insight/BIGapp-PanelHub).
- Checked inputs for `madc2vcf_all`.
- Updated affiliation in `DESCRIPTION`.

# BIGr 0.6.2

- Fixed the doi and name list in the CITATION file
@@ -51,4 +155,3 @@
- updog2vcf function option to output compressed VCF (.vcf.gz) - set as default
- remove need for defining ploidy
- add metadata at the VCF header
