Releases: JDenn0514/surveycore
surveycore 0.8.0
Breaking changes
- Constructing a survey_collection from member surveys with divergent @groups now raises surveycore_error_collection_group_divergent. Previously, a mixed-grouping collection would dispatch analysis functions per survey and stitch a patchwork of grouped and ungrouped rows together with bind_rows(), violating the pseudo-data.frame mental model. Either all members must share @groups, or the caller must supply group = explicitly.
- as_survey_collection()'s .on_missing argument has been replaced by .if_missing_var, and the previously silent no-op behaviour is fixed. .if_missing_var is now stored on the returned collection's @if_missing_var property and is honoured (rather than ignored) by every dispatched get_*(). Callers using the old name will see R's positional-argument-mismatch error.
- The .on_missing named-only argument on every collection-dispatching get_*() (get_means(), get_totals(), get_freqs(), get_ratios(), get_diffs(), get_corr(), get_variance(), get_quantiles(), get_covariance(), get_t_test(), get_pairwise()) has been renamed to .if_missing_var. The default flips from "error" to NULL; NULL resolves to the collection's stored @if_missing_var property, while a non-NULL value overrides it for that call. The .id argument similarly defaults to NULL and resolves to the collection's stored @id. Callers passing .on_missing = ... will silently have the value flow into ... (no behaviour change at the analysis layer); update to .if_missing_var = ... to restore intent.
New features
survey_collection per-call dispatch defaults
- survey_collection gains two new properties:
  - @id (character(1), default ".survey"): the column name .dispatch_over_collection() uses when an analysis function is dispatched across the collection without an explicit per-call .id. Validated via the new shared helper; the existing surveycore_error_collection_invalid_id class fires on bad input.
  - @if_missing_var (character(1), default "error", must be one of c("error", "skip")): controls how dispatched get_*() calls behave when a member survey is missing a requested variable. Validated via the new helper; raises the new surveycore_error_collection_invalid_if_missing_var error class on bad input.
- New exported setters set_collection_id(x, id) and set_collection_if_missing_var(x, if_missing_var) mutate the corresponding property and return the collection invisibly. Both validate via the same shared helpers; both raise surveycore_error_not_survey_collection on non-collection input.
- add_survey() and remove_survey() now propagate the source collection's @id and @if_missing_var onto the returned collection.
- print(survey_collection) renders id: and if_missing_var: lines on every print, regardless of whether they hold the default values.
- .dispatch_over_collection() resolves both .id and .if_missing_var via two-tier precedence: a non-NULL value at the analysis-function call site beats the value stored on the collection's property. The surveycore_error_collection_id_collision hint additionally surfaces set_collection_id() as a fix path when the collision was triggered by the stored @id.
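The two-tier precedence described above can be sketched in a few lines of base R. This is only an illustration of the rule, not the package's internal code: resolve_arg() and the stored list are hypothetical stand-ins for .dispatch_over_collection() and a collection's properties.

```r
# Hypothetical helper (not part of surveycore): a non-NULL call-site value
# wins; otherwise fall back to the value stored on the collection.
resolve_arg <- function(call_value, stored_value) {
  if (is.null(call_value)) stored_value else call_value
}

# Stand-in for a collection's stored @id and @if_missing_var properties.
stored <- list(id = ".survey", if_missing_var = "error")

# No per-call value supplied: the stored property is used.
resolved_default <- resolve_arg(NULL, stored$if_missing_var)

# Per-call value supplied: it overrides the stored property for that call.
resolved_override <- resolve_arg("skip", stored$if_missing_var)

# The same rule applies to .id.
resolved_id <- resolve_arg(NULL, stored$id)
```

Under this rule, changing the stored value once (for example with set_collection_if_missing_var()) changes the behaviour of every later call that leaves the argument at NULL.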
Uniform grouping on survey_collection
- survey_collection gains a @groups property (character(0) by default). Every member survey's @groups is asserted identical() to the collection's value by the class validator, a uniform-grouping invariant that guarantees dispatched get_*() results share a single grouping structure.
- as_survey_collection() gains a group = argument that accepts tidy-select column names (bare, c(), all_of()). A missing or empty-resolved group = (including NULL, character(0), c(), all_of(character(0))) adopts the members' uniform @groups or errors on divergence; a supplied non-empty group = overrides any pre-existing member @groups and emits a typed surveycore_warning_collection_group_overridden per divergent member.
- add_survey() and remove_survey() now preserve coll@groups across mutation: a grouped collection propagates its @groups onto any empty-grouped new member and errors on divergent-grouped members (surveycore_error_collection_group_conflict); removal keeps the collection-level grouping.
Polychoric and polyserial correlation via get_corr(method = ...)
- get_corr() gains a method argument (default "pearson"). Setting method = "polychoric" fits a weighted two-step MLE for the correlation between two ordinal variables under a bivariate-normal latent model (Olsson 1979; Mannan 2025); method = "polyserial" fits the analogous MLE for one ordinal + one continuous variable (Cox 1974). Auto-detection of the ordinal / continuous side is handled internally; no new user-facing argument is required. Confidence intervals are constructed on the Fisher-z scale and back-transformed to [-1, 1]. Variance is design-based: Taylor linearization via a perturbation-based influence function on survey_taylor, and a full per-replicate re-fit of both thresholds and rho on survey_replicate. For method != "pearson", df = NA_integer_ and statistic is the z-scale Wald statistic referred to a standard normal distribution. meta(result)$bivariate_normal_cdf is "pbivnorm", and meta(result)$n_failed_replicates_total carries the total count of non-converged replicates when the replicate path observed any. Agreement with polycor::polychor() / polycor::polyserial() on equal-weight fixtures is within 1e-4.
- New package Import: pbivnorm (>= 0.6.0), used as the bivariate-normal CDF for the polychoric / polyserial likelihood.
- Fourteen new typed error / warning classes (PC-1 through PC-14) surface ordinal-type, optimizer, sparse-cell, boundary, and replicate-convergence conditions; see plans/error-messages.md for the full list.
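As a rough illustration of a Fisher-z interval and a z-scale Wald statistic, the base-R sketch below starts from a made-up correlation estimate and standard error. The inputs are invented, and the delta-method step that moves the standard error to the z scale is an assumption; the release notes do not say how surveycore derives it.

```r
# Illustrative inputs only; these are not surveycore output.
rho_hat <- 0.45   # estimated correlation
se_rho  <- 0.06   # standard error on the correlation scale
level   <- 0.95

# Fisher z-transform of the estimate.
z_hat <- atanh(rho_hat)

# Assumed delta-method step: d/dr atanh(r) = 1 / (1 - r^2).
se_z <- se_rho / (1 - rho_hat^2)

# Symmetric interval on the z scale, back-transformed so it stays in [-1, 1].
crit <- qnorm(1 - (1 - level) / 2)
ci <- tanh(z_hat + c(-1, 1) * crit * se_z)

# Wald statistic on the z scale, referred to a standard normal distribution.
statistic <- z_hat / se_z
p_value <- 2 * pnorm(-abs(statistic))
```

Because tanh() maps the real line into (-1, 1), the back-transformed bounds cannot fall outside the valid range for a correlation, which is the usual reason for building the interval on the z scale.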
New functions
- get_variance() computes design-based finite-population variance estimates for one or more numeric variables in a survey design, matching survey::svyvar() at tolerance 1e-10 on point estimates and 1e-8 on SEs. Returns a survey_variance tibble with point estimate, SE, CI, CV, MOE, design effect (deff), and cell sizes. Supports grouping (via group = and group_by()), per-variable na_handling = "pairwise" (default) or "listwise", name_style = "broom" renaming, and column-level label attributes for downstream gt integration. Dispatches over survey_taylor, survey_replicate, survey_twophase, survey_nonprob, and survey_collection designs.
- get_covariance() computes design-based finite-population covariance estimates for all unordered pairs drawn from one or more numeric variables in a survey design, matching the off-diagonal entries of survey::svyvar() at tolerance 1e-10 on point estimates and 1e-8 on SEs. Returns a survey_covariance tibble with covariance, SE, CI, CV, MOE, design effect (deff), and pairwise cell sizes. Pearson-only, pairwise-complete NA handling. Supports grouping (via group = and group_by()), redundant = TRUE to include both (x, y) and (y, x) orderings, diagonal = TRUE to include (x, x) self-pairs (which equal get_variance(x) exactly at 1e-10), name_style = "broom" renaming, and column-level label attributes for downstream gt integration. Dispatches over survey_taylor, survey_replicate, survey_twophase, survey_nonprob, and survey_collection designs.
New warning classes
- surveycore_warning_variance_all_na: fired when every row of the active domain is NA on the focal variable.
- surveycore_warning_variance_insufficient_n: fired when the focal variable has fewer than two non-NA observations in the active domain (variance is undefined).
- surveycore_warning_covariance_all_na: fired when every row of the active domain is NA on at least one variable in the pair.
- surveycore_warning_covariance_insufficient_n: fired when a pair has fewer than two pairwise-complete observations in the active domain (covariance is undefined).
- surveycore_warning_covariance_non_numeric: fired when one or more variables passed via x are non-numeric and silently dropped from the pair list.
v0.7.0
Breaking changes
- get_anova()'s first argument is now object and dispatches on class. The former model2 positional argument has been removed: get_anova(fit1, fit2) must now be written get_anova(list(fit1, fit2)). The S3 anova(fit1, fit2) interface is unchanged.
New functions
Design-based group comparisons
- get_t_test() performs a design-based two-sample t-test comparing group means for a numeric outcome across two levels of a by variable. Returns a survey_t_test tibble with estimate, per-group means and cell sizes, CI, t-statistic, df, p-value, and significance stars. Supports optional stratification via group (one row per stratum) and matches survey::svyttest() at tolerance 1e-10 for point estimates and test statistics.
- get_pairwise() computes all k(k−1)/2 pairwise t-tests across the levels of a factor, with multiple-comparison p-value adjustment via any stats::p.adjust() method ("holm" by default, or "none"). Adjustment is applied separately within each group stratum when stratified. Returns a survey_pairwise tibble with one row per pair.
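The pair count and the per-stratum adjustment can be illustrated with base R alone. The sketch below does not call get_pairwise(); the factor levels, p-values, and stratum labels are made up for the example.

```r
# Four factor levels give choose(4, 2) = 6 unordered pairs.
levels_by <- c("a", "b", "c", "d")
pairs <- t(combn(levels_by, 2))   # one row per pair
n_pairs <- nrow(pairs)

# Holm adjustment of three illustrative raw p-values.
p_raw <- c(0.01, 0.04, 0.03)
p_holm <- p.adjust(p_raw, method = "holm")   # 0.03, 0.06, 0.06

# Adjusting separately within strata: ave() applies the function per group
# and returns the results in the original row order.
p_all <- c(0.01, 0.04, 0.03, 0.02, 0.05, 0.20)
stratum <- rep(c("s1", "s2"), each = 3)
p_by_stratum <- ave(p_all, stratum,
                    FUN = function(p) p.adjust(p, method = "holm"))
```

Adjusting within each stratum treats the strata as separate families of tests, so the size of the correction depends on the number of pairs per stratum rather than on the total number of rows.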
Design-based ANOVA
- get_anova() computes Rao-Scott design-based ANOVA for survey_glm_fit objects, supporting both Wald and LRT tests with F or Chi-squared reference distributions. Three dispatch branches:
  - get_anova(<survey_glm_fit>): sequential term-by-term anova (matches anova.svyglm() semantics).
  - get_anova(<list<survey_glm_fit>>): chained pairwise comparison across k nested fits, returning k − 1 rows.
  - get_anova(<survey_base>, formula = ...): fits the model internally via survey_glm() and runs sequential anova on the fit; extra ... are forwarded to survey_glm().

  Matches survey::regTermTest() at tolerance 1e-8 on statistics and 1e-6 on p-values.
- anova(fit) on a survey_glm_fit now dispatches to get_anova() via a registered S3 method.
- plot() on a survey_glm_fit produces a dot-and-whisker coefficient plot with design-based Wald confidence intervals.
Select-all-that-apply (SATA) metadata
- set_sata() marks one or more variables on a survey design (or data frame) as select-all-that-apply. Accepts either tidy-select ... or a variable character vector; setting sata = FALSE removes the flag. Idempotent on already-flagged variables.
- extract_sata() returns SATA status as a named logical vector (default), a list, or a data frame. fill = FALSE yields a dense view (unmarked variables reported as FALSE); fill = NULL returns only flagged variables.
- classify_question_type() classifies a set of requested variables into "single", "sata", or "battery" by grouping them on shared question_preface metadata and honoring per-variable SATA flags. Group numbers are assigned in order of first appearance. Warns when a lone SATA-flagged variable has no preface mate, or when a preface group has mixed SATA flags.
Survey collections
- survey_collection is a new S7 container holding an ordered, uniquely-named list of survey_base objects, useful for wave-to-wave analyses, panel studies, or any workflow that compares estimates across multiple designs.
- as_survey_collection() constructs a collection from named (wave1 = d1, wave2 = d2) or bare (d1, d2) arguments; duplicate names are repaired by appending _1, _2, … with a warning showing the rename mapping.
- add_survey() and remove_survey() return new collections with surveys appended or removed; the original is unchanged.
- All nine get_*() analysis functions (get_means(), get_totals(), get_freqs(), get_quantiles(), get_ratios(), get_corr(), get_diffs(), get_t_test(), get_pairwise()) now dispatch over a survey_collection, iterating across surveys and returning a single combined tibble. Two new named-only control args on each function: .id = ".survey" names the identifier column, and .on_missing = c("error", "skip") controls behavior when a requested variable is absent from a survey. Regression functions (survey_glm(), get_anova()) do not support collection dispatch and raise an explicit error pointing users to lapply().
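The iterate-and-combine behaviour, including the roles of .id and .on_missing, can be mimicked with plain data frames. The sketch below is a conceptual stand-in, not surveycore's dispatcher: dispatch_sketch() is a hypothetical function, the "surveys" are ordinary data frames, and the estimate is an unweighted mean.

```r
# Stand-in "surveys": plain data frames, not survey_base objects.
surveys <- list(
  wave1 = data.frame(age = c(30, 40), income = c(10, 20)),
  wave2 = data.frame(age = c(50, 60))   # no income column
)

# Hypothetical dispatcher: applies fn to one variable in each member and
# stacks the results, labelling each row with the member's name.
dispatch_sketch <- function(members, var, fn, .id = ".survey",
                            .on_missing = c("error", "skip")) {
  .on_missing <- match.arg(.on_missing)
  rows <- list()
  for (nm in names(members)) {
    df <- members[[nm]]
    if (!var %in% names(df)) {
      if (.on_missing == "error") {
        stop("Variable '", var, "' is missing from survey '", nm, "'.")
      }
      next   # "skip": drop this member from the combined result
    }
    row <- data.frame(nm, fn(df[[var]]), stringsAsFactors = FALSE)
    names(row) <- c(.id, "estimate")
    rows[[nm]] <- row
  }
  if (length(rows) == 0) return(NULL)
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Every member has age: one row per survey.
age_means <- dispatch_sketch(surveys, "age", mean)

# wave2 lacks income: "skip" keeps only wave1, "error" stops.
income_means <- dispatch_sketch(surveys, "income", mean, .on_missing = "skip")
income_error <- try(dispatch_sketch(surveys, "income", mean), silent = TRUE)
```

The argument names mirror this release; in 0.8.0 the .on_missing argument is renamed to .if_missing_var.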
Other improvements
- survey_glm() gains a quiet = argument to suppress convergence warnings.
- extract_*() metadata functions now accept tidyselect helpers (starts_with(), all_of(), any_of(), matches()) in place of bare name lists.
Bug fixes
- get_diffs() now correctly computes pct_change when show_means = FALSE is combined with grouped marginal effects and show_pct_change = TRUE (previously returned NA).
surveycore 0.6.0
Breaking changes
- The survey_srs class and as_survey_srs() constructor have been removed. SRS
  designs are now created via as_survey() with no ids or strata; this
  produces a survey_taylor with no cluster/strata structure. All estimates are
  numerically identical.
New features
- get_diffs() estimates treatment effects (differences from a reference group)
  via survey-weighted regression. Supports bivariate and multivariate models,
  Gaussian and non-Gaussian families, and optional subgroup analysis. Two
  estimation paths: direct coefficients for simple models, and
  marginaleffects::avg_slopes() / avg_predictions() for models with
  covariates or non-Gaussian AMEs. Returns a survey_diffs tibble with optional
  mean, pct_change, n_weighted columns, significance stars, and p-value
  adjustment. marginaleffects moved from Suggests to Imports.
- as_survey() now supports multi-column FPC for multi-stage designs
  (e.g., fpc = c(fpc_stage1, fpc_stage2)). Each FPC column corresponds to one
  ID stage. Per-stage FPC is validated for NAs, non-positive values, and
  within-cluster constancy.
- print() for survey_taylor now displays per-stage FPC bullets for
  multi-stage designs (e.g., FPC (stage 1): fpc, FPC (stage 2): fpc2).
Bug fixes
- SRS variance estimation now uses Taylor (HT) linearization via
  .build_cluster_matrices(), which is correct for any weight structure.
  Previously it used the unweighted sample variance, which was incorrect for
  non-proportional weights.
- survey_glm() now correctly indexes weights when na.action = na.omit drops
  non-contiguous rows.
- get_freqs() now routes survey_nonprob designs through the
  Horvitz-Thompson variance path, consistent with the other five analysis
  functions.
- as_survey_twophase() now accepts survey_replicate and SRS
  survey_taylor objects as the phase-1 design (previously restricted to
  stratified/clustered survey_taylor only).
- as_survey() SRS fallback downgraded from warning to message.
Internal infrastructure
- .build_cluster_matrices() extracts multi-stage cluster, strata, and FPC
  matrix construction into a shared helper, used across the Taylor variance
  engine, analysis cell estimators, and GLM sandwich variance.
surveycore v0.5.0
Breaking changes
- as_survey_replicate() replaces as_survey_repweights(). The constructor
  name now matches the underlying survey_replicate class.
- survey_nonprob and as_survey_nonprob() replace survey_calibrated and
  as_survey_calibrated(). "Calibrated" implies a post-processing step on a
  probability sample; nonprob accurately reflects the design type.
- The positional setter form set_var_label(svy, age, "label") has been
  removed. Use the named form set_var_label(svy, age = "label") instead.
- extract_var_label(), extract_question_preface(), and extract_var_note()
  now return a named character vector. extract_var_label(svy, age) now
  returns c(age = "Age in years") rather than "Age in years".
- extract_val_labels() now returns a named list. extract_val_labels(svy, sex)
  now returns list(sex = c(Male = 1L, Female = 2L)) rather than
  c(Male = 1L, Female = 2L).
- set_variable_labels(), set_value_labels(), set_question_prefaces(), and
  set_variable_notes() have been removed. Use set_var_label(),
  set_val_labels(), set_question_preface(), and set_var_note()
  respectively; all four now accept multiple variables via named ....
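Code that relied on the old unnamed return values can recover them with base R. The sketch below uses literal stand-ins for the new return shapes quoted above rather than calling the extractors, so it runs without a survey object.

```r
# Literal stand-ins for the new return shapes (values taken from the notes).
var_label  <- c(age = "Age in years")                # named character vector
val_labels <- list(sex = c(Male = 1L, Female = 2L))  # named list

# Recover the pre-0.5.0 shapes.
old_style_label <- unname(var_label)     # drops the "age" name
old_style_vals  <- val_labels[["sex"]]   # pulls out the single element

# Indexing by name also returns the bare string.
label_by_name <- var_label[["age"]]
```

The named forms are what allow several variables to be returned in one call, so indexing by variable name is likely the more durable fix than stripping the names.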
New features
- set_universe() and extract_universe() set and retrieve universe
  (eligibility) annotations for survey variables.
- set_missing_codes() and extract_missing_codes() set and retrieve missing
  value code vectors for survey variables.
- extract_metadata() returns all metadata fields (variable_label,
  value_labels, question_preface, note, universe, missing_codes,
  transformations) for one or more variables as a named list.
Enhancements
- All setter functions now support three call conventions: named ...
  (e.g., set_var_label(svy, age = "Age in years")), a single named
  vector/list in ..., or explicit variable = / content-argument pairs.
  All setters also now work on plain data.frames.
- All extractor functions accept multiple variables via ..., support three
  output formats ("named_vector", "list", "data_frame"), and accept a
  fill argument to include variables with no metadata in the output.
surveycore v0.4.0
New features
- survey_glm() fits survey-weighted generalized linear models for all five
  design classes (survey_taylor, survey_replicate, survey_srs,
  survey_twophase, survey_calibrated); returns a survey_glm_fit object
  with design-based (Binder 1983 sandwich) standard errors and degrees of
  freedom.
- clean() converts a survey_glm_fit to a tidy survey_glm_tidy tibble
  with one row per coefficient, design-based confidence intervals, structured
  metadata, and optional reference rows for factor predictors.
- survey_glm_fit objects support 20 S3 methods: print(), summary(),
  coef(), vcov(), predict(), fitted(), residuals(), confint(),
  formula(), terms(), model.matrix(), model.frame(), deviance(),
  df.residual(), nobs(), hatvalues(), logLik(), AIC(), BIC(), and
  update().
- survey_glm_fit integrates with the marginaleffects package; when
  marginaleffects is installed, avg_slopes(), avg_predictions(), and the
  full marginaleffects API work directly on survey_glm_fit objects.
- broom::tidy() is supported for survey_glm_fit objects via a shim that
  delegates to clean().
- as_survey_rep() has been renamed to as_survey_repweights() to avoid a
  namespace clash with the srvyr package.
Bug fixes
- as_survey_twophase() variance estimation (method = "approx" and
  "full") now uses the correct PSU-level Phase 2 stratum sampling fraction
  instead of a row-level fraction, resolving an approximately 2× variance
  underestimation.
v0.3.3
New features
- print() methods for all five survey design classes (survey_taylor, survey_srs, survey_replicate, survey_twophase, survey_calibrated) now display a Domain: <n> of <N> rows line when surveytidy::filter() has been applied. The line appears after the sample size line and before the Groups: line. For two-phase designs, domain counts reflect Phase 2 rows only.
v0.3.1
surveycore v0.3.0
New features
- names() now works on survey design objects, returning the column names of
  the underlying data frame. This enables IDE column-name autocomplete in
  RStudio and Positron when piping into analysis functions (e.g.,
  design |> get_means()).
v0.2.0
New features
- get_freqs() computes weighted frequency tables for categorical survey
  variables across all five design types, with domain estimation, value-label
  support, and AAPOR small-cell warnings.
- get_means() returns survey-weighted means with design-correct standard
  errors for all five design types, including grouped and domain estimation.
- get_totals() returns survey-weighted population totals (and population
  size when called without x) for all five design types.
- get_corr() computes survey-weighted Pearson correlation using the
  delta-method variance approach, with an optional group parameter for
  per-group correlations and Fisher Z confidence intervals.
- get_quantiles() estimates survey-weighted quantiles using the Woodruff
  (1952) linearization method; supports multiple probs in a single call and
  five CI interval methods.
- get_ratios() estimates survey-weighted ratios (numerator total /
  denominator total) with design-correct SEs via the delta method (Taylor,
  SRS, calibrated, two-phase) or direct per-replicate computation (replicate
  designs).
- All six analysis functions gain a decimals argument to round numeric
  output columns to a fixed number of decimal places.
- na.rm = FALSE now includes rows where a grouping variable is NA as a
  separate group row in all six analysis functions' output.
- infer_question_prefaces() auto-detects shared battery prefaces from
  variable labels using separator-based and longest-common-prefix detection.
- survey_weighting_history() returns the weighting history stored in a
  survey design object's metadata; as_survey(), as_survey_rep(), and
  as_survey_srs() now promote "weighting_history" attributes from the
  input data frame automatically.
- Two-phase variance estimation (as_survey_twophase()) is now fully
  supported in get_means() and get_totals(), using the "full",
  "approx", and "simple" methods vendored from the survey package.
Bug fixes
- get_freqs() no longer crashes when the group variable contains NA
  values.
- get_freqs() now outputs pct as a proportion (0–1) rather than a
  percentage (0–100); se and se_srs are on the same scale.