Releases · JDenn0514/surveycore

27 Apr 15:29

JDenn0514

v0.8.0

d335979

surveycore 0.8.0 Latest

Breaking changes

Constructing a survey_collection from member surveys with divergent @groups now errors with surveycore_error_collection_group_divergent. Previously, a mixed-grouping collection would dispatch analysis functions per-survey and stitch a patchwork of grouped and ungrouped rows together with bind_rows(), violating the pseudo-data.frame mental model. Either all members must share @groups, or the caller must supply group = explicitly.
as_survey_collection()'s .on_missing argument has been replaced by .if_missing_var, and the previously silent no-op behaviour is fixed. .if_missing_var is now stored on the returned collection's @if_missing_var property and is honoured (rather than ignored) by every dispatched get_*(). Callers using the old name will see R's positional-argument-mismatch error.
The .on_missing named-only argument on every collection-dispatching get_*() (get_means(), get_totals(), get_freqs(), get_ratios(), get_diffs(), get_corr(), get_variance(), get_quantiles(), get_covariance(), get_t_test(), get_pairwise()) has been renamed to .if_missing_var. The default flips from "error" to NULL; NULL resolves to the collection's stored @if_missing_var property, while a non-NULL value overrides it for that call. The .id argument similarly defaults to NULL and resolves to the collection's stored @id. Callers passing .on_missing = ... will silently have the value flow into ... (no behaviour change at the analysis layer); update to .if_missing_var = ... to restore intent.

New features

survey_collection per-call dispatch defaults

survey_collection gains two new properties:
- @id (character(1), default ".survey") — column name .dispatch_over_collection() uses when an analysis function is dispatched across the collection without an explicit per-call .id. Validated via the new shared helper; the existing surveycore_error_collection_invalid_id class fires on bad input.
- @if_missing_var (character(1), default "error", must be one of c("error", "skip")) — controls how dispatched get_*() calls behave when a member survey is missing a requested variable. Validated via the new helper; raises the new surveycore_error_collection_invalid_if_missing_var error class on bad input.
New exported setters set_collection_id(x, id) and set_collection_if_missing_var(x, if_missing_var) mutate the corresponding property and return the collection invisibly. Both validate via the same shared helpers; both raise surveycore_error_not_survey_collection on non-collection input.
add_survey() and remove_survey() now propagate the source collection's @id and @if_missing_var onto the returned collection.
print(survey_collection) renders id: and if_missing_var: lines on every print, regardless of whether they hold the default values.
.dispatch_over_collection() resolves both .id and .if_missing_var via two-tier precedence: a non-NULL value at the analysis-function call site beats the value stored on the collection's property. The surveycore_error_collection_id_collision hint additionally surfaces set_collection_id() as a fix path when the collision was triggered by the stored @id.
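The two-tier precedence described above can be sketched in base R. This is only an illustration of the rule, not surveycore's internal code: resolve_setting() is a hypothetical helper, and the stored values are the documented defaults for @id and @if_missing_var.

```r
# Hypothetical helper illustrating the precedence rule: a non-NULL value
# supplied at the call site wins; NULL falls back to the stored property.
resolve_setting <- function(call_value, stored_value) {
  if (!is.null(call_value)) call_value else stored_value
}

stored_id <- ".survey"        # documented default for @id
stored_if_missing <- "error"  # documented default for @if_missing_var

# No per-call value: the collection's stored property is used.
id_default <- resolve_setting(NULL, stored_id)

# Per-call values override the stored properties for that call only.
id_override <- resolve_setting("wave", stored_id)
if_missing_override <- resolve_setting("skip", stored_if_missing)
```

Under this reading, set_collection_id() and set_collection_if_missing_var() change the fallback, while a per-call .id or .if_missing_var changes only that one call.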

Uniform grouping on survey_collection

survey_collection gains a @groups property (character(0) by default). Every member survey's @groups is asserted identical() to the collection's value by the class validator — a uniform-grouping invariant that guarantees dispatched get_*() results share a single grouping structure.
as_survey_collection() gains a group = argument that accepts tidy-select column names (bare, c(), all_of()). Missing or empty-resolved group = (including NULL, character(0), c(), all_of(character(0))) adopts the members' uniform @groups or errors on divergence; a supplied non-empty group = overrides any pre-existing member @groups and emits a typed surveycore_warning_collection_group_overridden per divergent member.
add_survey() and remove_survey() now preserve coll@groups across mutation: a grouped collection propagates its @groups onto any empty-grouped new member and errors on divergent-grouped members (surveycore_error_collection_group_conflict); removal keeps the collection-level grouping.
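As a rough illustration of the uniform-grouping invariant, the check below compares plain character vectors with identical(). It assumes each member's @groups can be represented as a character vector; groups_are_uniform() and the member names are hypothetical, and the real validator works on survey objects rather than bare vectors.

```r
# Hypothetical stand-ins for each member survey's @groups value.
uniform_members <- list(wave1 = "region", wave2 = "region")
divergent_members <- list(wave1 = "region", wave2 = character(0))

# TRUE when every member's grouping is identical() to the first member's.
groups_are_uniform <- function(member_groups) {
  if (length(member_groups) == 0L) {
    return(TRUE)
  }
  reference <- member_groups[[1]]
  all(vapply(member_groups, identical, logical(1), reference))
}

uniform_ok <- groups_are_uniform(uniform_members)
divergent_ok <- groups_are_uniform(divergent_members)
```

Here the second list mixes a grouped and an ungrouped member, which is the situation the notes say now errors unless group = is supplied.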

Polychoric and polyserial correlation via get_corr(method = ...)

get_corr() gains a method = "pearson" argument. Setting method = "polychoric" fits a weighted two-step MLE for the correlation between two ordinal variables under a bivariate-normal latent model (Olsson 1979; Mannan 2025); method = "polyserial" fits the analogous MLE for one ordinal + one continuous variable (Cox 1974). Auto-detection of the ordinal / continuous side is handled internally; no new user-facing argument is required. Confidence intervals are constructed on the Fisher-z scale and back-transformed to [-1, 1]. Variance is design-based: Taylor linearization via a perturbation-based influence function on survey_taylor, and a full per-replicate re-fit of both thresholds and rho on survey_replicate. For method != "pearson", df = NA_integer_ and statistic is the z-scale Wald statistic referred to a standard normal distribution. meta(result)$bivariate_normal_cdf is "pbivnorm", and meta(result)$n_failed_replicates_total carries the total count of non-converged replicates when the replicate path observed any. Agreement with polycor::polychor() / polycor::polyserial() on equal-weight fixtures is within 1e-4.
New package Import: pbivnorm (>= 0.6.0), used as the bivariate-normal CDF for the polychoric / polyserial likelihood.
Fourteen new typed error / warning classes (PC-1 through PC-14) surface ordinal-type, optimizer, sparse-cell, boundary, and replicate-convergence conditions — see plans/error-messages.md for the full list.
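The Fisher-z interval construction mentioned above can be sketched in base R. The estimate and standard error are made-up inputs. The delta-method conversion of the SE to the z scale, and the Wald statistic computed as z over its SE, are assumptions about the details rather than a description of the package's code.

```r
# Hypothetical inputs: a correlation estimate and its design-based SE on the
# correlation scale.
rho_hat <- 0.45
se_rho <- 0.06
conf_level <- 0.95

# Move to the Fisher-z scale.
z_hat <- atanh(rho_hat)

# Delta-method SE on the z scale (assumption): d/d rho atanh(rho) = 1 / (1 - rho^2).
se_z <- se_rho / (1 - rho_hat^2)

# Symmetric interval on the z scale, then back-transform with tanh(), which
# keeps both limits inside [-1, 1].
z_crit <- qnorm(1 - (1 - conf_level) / 2)
ci <- tanh(z_hat + c(-1, 1) * z_crit * se_z)

# z-scale Wald statistic referred to a standard normal (two-sided p-value).
wald_stat <- z_hat / se_z
p_value <- 2 * pnorm(-abs(wald_stat))

print(round(ci, 3))
```

Because the interval is built on the z scale, it is generally not symmetric around the estimate once back-transformed.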

New functions

get_variance() computes design-based finite-population variance estimates for one or more numeric variables in a survey design, matching survey::svyvar() at tolerance 1e-10 on point estimates and 1e-8 on SEs. Returns a survey_variance tibble with point estimate, SE, CI, CV, MOE, design effect (deff), and cell sizes. Supports grouping (via group = and group_by()), per-variable na_handling = "pairwise" (default) or "listwise", name_style = "broom" renaming, and column-level label attributes for downstream gt integration. Dispatches over survey_taylor, survey_replicate, survey_twophase, survey_nonprob, and survey_collection designs.
get_covariance() computes design-based finite-population covariance estimates for all unordered pairs drawn from one or more numeric variables in a survey design, matching the off-diagonal entries of survey::svyvar() at tolerance 1e-10 on point estimates and 1e-8 on SEs. Returns a survey_covariance tibble with covariance, SE, CI, CV, MOE, design effect (deff), and pairwise cell sizes. Pearson-only, pairwise-complete NA handling. Supports grouping (via group = and group_by()), redundant = TRUE to include both (x, y) and (y, x) orderings, diagonal = TRUE to include (x, x) self-pairs (which equal get_variance(x) exactly at 1e-10), name_style = "broom" renaming, and column-level label attributes for downstream gt integration. Dispatches over survey_taylor, survey_replicate, survey_twophase, survey_nonprob, and survey_collection designs.

New warning classes

surveycore_warning_variance_all_na — fired when every row of the active domain is NA on the focal variable.
surveycore_warning_variance_insufficient_n — fired when the focal variable has fewer than two non-NA observations in the active domain (variance is undefined).
surveycore_warning_covariance_all_na — fired when every row of the active domain is NA on at least one variable in the pair.
surveycore_warning_covariance_insufficient_n — fired when a pair has fewer than two pairwise-complete observations in the active domain (covariance is undefined).
surveycore_warning_covariance_non_numeric — fired when one or more variables passed via x are non-numeric and silently dropped from the pair list.

21 Apr 21:19

JDenn0514

v0.7.0

a2806e0

v0.7.0

Breaking changes

get_anova()'s first argument is now object and dispatches on class. The former model2 positional argument has been removed — get_anova(fit1, fit2) must now be written get_anova(list(fit1, fit2)). The S3 anova(fit1, fit2) interface is unchanged.

New functions

Design-based group comparisons

get_t_test() performs a design-based two-sample t-test comparing group means for a numeric outcome across two levels of a by variable. Returns a survey_t_test tibble with estimate, per-group means and cell sizes, CI, t-statistic, df, p-value, and significance stars. Supports optional stratification via group (one row per stratum) and matches survey::svyttest() at tolerance 1e-10 for point estimates and test statistics.
get_pairwise() computes all k(k−1)/2 pairwise t-tests across the levels of a factor, with multiple-comparison p-value adjustment via any stats::p.adjust() method ("holm" by default, or "none"). Adjustment is applied separately within each group stratum when stratified. Returns a survey_pairwise tibble with one row per pair.
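The pair count and the p-value adjustment can be illustrated with base R alone. The level names, strata, and unadjusted p-values below are invented for the example, and the code calls stats::p.adjust() directly rather than get_pairwise().

```r
# Hypothetical factor levels: k = 4 gives k(k - 1)/2 = 6 unordered pairs.
by_levels <- c("north", "south", "east", "west")
pairs <- t(combn(by_levels, 2))  # one row per pair
n_pairs <- nrow(pairs)

# Hypothetical unadjusted p-values, one per pair.
p_raw <- c(0.001, 0.020, 0.030, 0.040, 0.200, 0.500)

# Holm adjustment versus no adjustment.
p_holm <- p.adjust(p_raw, method = "holm")
p_none <- p.adjust(p_raw, method = "none")

# Adjusting separately within each stratum: ave() applies the function to
# each stratum's p-values and returns them in the original order.
stratum <- c("a", "a", "a", "b", "b", "b")
p_holm_by_stratum <- ave(
  p_raw, stratum,
  FUN = function(p) p.adjust(p, method = "holm")
)
```

Adjusting within strata treats each stratum as its own family of comparisons, so each p-value is corrected for three tests here rather than six.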

Design-based ANOVA

get_anova() computes Rao-Scott design-based ANOVA for survey_glm_fit objects, supporting both Wald and LRT tests with F or Chi-squared reference distributions. Three dispatch branches:
- get_anova(<survey_glm_fit>) — sequential term-by-term anova (matches anova.svyglm() semantics).
- get_anova(<list<survey_glm_fit>>) — chained pairwise comparison across k nested fits, returning k − 1 rows.
- get_anova(<survey_base>, formula = ...) — fits the model internally via survey_glm() and runs sequential anova on the fit; extra ... are forwarded to survey_glm().
Results match survey::regTermTest() at tolerance 1e-8 on statistics and 1e-6 on p-values.
anova(fit) on a survey_glm_fit now dispatches to get_anova() via a registered S3 method.
plot() on a survey_glm_fit produces a dot-and-whisker coefficient plot with design-based Wald confidence intervals.

Select-all-that-apply (SATA) metadata

set_sata() marks one or more variables on a survey design (or data frame) as select-all-that-apply. Accepts either tidy-select ... or a variable character vector; setting sata = FALSE removes the flag. Idempotent on already-flagged variables.
extract_sata() returns SATA status as a named logical vector (default), a list, or a data frame. fill = FALSE yields a dense view (unmarked variables reported as FALSE); fill = NULL returns only flagged variables.
classify_question_type() classifies a set of requested variables into "single", "sata", or "battery" by grouping them on shared question_preface metadata and honoring per-variable SATA flags. Group numbers are assigned in order of first appearance. Warns when a lone SATA-flagged variable has no preface mate, or when a preface group has mixed SATA flags.

Survey collections

survey_collection is a new S7 container holding an ordered, uniquely-named list of survey_base objects — useful for wave-to-wave analyses, panel studies, or any workflow that compares estimates across multiple designs.
as_survey_collection() constructs a collection from named (wave1 = d1, wave2 = d2) or bare (d1, d2) arguments; duplicate names are repaired by appending _1, _2, … with a warning showing the rename mapping.
add_survey() and remove_survey() return new collections with surveys appended or removed; the original is unchanged.
All nine get_*() analysis functions (get_means(), get_totals(), get_freqs(), get_quantiles(), get_ratios(), get_corr(), get_diffs(), get_t_test(), get_pairwise()) now dispatch over a survey_collection, iterating across surveys and returning a single combined tibble. Two new named-only control args on each function: .id = ".survey" names the identifier column, and .on_missing = c("error", "skip") controls behavior when a requested variable is absent from a survey. Regression functions (survey_glm(), get_anova()) do not support collection dispatch and raise an explicit error pointing users to lapply().

Other improvements

survey_glm() gains a quiet = argument to suppress convergence warnings.
extract_*() metadata functions now accept tidyselect helpers (starts_with(), all_of(), any_of(), matches()) in place of bare name lists.

Bug fixes

get_diffs() now correctly computes pct_change when show_means = FALSE is combined with grouped marginal effects and show_pct_change = TRUE (previously returned NA).

19 Mar 00:22

JDenn0514

v0.6.0

57d234a

surveycore 0.6.0

Breaking changes

The survey_srs class and the as_survey_srs() constructor have been removed.
SRS designs are now created via as_survey() with no ids or strata, which
produces a survey_taylor with no cluster/strata structure. All estimates are
numerically identical.

New features

get_diffs() estimates treatment effects (differences from a reference group)
via survey-weighted regression. Supports bivariate and multivariate models,
Gaussian and non-Gaussian families, and optional subgroup analysis. Two
estimation paths: direct coefficients for simple models, and
marginaleffects::avg_slopes() / avg_predictions() for models with
covariates or non-Gaussian AMEs. Returns a survey_diffs tibble with optional
mean, pct_change, n_weighted columns, significance stars, and p-value
adjustment. marginaleffects moved from Suggests to Imports.
as_survey() now supports multi-column FPC for multi-stage designs
(e.g., fpc = c(fpc_stage1, fpc_stage2)). Each FPC column corresponds to one
ID stage. Per-stage FPC is validated for NAs, non-positive values, and
within-cluster constancy.
print() for survey_taylor now displays per-stage FPC bullets for
multi-stage designs (e.g., FPC (stage 1): fpc, FPC (stage 2): fpc2).

Bug fixes

SRS variance estimation now uses Taylor (HT) linearization via
.build_cluster_matrices(), which is correct for any weight structure.
Previously, it used the unweighted sample variance, which was incorrect for
non-proportional weights.
survey_glm() now correctly indexes weights when na.action = na.omit drops
non-contiguous rows.
get_freqs() now routes survey_nonprob designs through the
Horvitz-Thompson variance path, consistent with the other five analysis
functions.
as_survey_twophase() now accepts survey_replicate and SRS
survey_taylor objects as the phase-1 design (previously restricted to
stratified/clustered survey_taylor only).
as_survey() SRS fallback downgraded from warning to message.

Internal infrastructure

.build_cluster_matrices() extracts multi-stage cluster, strata, and FPC
matrix construction into a shared helper, used across the Taylor variance
engine, analysis cell estimators, and GLM sandwich variance.

12 Mar 11:44

JDenn0514

v0.5.0

95c146d

surveycore v0.5.0

Breaking changes

as_survey_replicate() replaces as_survey_repweights(). The constructor
name now matches the underlying survey_replicate class.
survey_nonprob and as_survey_nonprob() replace survey_calibrated and
as_survey_calibrated(). "Calibrated" implies a post-processing step on a
probability sample; nonprob accurately reflects the design type.
The positional setter form set_var_label(svy, age, "label") has been
removed. Use the named form set_var_label(svy, age = "label") instead.
extract_var_label(), extract_question_preface(), and extract_var_note()
now return a named character vector. extract_var_label(svy, age) now
returns c(age = "Age in years") rather than "Age in years".
extract_val_labels() now returns a named list. extract_val_labels(svy, sex)
now returns list(sex = c(Male = 1L, Female = 2L)) rather than
c(Male = 1L, Female = 2L).
set_variable_labels(), set_value_labels(), set_question_prefaces(), and
set_variable_notes() have been removed. Use set_var_label(),
set_val_labels(), set_question_preface(), and set_var_note()
respectively — all four now accept multiple variables via named ....
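For code that relied on the old return shapes, the sketch below shows one way to recover them in base R. The values are copied from the examples in the notes above; in real use they would come from extract_var_label(svy, age) and extract_val_labels(svy, sex).

```r
# New return shapes, as described in the notes above.
new_label <- c(age = "Age in years")
new_val_labels <- list(sex = c(Male = 1L, Female = 2L))

# Dropping the names gives the bare string that older code expected.
old_style_label <- unname(new_label)

# Extracting the single list element gives the bare named vector.
old_style_val_labels <- new_val_labels[["sex"]]
```

Indexing by name keeps working when several variables are requested at once, which the named return values are designed to support.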

New features

set_universe() and extract_universe() set and retrieve universe
(eligibility) annotations for survey variables.
set_missing_codes() and extract_missing_codes() set and retrieve missing
value code vectors for survey variables.
extract_metadata() returns all metadata fields (variable_label,
value_labels, question_preface, note, universe, missing_codes,
transformations) for one or more variables as a named list.

Enhancements

All setter functions now support three call conventions: named ...
(e.g., set_var_label(svy, age = "Age in years")), a single named
vector/list in ..., or explicit variable = / content-argument pairs.
All setters also now work on plain data.frames.
All extractor functions accept multiple variables via ..., support three
output formats ("named_vector", "list", "data_frame"), and accept a
fill argument to include variables with no metadata in the output.

09 Mar 20:05

JDenn0514

v0.4.0

4cab8b9

surveycore v0.4.0

New features

survey_glm() fits survey-weighted generalized linear models for all five
design classes (survey_taylor, survey_replicate, survey_srs,
survey_twophase, survey_calibrated); returns a survey_glm_fit object
with design-based (Binder 1983 sandwich) standard errors and degrees of
freedom.
clean() converts a survey_glm_fit to a tidy survey_glm_tidy tibble
with one row per coefficient, design-based confidence intervals, structured
metadata, and optional reference rows for factor predictors.
survey_glm_fit objects support 20 S3 methods: print(), summary(),
coef(), vcov(), predict(), fitted(), residuals(), confint(),
formula(), terms(), model.matrix(), model.frame(), deviance(),
df.residual(), nobs(), hatvalues(), logLik(), AIC(), BIC(), and
update().
survey_glm_fit integrates with the marginaleffects package; when
marginaleffects is installed, avg_slopes(), avg_predictions(), and the
full marginaleffects API work directly on survey_glm_fit objects.
broom::tidy() is supported for survey_glm_fit objects via a shim that
delegates to clean().
as_survey_rep() has been renamed to as_survey_repweights() to avoid a
namespace clash with the srvyr package.

Bug fixes

as_survey_twophase() variance estimation (method = "approx" and
"full") now uses the correct PSU-level Phase 2 stratum sampling fraction
instead of a row-level fraction, resolving an approximately 2× variance
underestimation.

04 Mar 11:23

JDenn0514

v0.3.3

98f6953

v0.3.3

New features

print() methods for all five survey design classes (survey_taylor, survey_srs, survey_replicate, survey_twophase, survey_calibrated) now display a Domain: <n> of <N> rows line when surveytidy::filter() has been applied. The line appears after the sample size line and before the Groups: line. For two-phase designs, domain counts reflect Phase 2 rows only.

03 Mar 20:02

JDenn0514

v0.3.1

090fc72

v0.3.1

Patch release: vignette proofreading and docs fix.

No user-facing API changes since v0.3.0.

03 Mar 16:42

JDenn0514

v0.3.0

ca4fe9d

surveycore v0.3.0

New features

names() now works on survey design objects, returning the column names of
the underlying data frame. This enables IDE column-name autocomplete in
RStudio and Positron when piping into analysis functions (e.g.,
design |> get_means()).

03 Mar 14:19

JDenn0514

v0.2.0

81db027

v0.2.0

New features

get_freqs() computes weighted frequency tables for categorical survey
variables across all five design types, with domain estimation, value-label
support, and AAPOR small-cell warnings.
get_means() returns survey-weighted means with design-correct standard
errors for all five design types, including grouped and domain estimation.
get_totals() returns survey-weighted population totals (and population
size when called without x) for all five design types.
get_corr() computes survey-weighted Pearson correlation using the
delta-method variance approach, with optional group parameter for
per-group correlations and Fisher Z confidence intervals.
get_quantiles() estimates survey-weighted quantiles using the Woodruff
(1952) linearization method; supports multiple probs in a single call and
five CI interval methods.
get_ratios() estimates survey-weighted ratios (numerator total /
denominator total) with design-correct SEs via the delta method (Taylor,
SRS, calibrated, two-phase) or direct per-replicate computation (replicate
designs).
All six analysis functions gain a decimals argument to round numeric
output columns to a fixed number of decimal places.
na.rm = FALSE now includes rows where a grouping variable is NA as a
separate group row in all six analysis functions' output.
infer_question_prefaces() auto-detects shared battery prefaces from
variable labels using separator-based and longest-common-prefix detection.
survey_weighting_history() returns the weighting history stored in a
survey design object's metadata; as_survey(), as_survey_rep(), and
as_survey_srs() now promote "weighting_history" attributes from the
input data frame automatically.
Two-phase variance estimation (as_survey_twophase()) is now fully
supported in get_means() and get_totals(), using the "full",
"approx", and "simple" methods vendored from the survey package.

Bug fixes

get_freqs() no longer crashes when the group variable contains NA
values.
get_freqs() now outputs pct as a proportion (0–1) rather than a
percentage (0–100); se and se_srs are on the same scale.
