Polkas · Feb 26, 2026
diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml
@@ -24,7 +24,6 @@ jobs:
         config:
           - {os: macOS-latest,   r: 'release'}
           - {os: windows-latest, r: 'release'}
-          # Use 3.6 to trigger usage of RTools35
           - {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
 
     env:

diff --git a/.gitignore b/.gitignore
@@ -8,3 +8,7 @@ Meta
 ^\revdep$
 /doc/
 /Meta/
+.DS_Store
+Rplots.pdf
+*.Rcheck/
+inst/doc/
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,11 +1,11 @@
 Package: miceFast
 Title: Fast Imputations Using 'Rcpp' and 'Armadillo'
-Version: 0.9.0
+Version: 0.9.1
 Authors@R: person("Maciej", "Nasinski", email = "nasinski.maciej@gmail.com", role = c("aut", "cre"))
 Description: 
   Fast imputations under the object-oriented programming paradigm. 	
   Moreover there are offered a few functions built to work with popular R packages such as 'data.table' or 'dplyr'.
-  The biggest improvement in time performance could be achieve for a calculation where a grouping variable have to be used.
+  The biggest improvement in time performance can be achieved for a calculation where a grouping variable is used.
   A single evaluation of a quantitative model for the multiple imputations is another major enhancement.
   A new major improvement is one of the fastest predictive mean matching in the R world because of presorting and binary search.
 Depends: R (>= 3.6.0)
@@ -20,7 +20,6 @@ Imports:
 Suggests:
     knitr,
     rmarkdown,
-    pacman,
     testthat,
     mice,
     magrittr,

diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,26 @@
+# miceFast 0.9.1
+
+## Bug fixes
+
+* PMM returned predicted values instead of observed values (C++): the `pmm` model returned the predicted $\hat{y}$ for missing rows instead of the nearest observed $y$ values. It now returns observed donor values, following Little and Rubin (2002).
+* PMM with character/factor variables (R): `fill_NA_N()` with `model = "pmm"` and a character dependent variable failed because it attempted `as.numeric()` on non-numeric strings, producing all NAs.
+* Character dependent variable with lm models: `fill_NA()` and `fill_NA_N()` with `model = "lm_pred"`, `"lm_bayes"`, or `"lm_noise"` silently returned all NAs when the dependent variable was character with non-numeric labels (e.g., `"apple"`, `"banana"`).
+
+## Documentation
+
+* README: added sequential-chain MI examples (dplyr and data.table) showing how to impute multiple variables and pool with Rubin's rules.
+* Introduction vignette: added a full imputation workflow with sequential ordering (impute variables whose predictors are complete first), an FCS (chained equations) section with a data.table example, and a PMM note for the OOP interface.
+* MI vignette: expanded the Rubin's rules derivations, added a PMM MI example using the OOP interface, and expanded the "Important caveat" section with OOP and data.table FCS code snippets for non-monotone patterns.
+* Documented PMM as a proper MI method throughout vignettes and README.
+* Improved prose throughout vignettes and README.
+
+## Tests
+
+* Added 20 PMM-specific tests (`test-pmm.R`): observed-value returns, factor/character support, weighted PMM, grouped data.table, reproducibility, stochasticity.
+* Added 31 FCS tests (`test-fcs.R`): data.table, data.frame, and OOP FCS helpers; joint-missingness handling; MI+pool workflow; comparison with `mice` (pooled estimates and imputed means).
+* Added tests for character dependent variables with non-numeric labels across all models and data types.
+* Test suite expanded from 243 to 311 tests.
+
 # miceFast 0.9.0
 
 Kota Hattori, thank you for your feedback and for motivating me for this deep update.
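The PMM fix hinges on one detail: each imputed value must be an observed `y` copied from a donor whose predicted mean is close to the missing row's predicted mean, not the prediction itself. The base-R sketch below illustrates that idea together with the presorting and binary search mentioned in the package description. It is not miceFast's C++ implementation: `pmm_impute()` is a hypothetical helper, it uses a plain least-squares fit (no posterior draw of the coefficients), and it assumes a numeric `y` and a complete numeric predictor matrix `X`.

```r
# Hypothetical PMM sketch (not miceFast code): OLS predictions, presorted
# donors, binary search, then a random pick among the k nearest donors.
pmm_impute <- function(y, X, k = 5L) {
  X <- as.matrix(X)
  obs <- !is.na(y)
  Xd <- cbind(1, X)                                   # add intercept column
  beta <- lm.fit(Xd[obs, , drop = FALSE], y[obs])$coefficients
  pred <- drop(Xd %*% beta)                           # predicted mean for every row

  # Presort observed predictions once and carry the observed y along.
  ord <- order(pred[obs])
  pred_sorted <- pred[obs][ord]
  y_sorted <- y[obs][ord]
  n <- length(pred_sorted)
  k <- min(k, n)

  y_out <- y
  for (i in which(!obs)) {
    pos <- findInterval(pred[i], pred_sorted)          # binary search in sorted predictions
    window <- max(1L, pos - k + 1L):min(n, pos + k)    # the k nearest donors lie in this window
    d <- abs(pred_sorted[window] - pred[i])
    donors <- window[order(d)[seq_len(k)]]             # indices of the k nearest donors
    y_out[i] <- y_sorted[donors[sample.int(k, 1L)]]    # copy an OBSERVED y, not the prediction
  }
  y_out
}

set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
y[c(3, 10, 20)] <- NA
imp <- pmm_impute(y, x, k = 3)
imp[c(3, 10, 20)] %in% y   # every imputed value is one of the observed y values
```

Because every imputed value is taken from `y_sorted`, the output can only contain values that were actually observed, which is the property the bug fix restores.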

diff --git a/R/fill_NA.R b/R/fill_NA.R
@@ -188,11 +188,7 @@ fill_NA.data.frame <- function(
     f[f > length(l)] <- length(l)
     ff <- factor(l[f])
   } else if (is_character_y) {
-    yy <- if (model != "lda") {
-      factor(yy, levels = sort(as.numeric(unique(yy))))
-    } else {
-      factor(yy)
-    }
+    yy <- factor(yy)
     l <- levels(yy)
     yy <- as.numeric(yy)
     f <- round(fill_NA_(cbind(yy, xx), model, 1, 2:(ncol(xx) + 1), ww, ridge))
@@ -277,11 +273,7 @@ fill_NA.data.table <- function(
     f[f > length(l)] <- length(l)
     ff <- factor(l[f])
   } else if (is_character_y) {
-    yy <- if (model != "lda") {
-      factor(yy, levels = sort(as.numeric(unique(yy))))
-    } else {
-      factor(yy)
-    }
+    yy <- factor(yy)
     l <- levels(yy)
     yy <- as.numeric(yy)
     f <- round(fill_NA_(cbind(yy, xx), model, 1, 2:(ncol(xx) + 1), ww, ridge))
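Why the removed branch returned all NAs: it built factor levels by calling `as.numeric()` on the labels, which only works when the labels are numeric strings such as `"1"` or `"2"`. The base-R snippet below reproduces the old and new encodings in isolation. The decoding step at the end mirrors the round-and-clamp pattern visible in the diff context; the lower-bound clamp (`f < 1`) and the sample predictions are assumptions added for illustration.

```r
yy <- c("banana", "apple", NA, "cherry", "apple")

# Removed branch: levels come from as.numeric() on the labels.
# For non-numeric labels every level is NA, sort() drops them, and factor()
# receives zero levels, so every entry (observed or not) becomes NA.
old <- factor(yy, levels = suppressWarnings(sort(as.numeric(unique(yy)))))

# New branch: plain factor() keeps the labels as levels (sorted alphabetically).
new <- factor(yy)
l <- levels(new)          # "apple" "banana" "cherry"
codes <- as.numeric(new)  # 2 1 NA 3 1: the numeric response passed to the model

# Decoding: round the model output to the nearest code, clamp it to the valid
# range, and map it back to a label. These predictions are made up.
f <- round(c(1.4, 2.6, 3.9))
f[f < 1] <- 1
f[f > length(l)] <- length(l)
l[f]                      # "apple" "cherry" "cherry"
```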

diff --git a/R/fill_NA_N.R b/R/fill_NA_N.R
@@ -17,10 +17,10 @@
 #' @return load imputations in a numeric/character/factor (similar to the input type) vector format
 #'
 #' @note
-#' There is assumed that users add the intercept by their own.
-#' The miceFast module provides the most efficient environment, the second recommended option is to use data.table and the numeric matrix data type.
-#' The lda model is assessed only if there are more than 15 complete observations
-#' and for the lms models if number of independent variables is smaller than number of observations.
+#' It is assumed that users add the intercept column themselves.
+#' The miceFast module provides the most efficient environment; the second recommended option is data.table with a numeric matrix.
+#' Only \code{"lm_bayes"}, \code{"lm_noise"}, and \code{"pmm"} models are supported.
+#' The model is fitted only when the number of complete observations exceeds the number of independent variables.
 #'
 #' @seealso \code{\link{fill_NA}} \code{\link{VIF}}  \code{vignette("miceFast-intro", package = "miceFast")}
 #'
@@ -187,11 +187,7 @@ fill_NA_N.data.frame <- function(
     f[f > length(l)] <- length(l)
     ff <- factor(l[f])
   } else if (is_character_y) {
-    yy <- if (model != "lda") {
-      factor(yy, levels = sort(as.numeric(unique(yy))))
-    } else {
-      factor(yy)
-    }
+    yy <- factor(yy)
     l <- levels(yy)
     yy <- as.numeric(yy)
     f <- round(fill_NA_N_(
@@ -295,11 +291,7 @@ fill_NA_N.data.table <- function(
     f[f > length(l)] <- length(l)
     ff <- factor(l[f])
   } else if (is_character_y) {
-    yy <- if (model != "lda") {
-      factor(yy, levels = sort(as.numeric(unique(yy))))
-    } else {
-      factor(yy)
-    }
+    yy <- factor(yy)
     l <- levels(yy)
     yy <- as.numeric(yy)
     f <- round(fill_NA_N_(

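For reference, `lm_bayes` is described as proper because it draws from the posterior before predicting. A minimal sketch of the textbook draw under a non-informative prior follows: draw $\sigma^2$ from a scaled inverse chi-square, draw $\beta$ from a normal centred on the least-squares estimate, then add residual noise. The helpers `bayes_lm_draw()` and `avg_k_draws()` are hypothetical, and miceFast's C++ code may differ in details such as the `ridge` term. `avg_k_draws()` mimics the averaging that `fill_NA_N()` performs for `lm_bayes`/`lm_noise`, and shows why the averaged result understates between-imputation variance.

```r
# Hypothetical sketch of one Bayesian linear-regression imputation draw.
bayes_lm_draw <- function(y, X) {
  X <- cbind(1, as.matrix(X))                      # intercept + predictors
  obs <- !is.na(y)
  Xo <- X[obs, , drop = FALSE]
  yo <- y[obs]
  dof <- nrow(Xo) - ncol(Xo)                       # residual degrees of freedom
  XtX_inv <- chol2inv(chol(crossprod(Xo)))         # (X'X)^{-1}
  beta_hat <- drop(XtX_inv %*% crossprod(Xo, yo))  # least-squares estimate
  rss <- sum((yo - drop(Xo %*% beta_hat))^2)
  sigma2 <- rss / rchisq(1, dof)                   # sigma^2 | y: scaled inverse chi-square
  L <- t(chol(sigma2 * XtX_inv))                   # lower factor of Var(beta | sigma^2, y)
  beta <- beta_hat + drop(L %*% rnorm(ncol(Xo)))   # beta | sigma^2, y ~ N(beta_hat, sigma^2 (X'X)^{-1})
  Xm <- X[!obs, , drop = FALSE]
  drop(Xm %*% beta) + rnorm(nrow(Xm), sd = sqrt(sigma2))  # add residual noise
}

# fill_NA_N-style averaging of k independent draws for each missing row.
avg_k_draws <- function(y, X, k = 10) {
  n_mis <- sum(is.na(y))
  draws <- matrix(replicate(k, bayes_lm_draw(y, X)), nrow = n_mis, ncol = k)
  rowMeans(draws)
}

set.seed(42)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)
y[1:20] <- NA
d1 <- bayes_lm_draw(y, x)        # one proper draw for the 20 missing rows
dk <- avg_k_draws(y, x, k = 50)  # average of 50 draws
c(single = sd(d1 - (1 + 2 * x[1:20])), averaged = sd(dk - (1 + 2 * x[1:20])))
```

Averaging pulls each imputed value toward its predicted mean, so `dk` scatters far less around the true line than `d1`. That lost spread is the between-imputation variance Rubin's rules need.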
diff --git a/README.md b/README.md
@@ -26,13 +26,23 @@ For performance details, see `performance_validity.R` in the `extdata` folder.
 - [Introduction and Advanced Usage](https://polkas.github.io/miceFast/articles/miceFast-intro.html)
 - [Missing Data Mechanisms and Multiple Imputation](https://polkas.github.io/miceFast/articles/missing-data-and-imputation.html)
 
+
+## Practical Advice
+
+- **Only need a filled-in dataset for exploration or ML?** A single imputation with `fill_NA()` or averaging draws with `fill_NA_N()` is fast and convenient. For any inferential statement, use full MI with `pool()`.
+- **Little missing data + MCAR?** Consider using `complete.cases()`. Listwise deletion is unbiased under MCAR and may be sufficient when the fraction of incomplete rows is small.
+- **For publication**, always run a **sensitivity analysis**: compare MI results against base methods (`complete.cases()`, mean imputation) and across different imputation models (`lm_bayes`, `lm_noise`, `pmm`). Vary the number of imputations. If conclusions change, investigate why. Report the imputation model, *m*, and any assumptions about the missing-data mechanism.
+- See the [MI vignette](https://polkas.github.io/miceFast/articles/missing-data-and-imputation.html) for details on MCAR/MAR/MNAR mechanisms and a practical checklist.
+
 ## Multiple Imputation Workflow
 
-[mice](https://cran.r-project.org/package=mice) implements the full MI pipeline (impute, analyze, pool). **miceFast** focuses on the computationally expensive part — fitting the imputation models — and is typically **~10× faster** than mice for the imputation step alone (see [benchmarks](#performance-highlights)). Two usage modes:
+[mice](https://cran.r-project.org/package=mice) implements the full MI pipeline (impute, analyze, pool). **miceFast** focuses on the computationally expensive part: fitting the imputation models. It is typically **~10× faster** than mice for the imputation step alone (see [benchmarks](#performance-highlights)). Two usage modes:
 
-1. **MI with Rubin's rules** — call `fill_NA()` with a stochastic model (`lm_bayes`, `lm_noise`, or `lda` with a random `ridge`) in a loop to create *m* completed datasets, then `pool()` the fitted models.
+1. **MI with Rubin's rules.** Call `fill_NA()` with a stochastic model in a loop to create *m* completed datasets, then `pool()` the fitted models. For continuous variables, use `lm_bayes`, which is strictly **proper** because it draws from the posterior. `pmm` (Predictive Mean Matching) is also **proper** and works for both continuous and categorical variables: it draws from the posterior and then matches to observed values, preserving the data distribution. For MI with PMM, call the OOP interface (`impute("pmm", ...)`) in a loop. For categorical variables, `lda` with a random `ridge` is **approximate**: the ridge is an ad-hoc perturbation rather than a posterior draw, but it works well in practice. `lm_noise` is **improper** (it ignores parameter uncertainty) and is mainly useful for sensitivity checks. See the [MI vignette](https://polkas.github.io/miceFast/articles/missing-data-and-imputation.html).
 
-2. **Single-dataset averaging** — `fill_NA_N()` returns the mean of *k* draws per missing value. Handy for exploration, but not for Rubin's rules (between-imputation variance is lost).
+2. **Single-dataset imputation.** `fill_NA_N()` with `lm_bayes`/`lm_noise` returns the mean of *k* stochastic draws per missing value. With `pmm`, *k* is the number of nearest neighbours to sample from (no averaging). Handy for exploration, but not for Rubin's rules (between-imputation variance is lost).
+
+3. **Iterative FCS (chained equations).** When several variables have interlocking (non-monotone) missingness, cycle through the variables in a loop: for each one, restore its originally missing entries to `NA` and re-impute them from the current values of the others. This is the same algorithm mice uses. With a monotone pattern, a single pass suffices and FCS is unnecessary. See the [Introduction vignette](https://polkas.github.io/miceFast/articles/miceFast-intro.html) for details.
 
 See the [MI vignette](https://polkas.github.io/miceFast/articles/missing-data-and-imputation.html) for worked examples.
 
@@ -56,36 +66,35 @@ devtools::install_github("polkas/miceFast")
 library(miceFast)
 library(dplyr)
 
-set.seed(1234)
 data(air_miss)
 
 # Visualize the NA structure
 upset_NA(air_miss, 6)
 
-# Model-based single imputation
-air_miss %>%
-  mutate(Ozone_imp = fill_NA(
-    x = ., model = "lm_bayes",
-    posit_y = "Ozone", posit_x = c("Solar.R", "Wind", "Temp")
-  ))
-
-# Proper MI: impute m times, fit models, pool with Rubin's rules
-completed <- lapply(1:5, function(i) {
-  air_miss %>%
-    mutate(Ozone_imp = fill_NA(
-      x = ., model = "lm_bayes",
-      posit_y = "Ozone", posit_x = c("Solar.R", "Wind", "Temp")
-    ))
+# Select the 4 core variables for regression: Ozone ~ Solar.R + Wind + Temp
+# Ozone has 37 NAs, Solar.R has 7 NAs, Wind and Temp are complete.
+df <- air_miss[, c("Ozone", "Solar.R", "Wind", "Temp")]
+
+# MI with Rubin's rules: impute m = 10 datasets, fit model, pool.
+# Impute Solar.R first (predictors fully observed), then Ozone
+# (can now use the freshly imputed Solar.R). This sequential order
+# resolves joint missingness in a single pass.
+set.seed(1234)
+completed <- lapply(1:10, function(i) {
+  df %>%
+    mutate(Solar.R = fill_NA(., "lm_bayes", "Solar.R", c("Wind", "Temp"))) %>%
+    mutate(Ozone   = fill_NA(., "lm_bayes", "Ozone",   c("Solar.R", "Wind", "Temp")))
 })
-fits <- lapply(completed, function(d) lm(Ozone_imp ~ Wind + Temp, data = d))
+fits <- lapply(completed, function(d) lm(Ozone ~ Solar.R + Wind + Temp, data = d))
 pool(fits)
-#> Pooled results from 5 imputed datasets
+#> Pooled results from 10 imputed datasets
 #> Rubin's rules with Barnard-Rubin df adjustment
 #>
-#>         term estimate std.error statistic    df   p.value
-#>  (Intercept)  -62.771   23.9022    -2.626 46.95 1.162e-02
-#>         Wind   -3.087    0.6857    -4.502 37.24 6.420e-05
-#>         Temp    1.736    0.2498     6.951 58.54 3.400e-09
+#>         term  estimate std.error statistic    df   p.value
+#>  (Intercept) -49.50313  21.74948    -2.276 78.41 2.557e-02
+#>      Solar.R   0.05771   0.02294     2.516 72.83 1.407e-02
+#>         Wind  -3.44033   0.62721    -5.485 76.15 5.185e-07
+#>         Temp   1.47603   0.23404     6.307 97.50 8.345e-09
 ```
 
 ### data.table
@@ -94,27 +103,28 @@ pool(fits)
 library(miceFast)
 library(data.table)
 
-set.seed(1234)
 data(air_miss)
-setDT(air_miss)
-
-# Single imputation
-air_miss[, Ozone_imp := fill_NA(
-  x = .SD, model = "lm_bayes",
-  posit_y = "Ozone", posit_x = c("Solar.R", "Wind", "Temp")
-)]
-
-# Grouped imputation — fits a separate model per group
-air_miss[, Solar_R_imp := fill_NA(
-  x = .SD, model = "lm_bayes",
-  posit_y = "Solar.R", posit_x = c("Wind", "Temp", "Intercept")
-), by = .(groups)]
+dt <- as.data.table(air_miss[, c("Ozone", "Solar.R", "Wind", "Temp")])
+
+# MI with Rubin's rules: same sequential chain as above.
+set.seed(1234)
+completed <- lapply(1:10, function(i) {
+  d <- copy(dt)
+  d[, Solar.R := fill_NA(.SD, "lm_bayes", "Solar.R", c("Wind", "Temp"))]
+  d[, Ozone   := fill_NA(.SD, "lm_bayes", "Ozone",   c("Solar.R", "Wind", "Temp"))]
+  d
+})
+fits <- lapply(completed, function(d) lm(Ozone ~ Solar.R + Wind + Temp, data = d))
+pool(fits)
 ```
 
+For iterative FCS (chained equations) with non-monotone missingness,
+see the [Introduction vignette](https://polkas.github.io/miceFast/articles/miceFast-intro.html#iterative-fcs-chained-equations-with-micefast).
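The FCS loop can be sketched in base R as follows. This is an illustration under stated assumptions, not miceFast code: `impute_lm_noise()` and `fcs_impute()` are hypothetical helpers, the per-variable model is a simple regression plus residual noise (closer to `lm_noise` than to `lm_bayes`, so it is improper), starting values are observed means, and the example uses base R's `airquality` data. Each pass refits every target on its originally observed rows only, which is equivalent to restoring its missing entries and re-imputing them from the current values of the other variables.

```r
# Hypothetical helper: stochastic regression imputation (lm + residual noise).
impute_lm_noise <- function(data, target, predictors, miss) {
  fit <- lm(reformulate(predictors, response = target), data = data[!miss, ])
  mu <- predict(fit, newdata = data[miss, , drop = FALSE])
  data[[target]][miss] <- mu + rnorm(sum(miss), sd = sigma(fit))
  data
}

# Hypothetical FCS driver: cycle through the targets n_iter times.
fcs_impute <- function(data, targets, covariates, n_iter = 10) {
  miss <- lapply(data[targets], is.na)   # remember the original NA positions
  # Initialise with observed means so every predictor is complete.
  for (v in targets) data[[v]][miss[[v]]] <- mean(data[[v]], na.rm = TRUE)
  for (iter in seq_len(n_iter)) {
    for (v in targets) {
      others <- c(setdiff(targets, v), covariates)
      # Fit on the originally observed rows of v, re-impute its original NAs.
      data <- impute_lm_noise(data, v, others, miss[[v]])
    }
  }
  data
}

set.seed(1)
aq <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]
completed <- fcs_impute(aq, targets = c("Ozone", "Solar.R"),
                        covariates = c("Wind", "Temp"), n_iter = 5)
```

For MI, one would run `fcs_impute()` *m* times and pool the fitted models.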
+
 ### Naive imputation (baseline only)
 
 ```r
-# Quick baseline — biased, does not account for relationships between variables
+# Quick baseline. Biased; does not account for relationships between variables.
 naive_fill_NA(air_miss)
 ```
 
@@ -127,7 +137,7 @@ See the [Introduction vignette](https://polkas.github.io/miceFast/articles/miceF
 - **Object-Oriented Interface** via `miceFast` objects (Rcpp modules).
 - **Convenient Helpers**:  
   - `fill_NA()`: Single imputation (`lda`, `lm_pred`, `lm_bayes`, `lm_noise`).  
-  - `fill_NA_N()`: Averaged multiple imputations (mean of N draws) (`pmm`, `lm_bayes`, `lm_noise`).  
+  - `fill_NA_N()`: Multiple imputations. Averaged draws for `lm_bayes`/`lm_noise`; nearest-neighbour sampling for `pmm`.  
   - `pool()`: Pool multiply imputed results using Rubin's rules.  
   - `VIF()`: Variance Inflation Factor calculations.  
   - `naive_fill_NA()`: Automatic naive imputations.  
@@ -140,7 +150,7 @@ See the [Introduction vignette](https://polkas.github.io/miceFast/articles/miceF
 |-----------------|-----------------------------------------------------------------------------|
 | `new(miceFast)` | Creates an OOP instance with numerous imputation methods (see the vignette). |
 | `fill_NA()`     | Single imputation: `lda`, `lm_pred`, `lm_bayes`, `lm_noise`.                   |
-| `fill_NA_N()`   | Averaged multiple imputations (mean of N draws): `pmm`, `lm_bayes`, `lm_noise`. |
+| `fill_NA_N()`   | `lm_bayes`/`lm_noise`: averages *k* draws. `pmm`: samples from *k* nearest observed values (works for both continuous and categorical). |
 | `pool()`        | Pools estimates from *m* imputed datasets using Rubin's rules. Works with any model that has `coef()` and `vcov()`. |
 | `VIF()`         | Computes Variance Inflation Factors.                                         |
 | `naive_fill_NA()` | Performs automatic, naive imputations.                                     |
@@ -149,15 +159,6 @@ See the [Introduction vignette](https://polkas.github.io/miceFast/articles/miceF
 
 ---
 
-## Practical Advice
-
-- **Only need a filled-in dataset for exploration or ML?** A single imputation with `fill_NA()` or averaging draws with `fill_NA_N()` is fast and convenient. For any inferential statement use full MI with `pool()`.
-- **Little missing data + MCAR?** Consider using `complete.cases()` — listwise deletion is unbiased under MCAR and may be sufficient when the fraction of incomplete rows is small.
-- **For publication**, always run a **sensitivity analysis**: compare MI results against base methods (`complete.cases()`, mean imputation) and across different imputation models (`lm_bayes`, `lm_noise`, `pmm`). Vary the number of imputations. If conclusions change, investigate why. Report the imputation model, *m*, and any assumptions about the missing-data mechanism.
-- See the [MI vignette](https://polkas.github.io/miceFast/articles/missing-data-and-imputation.html) for details on MCAR/MAR/MNAR mechanisms and a practical checklist.
-
----
-
 ## Performance Highlights
 
 Median timings on 100k rows, 10 variables, 100 groups (R 4.4.3, macOS M3 Pro, [optimized BLAS/LAPACK](https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#Which-BLAS-is-used-and-how-can-it-be-changed_003f)):

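`pool()` is described as applying Rubin's rules with the Barnard-Rubin degrees-of-freedom adjustment (see the printed header in the README example). The base-R sketch below spells out those formulas for models that provide `coef()`, `vcov()`, and `df.residual()`: the pooled estimate $\bar{Q}$ is the mean of the *m* estimates, the total variance is $T = \bar{U} + (1 + 1/m)B$, and the degrees of freedom combine $\nu_{old} = (m-1)/\lambda^2$ with an observed-data term. `pool_rubin()` is a hypothetical name; this is a sketch, not the source of miceFast's `pool()`, and its numbers may differ in edge cases.

```r
# Hypothetical sketch of Rubin's rules with the Barnard-Rubin df adjustment.
pool_rubin <- function(fits) {
  m <- length(fits)
  Q <- do.call(cbind, lapply(fits, coef))                       # p x m estimates
  U <- do.call(cbind, lapply(fits, function(f) diag(vcov(f))))  # p x m within variances
  qbar <- rowMeans(Q)                       # pooled estimate
  ubar <- rowMeans(U)                       # mean within-imputation variance
  b <- apply(Q, 1, var)                     # between-imputation variance
  t_var <- ubar + (1 + 1 / m) * b           # total variance
  lambda <- (1 + 1 / m) * b / t_var         # fraction of variance due to missingness
  df_com <- df.residual(fits[[1]])          # complete-data residual df
  df_old <- ifelse(lambda > 0, (m - 1) / lambda^2, Inf)
  df_obs <- (df_com + 1) / (df_com + 3) * df_com * (1 - lambda)
  df <- ifelse(is.finite(df_old), df_old * df_obs / (df_old + df_obs), df_obs)
  se <- sqrt(t_var)
  stat <- qbar / se
  data.frame(term = names(qbar), estimate = qbar, std.error = se,
             statistic = stat, df = df,
             p.value = 2 * pt(abs(stat), df, lower.tail = FALSE),
             row.names = NULL)
}

# Sanity check: identical fits have B = 0, so the pooled SE equals the single-fit SE.
fit <- lm(mpg ~ wt, data = mtcars)
pool_rubin(list(fit, fit, fit))
```

With the `fits` list from the README examples, `pool_rubin(fits)` returns the same columns as the printed `pool()` output.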
diff --git a/cran-comments.md b/cran-comments.md
@@ -2,11 +2,9 @@
 
 github actions:
 
-* {os: macOS-latest,   r: 'release'}
-* {os: windows-latest, r: 'release'}
-* {os: windows-latest, r: '3.6'}
-* {os: ubuntu-18.04,   r: 'devel', http-user-agent: 'release'}
-* {os: ubuntu-18.04,   r: 'release'}
+- {os: macOS-latest,   r: 'release'}
+- {os: windows-latest, r: 'release'}
+- {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
 
 and:
 

diff --git a/man/fill_NA_N.Rd b/man/fill_NA_N.Rd