Commit 91ba5a4

Build survey_tools and automated_reporting modules
- sample_size_calculator.R, sampling_weights.R, survey_summary.R
- render_reports.R batch renderer, monthly_summary.qmd Quarto template
- Updated README with detailed module documentation
- Updated CHANGELOG
1 parent 7d6dae9 commit 91ba5a4

9 files changed

Lines changed: 491 additions & 8 deletions

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -1,5 +1,26 @@
 # Changelog
 
+## [v1.2.0] - 2025-04-28
+
+### Added
+- `survey_tools/` module with 3 R scripts:
+  - `sample_size_calculator.R` — simple, stratified, and cluster sampling with design effect estimation
+  - `sampling_weights.R` — base weight calculation, trimming, and weighted summary statistics
+  - `survey_summary.R` — complex survey design objects with weighted descriptives and proportions
+- `automated_reporting/` module with 2 files:
+  - `render_reports.R` — batch rendering for Quarto and RMarkdown with output logging
+  - `monthly_summary.qmd` — Quarto template for monthly indicator summary with inline visualizations
+
+### Improved
+- README updated with detailed survey_tools and automated_reporting documentation
+- Survey tools and automated reporting sections expanded from stubs to full descriptions
+
+## [v1.1.0] - 2025-04-24
+
+### Improved
+- Full repo cleanup: removed 14 placeholder files, updated docs, added .github config
+- README rewritten with honest content listing and ecosystem links
+
 ## [v1.0.0](https://github.com/Varnasr/FieldStack/tree/v1.0.0) (2025-04-19)
 
 [Full Changelog](https://github.com/Varnasr/FieldStack/compare/954b918bc01299272877fe2d2b65194fcf7a7eed...v1.0.0)

README.md

Lines changed: 15 additions & 2 deletions
@@ -38,13 +38,26 @@ This is the **applied research layer** of [OpenStacks for Change](https://openst
 | `codebook_templates/` | Variable metadata for health surveys and programme monitoring |
 | `tests/` | 9 testthat unit tests covering all core functions |
 
+### Survey Tools
+
+| Script | What It Does |
+|--------|-------------|
+| `survey_tools/sample_size_calculator.R` | Simple, stratified, and cluster sampling calculations with design effect |
+| `survey_tools/sampling_weights.R` | Base weight calculation, trimming, weighted summary statistics |
+| `survey_tools/survey_summary.R` | Survey design objects, weighted descriptives, proportions using the `survey` package |
+
+### Automated Reporting
+
+| File | What It Does |
+|------|-------------|
+| `automated_reporting/render_reports.R` | Batch Quarto/RMarkdown rendering with logging |
+| `automated_reporting/monthly_summary.qmd` | Quarto template for monthly indicator summary with inline plots |
+
 ### Supporting
 
 | Directory | What It Contains |
 |-----------|-----------------|
 | `python_integration/` | R-Python interop via reticulate |
-| `survey_tools/` | Survey data utilities |
-| `automated_reporting/` | Report generation workflows |
 
 ## Getting Started

automated_reporting/README.md

Lines changed: 13 additions & 5 deletions
@@ -1,9 +1,17 @@
 # Automated Reporting
 
-This folder contains example scripts and templates for generating automated reports using Quarto and R.
-You can batch-render `.qmd` files to PDF/HTML using the `rmarkdown::render()` or `quarto::quarto_render()` commands.
+R scripts and Quarto templates for generating automated field reports.
+
+## Contents
+
+| File | Purpose |
+|------|---------|
+| `render_reports.R` | Batch render Quarto notebooks to HTML/PDF |
+| `monthly_summary.qmd` | Template for monthly indicator summary report |
+
+## Usage
 
-Example:
 ```r
-quarto::quarto_render("monthly_summary.qmd", output_format = "html")
-```
+source("automated_reporting/render_reports.R")
+render_all_notebooks("../notebooks/")
+```

automated_reporting/monthly_summary.qmd

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+---
+title: "Monthly Indicator Summary"
+format: html
+date: today
+params:
+  data_path: "../sample_data/mel_indicators_wide.csv"
+---
+
+```{r setup, include=FALSE}
+library(tidyverse)
+knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
+```
+
+## Data Overview
+
+```{r load-data}
+df <- read_csv(params$data_path, show_col_types = FALSE)
+cat(sprintf("Records: %d | Variables: %d\n", nrow(df), ncol(df)))
+```
+
+## Summary Statistics
+
+```{r summary}
+df %>%
+  select(where(is.numeric)) %>%
+  pivot_longer(everything(), names_to = "indicator", values_to = "value") %>%
+  group_by(indicator) %>%
+  summarise(
+    n = sum(!is.na(value)),
+    mean = round(mean(value, na.rm = TRUE), 1),
+    median = round(median(value, na.rm = TRUE), 1),
+    min = round(min(value, na.rm = TRUE), 1),
+    max = round(max(value, na.rm = TRUE), 1),
+    .groups = "drop"
+  ) %>%
+  knitr::kable()
+```
+
+## Distribution
+
+```{r plot, fig.width=8, fig.height=4}
+df %>%
+  select(where(is.numeric)) %>%
+  pivot_longer(everything(), names_to = "indicator", values_to = "value") %>%
+  ggplot(aes(x = value)) +
+  geom_histogram(bins = 20, fill = "#1a56db", alpha = 0.7) +
+  facet_wrap(~indicator, scales = "free") +
+  theme_minimal() +
+  labs(title = "Indicator Distributions", x = NULL, y = "Count")
+```
+
+---
+
+*Generated automatically by FieldStack automated reporting.*

automated_reporting/render_reports.R

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+# Automated Report Rendering
+# Batch render Quarto/R Markdown notebooks to HTML or PDF
+
+library(tidyverse)
+
+#' Render a single Quarto document
+#' @param input Path to .qmd file
+#' @param output_format "html" or "pdf"
+#' @param output_dir Output directory (default: same as input)
+render_notebook <- function(input, output_format = "html", output_dir = NULL) {
+  if (!file.exists(input)) {
+    warning(paste("File not found:", input))
+    return(invisible(NULL))
+  }
+  # quarto_render() has no output_dir argument; forward it to the CLI via quarto_args
+  if (requireNamespace("quarto", quietly = TRUE)) {
+    quarto::quarto_render(input, output_format = output_format, quarto_args = if (!is.null(output_dir)) c("--output-dir", output_dir))
+  } else if (requireNamespace("rmarkdown", quietly = TRUE)) {
+    rmarkdown::render(input, output_format = paste0(output_format, "_document"),
+                      output_dir = output_dir)
+  } else {
+    stop("Install quarto or rmarkdown package")
+  }
+  cat(sprintf("Rendered: %s -> %s\n", input, output_format))
+}
+
+#' Batch render all .qmd files in a directory
+#' @param dir Directory to search for .qmd files
+#' @param output_format "html" or "pdf"
+#' @param output_dir Output directory for rendered files
+#' @param recursive Search subdirectories
+render_all_notebooks <- function(dir = ".", output_format = "html",
+                                 output_dir = NULL, recursive = FALSE) {
+  files <- list.files(dir, pattern = "\\.qmd$", full.names = TRUE, recursive = recursive)
+
+  if (length(files) == 0) {
+    cat("No .qmd files found in", dir, "\n")
+    return(invisible(NULL))
+  }
+
+  cat(sprintf("Found %d notebook(s) to render:\n", length(files)))
+  results <- lapply(files, function(f) {
+    tryCatch({
+      render_notebook(f, output_format, output_dir)
+      data.frame(file = f, status = "success", stringsAsFactors = FALSE)
+    }, error = function(e) {
+      cat(sprintf("  FAILED: %s (%s)\n", f, e$message))
+      data.frame(file = f, status = paste("failed:", e$message), stringsAsFactors = FALSE)
+    })
+  })
+
+  bind_rows(results)
+}
+
+#' Generate a summary table from rendered reports
+#' @param results Output from render_all_notebooks
+#' @return Summary data frame
+report_summary <- function(results) {
+  results %>%
+    mutate(
+      basename = basename(file),
+      ok = status == "success"  # distinct name so summarise() does not shadow it
+    ) %>%
+    summarise(
+      total = n(),
+      rendered = sum(ok),
+      failed = sum(!ok)
+    )
+}
+
+# Example usage
+if (sys.nframe() == 0) {
+  cat("=== Automated Report Renderer ===\n")
+  cat("Usage:\n")
+  cat("  source('automated_reporting/render_reports.R')\n")
+  cat("  render_all_notebooks('../notebooks/', output_format = 'html')\n")
+  cat("  render_notebook('path/to/notebook.qmd', 'pdf')\n")
+}

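As a rough illustration of the batch pattern used by `render_all_notebooks()` and `report_summary()` above, the sketch below wraps each render call in `tryCatch()` so one failing file does not stop the batch, then counts successes and failures. It uses base R only. `fake_render` and the file names are hypothetical stand-ins introduced for this example; they are not part of the commit, and no Quarto installation is assumed.

```r
# Hypothetical stand-in for render_notebook(): errors on any file whose
# name contains "broken", otherwise does nothing.
fake_render <- function(f) {
  if (grepl("broken", f)) stop("render error")
  invisible(f)
}

# Hypothetical input files (nothing is read from disk).
files <- c("a.qmd", "broken.qmd", "c.qmd")

# One status row per file; an error is recorded instead of aborting the loop.
results <- do.call(rbind, lapply(files, function(f) {
  tryCatch({
    fake_render(f)
    data.frame(file = f, status = "success", stringsAsFactors = FALSE)
  }, error = function(e) {
    data.frame(file = f, status = paste("failed:", conditionMessage(e)),
               stringsAsFactors = FALSE)
  })
}))

# Summarise with a separate logical vector so the counts cannot shadow each other.
ok <- results$status == "success"
summary_row <- data.frame(total = nrow(results),
                          rendered = sum(ok),
                          failed = sum(!ok))
print(summary_row)
```

Keeping the success flag under its own name (`ok`) matters in the dplyr version too: inside a single `summarise()` call, a later expression sees columns created by earlier ones, so reusing one name for both the flag and its count can silently change the result.
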
survey_tools/README.md

Lines changed: 22 additions & 1 deletion
@@ -1,3 +1,24 @@
 # Survey Tools
 
-Survey-weighted summaries and wrappers using `srvyr`, `survey`, and related packages.
+R functions for survey design and analysis using the `survey` and `srvyr` packages.
+
+## Contents
+
+| Script | Purpose |
+|--------|---------|
+| `sample_size_calculator.R` | Calculate sample sizes for simple, stratified, and cluster designs |
+| `sampling_weights.R` | Compute and apply survey weights for complex designs |
+| `survey_summary.R` | Weighted descriptive statistics with confidence intervals |
+
+## Usage
+
+```r
+source("survey_tools/sample_size_calculator.R")
+sample_size_simple(p = 0.5, margin = 0.05, confidence = 0.95)
+```
+
+## Requirements
+
+- R 4.0+
+- `tidyverse`
+- `survey` (for `survey_summary.R`)

survey_tools/sample_size_calculator.R

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+# Sample Size Calculator for Survey Design
+# Covers simple random, stratified, and cluster sampling designs
+
+#' Calculate sample size for simple random sampling
+#' @param p Expected proportion (default 0.5 for maximum variance)
+#' @param margin Margin of error (default 0.05)
+#' @param confidence Confidence level (default 0.95)
+#' @param population Population size (NULL for infinite)
+#' @return Required sample size
+sample_size_simple <- function(p = 0.5, margin = 0.05, confidence = 0.95, population = NULL) {
+  z <- qnorm(1 - (1 - confidence) / 2)
+  n <- (z^2 * p * (1 - p)) / margin^2
+
+  # Finite population correction
+  if (!is.null(population)) {
+    n <- n / (1 + (n - 1) / population)
+  }
+  ceiling(n)
+}
+
+#' Calculate sample size for stratified sampling
+#' @param strata_sizes Vector of stratum population sizes
+#' @param strata_proportions Expected proportion per stratum
+#' @param margin Margin of error
+#' @param confidence Confidence level
+#' @param allocation Allocation method: "proportional" or "equal"
+#' @return Data frame with stratum-wise sample sizes
+sample_size_stratified <- function(strata_sizes, strata_proportions = NULL,
+                                   margin = 0.05, confidence = 0.95,
+                                   allocation = "proportional") {
+  k <- length(strata_sizes)
+  if (is.null(strata_proportions)) strata_proportions <- rep(0.5, k)
+
+  N <- sum(strata_sizes)
+  z <- qnorm(1 - (1 - confidence) / 2)
+
+  # Total sample size
+  n_total <- sample_size_simple(p = 0.5, margin = margin, confidence = confidence, population = N)
+
+  # Allocate across strata
+  if (allocation == "proportional") {
+    weights <- strata_sizes / N
+  } else {
+    weights <- rep(1 / k, k)
+  }
+
+  n_strata <- ceiling(n_total * weights)
+
+  data.frame(
+    stratum = seq_len(k),
+    population = strata_sizes,
+    proportion = strata_proportions,
+    sample_size = n_strata
+  )
+}
+
+#' Calculate sample size for cluster randomized designs
+#' @param icc Intra-cluster correlation coefficient
+#' @param cluster_size Average number of units per cluster
+#' @param p Expected proportion
+#' @param margin Margin of error
+#' @param confidence Confidence level
+#' @return List with design effect, effective sample size, and clusters needed
+sample_size_cluster <- function(icc = 0.05, cluster_size = 30,
+                                p = 0.5, margin = 0.05, confidence = 0.95) {
+  # Design effect
+  deff <- 1 + (cluster_size - 1) * icc
+
+  # Simple sample size
+  n_simple <- sample_size_simple(p = p, margin = margin, confidence = confidence)
+
+  # Adjusted for clustering
+  n_effective <- ceiling(n_simple * deff)
+  n_clusters <- ceiling(n_effective / cluster_size)
+
+  list(
+    design_effect = round(deff, 2),
+    simple_sample_size = n_simple,
+    effective_sample_size = n_effective,
+    clusters_needed = n_clusters,
+    total_sample = n_clusters * cluster_size,
+    icc = icc,
+    cluster_size = cluster_size
+  )
+}
+
+# Example usage
+if (sys.nframe() == 0) {
+  cat("=== Simple Random Sampling ===\n")
+  cat(sprintf("50%% proportion, 5%% margin: n = %d\n", sample_size_simple()))
+  cat(sprintf("30%% proportion, 3%% margin: n = %d\n", sample_size_simple(p = 0.3, margin = 0.03)))
+  cat(sprintf("With population 10000: n = %d\n", sample_size_simple(population = 10000)))
+
+  cat("\n=== Stratified Sampling ===\n")
+  strat <- sample_size_stratified(
+    strata_sizes = c(5000, 3000, 2000),
+    allocation = "proportional"
+  )
+  print(strat)
+
+  cat("\n=== Cluster Sampling ===\n")
+  cluster <- sample_size_cluster(icc = 0.05, cluster_size = 30)
+  cat(sprintf("Design effect: %.2f\n", cluster$design_effect))
+  cat(sprintf("Clusters needed: %d\n", cluster$clusters_needed))
+  cat(sprintf("Total sample: %d (vs %d simple)\n", cluster$total_sample, cluster$simple_sample_size))
+}

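To make the arithmetic in `sample_size_calculator.R` easier to check by hand, the sketch below re-derives the figures for the script's default inputs step by step in base R. It restates the same formulas (the normal-approximation sample size for a proportion, the finite population correction, and the design effect `1 + (m - 1) * icc`) rather than calling the committed functions. The short variable names (`n0`, `N`, `m`) are chosen for this illustration only.

```r
# Two-sided critical value for a 95% confidence level (about 1.96).
confidence <- 0.95
z <- qnorm(1 - (1 - confidence) / 2)

# Simple random sampling: p = 0.5, margin of error = 0.05.
p <- 0.5
margin <- 0.05
n0 <- z^2 * p * (1 - p) / margin^2        # about 384.15 before rounding
n_srs <- ceiling(n0)                      # 385

# Finite population correction for a population of 10,000.
N <- 10000
n_fpc <- ceiling(n0 / (1 + (n0 - 1) / N)) # 370

# Cluster sampling: ICC = 0.05, average cluster size m = 30.
icc <- 0.05
m <- 30
deff <- 1 + (m - 1) * icc                 # 2.45
n_clustered <- ceiling(n_srs * deff)      # 944
n_clusters <- ceiling(n_clustered / m)    # 32
total_sample <- n_clusters * m            # 960

cat(sprintf("SRS: %d | with FPC: %d | clusters: %d | total: %d\n",
            n_srs, n_fpc, n_clusters, total_sample))
```

With these inputs, clustering roughly multiplies the required sample by the design effect (385 becomes 944), and rounding up to whole clusters of 30 brings the fielded total to 960.
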