| title | AB Test Research Designer |
|---|---|
| emoji | 🧪 |
| colorFrom | blue |
| colorTo | green |
| sdk | docker |
| app_port | 8008 |
| pinned | false |
| license | mit |
Local-first experiment planning tool for A/B and multi-variant tests. Plan sample size and duration from the wizard, review deterministic statistical guidance (SRM, Bayesian, group-sequential, CUPED) plus design-time guardrail-metric recommendations, compare saved experiments side by side, and export decision-ready reports in seven languages (English, Russian, German, Spanish, French, Simplified Chinese, Arabic with RTL) — all against a local SQLite workspace with no cloud required.
Built with FastAPI + React 19 + TypeScript + Vite + SQLite, verified end-to-end via scripts/verify_all.cmd --with-e2e (350+ backend tests, 200+ frontend tests, Playwright E2E, Lighthouse CI, axe accessibility checks). Backend coverage gated at 89%+ in CI.
It combines:
- deterministic sample size and duration calculation
- heuristic warnings and feasibility checks
- deterministic experiment design output
- optional local LLM recommendations
- SQLite-backed project storage with history and export metadata
- lightweight runtime diagnostics plus request-id / process-time headers
- baseline security headers, API rate limiting, auth-failure throttling, and request-body size guards
- SQLite schema versioning plus configurable WAL/busy-timeout runtime settings
- optional API token protection for runtime and project APIs
- workspace backup and restore for saved projects plus history, integrity counts, checksums, and optional HMAC signatures
- preflight workspace validation before import, plus runtime SQLite write-probe diagnostics
Live demo: https://liovina-ab-test-research-designer.hf.space (hosted on Hugging Face Spaces, free CPU tier — first cold request may take a few seconds)
The hosted demo is seeded with four sample projects (checkout conversion, pricing sensitivity, onboarding completion, and feed ad click-through ratio), each with a completed analysis run and seeded live-experiment data (plus an export on the first one), so the sidebar and history views are populated on first load. For Hugging Face Spaces, set AB_SEED_DEMO_ON_STARTUP=true in Space Settings.
For persistent hosted state on Hugging Face Spaces, also set:
AB_HF_SNAPSHOT_REPOto the private dataset repo id, for exampleliovina/ab-test-designer-snapshotsAB_HF_TOKENto a Hugging Face token with dataset write access, stored only as a Space SecretAB_HF_SNAPSHOT_INTERVAL_SECONDSto control periodic snapshot uploads; default is900, and0disables the background loop
With those variables configured, the backend restores the latest projects.sqlite3 snapshot on startup, still runs the idempotent demo seed afterwards (so seeded live-experiment data survives a restore that predates it), uploads periodic dataset snapshots while the Space is running, and attempts one final push during shutdown.
Deploy your own: see docs/DEPLOY.md. Release prep files: fly.toml and docs/RELEASE_NOTES_v1.1.0.md.
Sample import payload:
Current workflow screenshots are generated by the smoke script into docs/demo/.
The smoke flow seeds saved demo projects, loads the onboarding example in the wizard,
runs analysis, captures comparison and webhook views, and exports a report:
The screenshots follow the real v1.1.0 path through the product: wizard overview, review step, and the post-analysis results dashboard.
They then switch to saved-project comparison to show the multi-project power-curve and forest-plot dashboard with seeded snapshots.
The final image shows the admin-side webhook manager with a seeded Slack-style subscription in the sidebar tools area. The Slack App flow adds OAuth installation and /ab-test commands alongside the older one-way webhook path.
Retailer testing two checkout variants against control to lift conversion from a 4.2% baseline.
Setup - 80k daily visitors, 50% share into test, 3 variants (34/33/33), alpha = 0.05, power = 0.80, two-sided, relative MDE = 10%.
Sizing (from POST /api/v1/calculate).
| Metric | Value |
|---|---|
| Per-variant sample | 45,429 users |
| Total sample | 136,287 users |
| Required duration | 4 days |
| Bonferroni adjustment | 2 treatment-vs-control comparisons, adjusted alpha 0.025 |
Design guidance (from POST /api/v1/design).
- Primary risk: More than two variants trigger a Bonferroni alpha correction. This is conservative and may overstate the required sample size.
- Key recommendation: Validate tracking and assignment before exposing live traffic.
- Guardrail to monitor: Payment error rate
Interim check. An early snapshot came in after 1.2 test-days, 48,000 visitors, and 3,812 conversions (35.2% of the planned per-variant sample):
- P(variant A > control) = 93.4%
- P(variant B > control) = 99.8% Variant A is still ambiguous; variant B is the only treatment with a decisive early signal.
Decision. Stop spending exposure on variant A, keep variant B against control until the planned read is complete, and ship B only if payment error rate and refund value stay in range. The value here is that sizing, multivariant correction, design risks, and the Bayesian interim view all come from the same backend run.
Full inputs and outputs: docs/case-studies/checkout-redesign.json. Rerun with python scripts/generate_case_study_numbers.py.
Post-v1.1.0 Tier 2/3 roadmap items are all landed as of 2026-04-25.
Landed:
- Portfolio polish. HF Space startup seed, v1.1.0 screenshots, case-study section, GHCR Docker publish, dynamic shields.io badges.
- Product quality. Locale parity at 940 leaf keys across all shipped UI locales (en/ru/de/es/fr/zh/ar — including the Slack-App admin block), HF Dataset SQLite snapshot service, optional OpenAI/Anthropic adapter via browser-session token, Astro Starlight docs site at brownjuly2003-code.github.io/ab-test-research-designer, 10-template industry gallery.
- Hardening. Monte-Carlo distribution overlay with interactive probability slider, French / Simplified-Chinese / Arabic locales (+RTL for Arabic), extended Hypothesis property coverage (numerical stability + Bayesian edges + Monte-Carlo determinism), bundle optimization (main chunk 247 → 122 KB gzip via lazy-load locales + vendor chunks), optional Postgres backend via
AB_DATABASE_URLwith CI matrix coverage, Slack App integration with OAuth install + slash commands + interactive actions.
Dropped as out-of-scope for a portfolio/demo: manual NVDA / JAWS audit (automated axe a11y coverage sufficient here).
- Frontend: React 19 + TypeScript + Vite
- Backend: FastAPI + Pydantic
- Storage: SQLite
- Optional AI path: local orchestrator adapter with retry/backoff
- Verification: backend tests, frontend unit tests, typecheck, build, smoke, Playwright E2E
- CI: .github/workflows/test.yml
- Container image published to GHCR on each tag (
linux/amd64,linux/arm64) - canonical cross-platform verification entrypoint: verify_all.py and verify_all.cmd
- wizard-based experiment input with review step
- deterministic calculations for binary and continuous metrics
- Bonferroni-aware multivariant sizing notes
- warning engine for traffic, duration, seasonality, campaigns, and design quality
- deterministic report with design, metrics plan, risks, and recommendations
- optional AI advice kept separate from the hard-math output
- optional OpenAI and Anthropic adapters via browser-session token, without backend key persistence
- local project save, load, update, archive, restore, compare, history, and export flows
- saved-project revision history with payload restore into the wizard
- richer snapshot comparison with assumption/risk overlap and recommendation highlights
- full workspace export/import for project, analysis, export-history, and revision backup
- workspace import preflight validation with checksum/reference verification before writes begin
- browser draft restore/autosave plus JSON draft import/export
- workspace status board summarizing saved-project coverage, snapshot depth, exports, and current draft sync state
- read-only aware frontend mode that disables write actions when the API session only has safe GET access
Prerequisites:
- Python 3.11+
- Node.js LTS
- Git
Environment template:
- start from .env.example
- set
AB_API_TOKENif you want write-capable/api/v1/*routes protected - optionally set
AB_READONLY_API_TOKENfor read-only access to diagnostics, readiness, docs, andGETproject routes - optionally set
AB_WORKSPACE_SIGNING_KEYto HMAC-sign exported workspace backups and require signed imports on that runtime - optionally set
AB_HF_SNAPSHOT_REPOandAB_HF_TOKENto restore/persist the SQLite workspace through a private Hugging Face Dataset snapshot - optionally set
AB_HF_SNAPSHOT_INTERVAL_SECONDSto change the snapshot cadence; the default is900, and0disables the background snapshot task - rate limiting and auth-failure throttling are enabled by default; tune
AB_RATE_LIMIT_*andAB_AUTH_FAILURE_*for stricter or looser local behavior - request body guards are enabled by default; tune
AB_MAX_REQUEST_BODY_BYTESandAB_MAX_WORKSPACE_BODY_BYTESif you expect unusually large workspace bundles - when the backend is protected, paste the token into the frontend "API session token" field; it stays only in the current browser session and is not baked into the build
cd app/backend
python -m pip install -r requirements.txt
cd ../.. # back to repo root
python -m uvicorn app.backend.app.main:app --host 127.0.0.1 --port 8008Health:
http://127.0.0.1:8008/health
Diagnostics:
http://127.0.0.1:8008/api/v1/diagnostics
Readiness:
http://127.0.0.1:8008/readyz
cd app/frontend
npm install
npm run devVite default:
http://127.0.0.1:5173
The runtime now supports two auth modes for external consumers:
- legacy shared tokens via
AB_API_TOKENandAB_READONLY_API_TOKEN - managed database-backed API keys created with
AB_ADMIN_TOKEN
FastAPI documentation pages stay public:
- Swagger UI:
http://127.0.0.1:8008/docs - Redoc:
http://127.0.0.1:8008/redoc - OpenAPI JSON:
http://127.0.0.1:8008/openapi.json
Create a scoped key once AB_ADMIN_TOKEN is configured:
curl -X POST http://127.0.0.1:8008/api/v1/keys \
-H "Authorization: Bearer YOUR_AB_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name":"Partner read key","scope":"read","rate_limit_requests":60,"rate_limit_window_seconds":60}'Use the returned plaintext secret against protected routes:
curl http://127.0.0.1:8008/api/v1/projects \
-H "X-API-Key: abk_your_plaintext_key"Only the hash is stored in SQLite, and the plaintext key is shown once at creation time. Legacy shared tokens remain available for backward compatibility and should be documented to external consumers as legacy access.
Configure an outbound webhook for audit events:
curl -X POST http://127.0.0.1:8008/api/v1/webhooks \
-H "Authorization: Bearer YOUR_AB_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name":"Slack alerts","target_url":"https://hooks.slack.com/services/XXX/YYY/ZZZ","secret":"rotate-me","format":"slack","event_filter":["api_key_created","api_key_revoked","analysis_run_created","workspace_imported","project.archive"],"scope":"global"}'Fire a test delivery:
curl -X POST http://127.0.0.1:8008/api/v1/webhooks/WEBHOOK_ID/test \
-H "Authorization: Bearer YOUR_AB_ADMIN_TOKEN"Generic endpoints receive JSON plus X-AB-Signature: sha256=...; Slack subscriptions receive an incoming-webhook payload without signature validation.
For the two-way Slack App, create an app from slack/app-manifest.yml, set AB_SLACK_CLIENT_ID, AB_SLACK_CLIENT_SECRET, and AB_SLACK_SIGNING_SECRET, then open /slack/install. The app exposes /ab-test projects, /ab-test status <project_id>, and /ab-test run <project_id>.
The UI ships with seven locales: English (default), Russian, German, Spanish, French, Simplified Chinese, and Arabic. Pick a language from the header switcher (the choice persists to localStorage under ab-test:language) or set ?lang=fr on the URL to override auto-detection. Arabic also switches the document into dir="rtl" so the shell, panels, toasts, and warning callouts follow the reading direction automatically.
The backend honors the Accept-Language header on export endpoints and localizes the markdown/HTML report headers plus warning and risk strings. Regional tags fall back to their primary language: fr-CA -> fr, de-AT -> de, es-MX -> es, zh-CN / zh-TW -> zh, ar-SA / ar-EG -> ar, and unsupported locales fall back to en.
curl -X POST http://127.0.0.1:8008/api/v1/export/markdown \
-H "Accept-Language: de" \
-H "Content-Type: application/json" \
-d @docs/demo/sample-report.jsonUnsupported locales fall back to English. For instructions on adding another locale, see docs/RUNBOOK.md#adding-a-new-locale.
Build and run the full stack through the backend-served frontend:
docker compose up --buildSecure local container mode:
set AB_API_TOKEN=your-secret-token
docker compose up --buildDual-token container mode:
set AB_API_TOKEN=write-secret-token
set AB_READONLY_API_TOKEN=readonly-secret-token
docker compose up --buildSigned-backup container mode:
set AB_WORKSPACE_SIGNING_KEY=replace-with-a-long-random-secret
docker compose up --buildSecure Docker verification:
cmd /c scripts\verify_all.cmd --with-dockerNon-destructive Docker verification:
python scripts/verify_docker_compose.py --preserveImage publish, registry tagging, rollback, and runtime verification details: docs/DEPLOY.md
Then open:
http://127.0.0.1:8008
Full local pipeline:
cmd /c scripts\verify_all.cmdUseful variants:
cmd /c scripts\verify_all.cmd --skip-smokecmd /c scripts\verify_all.cmd --skip-buildcmd /c scripts\verify_all.cmd --with-e2ecmd /c scripts\verify_all.cmd --with-e2e --with-lighthousecmd /c scripts\verify_all.cmd --with-dockercmd /c scripts\verify_all.cmd --with-docker-preserve
The verify pipeline exercises both checksum-only and signed workspace backup roundtrips. It also covers rate limiting, auth-throttle, request-size enforcement, and workspace checksum/signature regressions through backend tests.
Workspace backup roundtrip drill:
python scripts/verify_workspace_backup.py --fixtureSigned workspace backup roundtrip drill:
set AB_WORKSPACE_SIGNING_KEY=replace-with-a-long-random-secret
python scripts/verify_workspace_backup.py --fixtureBackend calculation benchmark:
python scripts/benchmark_backend.py --payload binary --assert-ms 100The backend pytest suite also includes an in-repo p95 latency guard for binary and continuous calculations.
Browser E2E:
cd app/frontend
npm run test:e2eThis command builds the frontend if needed and runs Playwright against a temporary backend-served build on a free local port.
Build the frontend, start the backend-served dist on port 4174, and run Lighthouse CI:
npm --prefix app/frontend run build
python scripts/run_lighthouse_ci.pyTo include Lighthouse in the full local verification flow:
cmd /c scripts\verify_all.cmd --with-e2e --with-lighthouseCurrent Lighthouse thresholds stay strict for accessibility and advisory for other categories:
- performance
>= 0.85(warn) - accessibility
>= 0.90(error) - best-practices
>= 0.90(warn) - seo
>= 0.80(warn)
Active docs:
- docs/HISTORY.md
- docs/ARCHITECTURE.md
- docs/API.md
- docs/RULES.md
- docs/RUNBOOK.md
- docs/RELEASE_CHECKLIST.md
- CHANGELOG.md
- frontend API contracts are generated from FastAPI OpenAPI into
app/frontend/src/lib/generated/api-contract.ts - TypeScript strict mode is enabled
- pytest cache artifacts are disabled via
pytest.ini - the smoke script updates
docs/demo/screenshots from a real browser flow - the smoke flow now verifies the sample import payload before refreshing screenshots
- the Playwright E2E command builds the frontend if needed, starts a temporary backend-served frontend on a free local port, and cleans it up through
scripts/run_frontend_e2e.py - LLM adapter timeout/retry behavior can be tuned through
.env.example - SQLite busy timeout, journal mode, synchronous mode, and backend log format are configurable through
.env.example - optional write-token auth is available through
AB_API_TOKEN; the frontend can send it as a browser-session token without baking it into the build - optional read-only auth is available through
AB_READONLY_API_TOKENfor safeGET/HEAD/OPTIONSruntime access - API responses now include
X-Request-IDandX-Process-Time-Msheaders for lightweight local observability - responses now also include baseline security headers (
Content-Security-Policy,X-Content-Type-Options,X-Frame-Options,Referrer-Policy,Permissions-Policy) /api/v1/*requests now have configurable in-memory rate limiting plus a dedicated auth-failure throttle withRetry-Afteron429- mutating API routes now enforce configurable request-body limits, with a larger dedicated ceiling for workspace import/validate flows
- error responses now also include
error_code,status_code,request_id, andX-Error-Code GET /readyzgives a simple readiness view over storage, frontend-dist serving, and runtime configGET /api/v1/diagnosticsnow also exposes in-memory runtime counters plus the active guardrail configuration for security headers, rate limiting, auth throttling, and request-body limits- workspace backup/import now works from the UI and through
GET /api/v1/workspace/exportplusPOST /api/v1/workspace/import - workspace backup bundles now include integrity counts and a SHA-256 checksum; when
AB_WORKSPACE_SIGNING_KEYis configured they also carry an HMAC signature and imports require signature verification on that runtime - saved projects now retain revision history and can restore older payload snapshots from the UI




