Educational research project exploring distributed measurement infrastructure for algorithmic auditing.
This is a personal project developed independently to explore causal inference methodology, distributed browser automation, and measurement system design. Not affiliated with or endorsed by my employer.
This project implements a distributed measurement platform for auditing ad delivery systems. It measures whether ad delivery algorithms show different results to users based on demographic proxies (ZIP code / household identity) while holding browsing behavior constant.
Research Question: If two users browse identically but have different demographic proxies, do they receive different ad deliveries?
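The paired design behind this question can be sketched as a trial schedule in which both members of a pair replay the same browsing script and differ only in identity. This is a minimal illustration, not the project's actual scheduler: `Trial`, `build_paired_trials`, and the `n_rounds` parameter are hypothetical names introduced here.

```python
from dataclasses import dataclass
from itertools import combinations
import random


@dataclass(frozen=True)
class Trial:
    pair_id: int    # trials sharing a pair_id are compared against each other
    identity: str   # demographic proxy label, e.g. "ZIP A" (hypothetical)
    script: tuple   # ordered pages to visit; identical within a pair


def build_paired_trials(identities, sites, n_rounds, seed=0):
    """Build trials so each pair shares one browsing script and differs only in identity."""
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    trials = []
    pair_id = 0
    for _ in range(n_rounds):
        # One shuffled visit order per round, reused by every pair in that round.
        script = tuple(rng.sample(sites, k=len(sites)))
        for first, second in combinations(identities, 2):
            trials.append(Trial(pair_id, first, script))
            trials.append(Trial(pair_id, second, script))
            pair_id += 1
    return trials
```

Because the script is fixed within a pair, any systematic difference in delivered ads between the two members can be attributed to the identity rather than to browsing behavior, subject to noise from ad auctions that vary over time.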
Primary Use Case: Educational exploration of:
- Causal inference experimental design
- Distributed browser automation at scale
- Memory-safe concurrency patterns
- Anti-detection engineering for measurement integrity
- Statistical analysis pipelines
Note: This is research-grade code intended for learning and experimentation. It is tested and documented, but designed for measurement workflows rather than production ad-serving environments.
┌───────────────────────────────────────────────────────────┐
│                  Controller (MCP Server)                  │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐   │
│   │ Task Queue  │    │    Agent    │    │   Results   │   │
│   │             │    │  Scheduler  │    │  Collector  │   │
│   └─────────────┘    └─────────────┘    └─────────────┘   │
└───────────────────────────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Proxy Identity  │  │ Proxy Identity  │  │ Proxy Identity  │
│    (ZIP A)      │  │    (ZIP B)      │  │    (ZIP C)      │
│  ┌───────────┐  │  │  ┌───────────┐  │  │  ┌───────────┐  │
│  │  Browser  │  │  │  │  Browser  │  │  │  │  Browser  │  │
│  │  Session  │  │  │  │  Session  │  │  │  │  Session  │  │
│  └───────────┘  │  │  └───────────┘  │  │  └───────────┘  │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              ▼
                   ┌─────────────────────┐
                   │   Measurement DB    │
                   │  (SQLite/Postgres)  │
                   └─────────────────────┘
                              │
                              ▼
                   ┌─────────────────────┐
                   │  Analysis Pipeline  │
                   │    (Statistical)    │
                   └─────────────────────┘
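As a rough illustration of the measurement store in the diagram, the sketch below creates a two-table SQLite layout and computes a per-identity count of distinct ad domains. The table names, column names, and helper functions are assumptions made for this example; the schema in `src/db.py` may differ, and a Postgres deployment would need its own DDL.

```python
import sqlite3

# Hypothetical schema: one row per trial, one row per observed ad (metadata only).
SCHEMA = """
CREATE TABLE IF NOT EXISTS trials (
    trial_id   INTEGER PRIMARY KEY,
    pair_id    INTEGER NOT NULL,
    identity   TEXT    NOT NULL,
    started_at TEXT    NOT NULL
);
CREATE TABLE IF NOT EXISTS ad_impressions (
    impression_id INTEGER PRIMARY KEY,
    trial_id      INTEGER NOT NULL REFERENCES trials(trial_id),
    page_url      TEXT    NOT NULL,
    ad_domain     TEXT    NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def unique_ad_domains(conn, identity):
    """Count distinct ad domains observed across all trials for one identity."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT ad_domain) "
        "FROM ad_impressions JOIN trials USING (trial_id) "
        "WHERE identity = ?",
        (identity,),
    ).fetchone()
    return row[0]
```

Keeping `pair_id` on each trial row lets the analysis stage join the two members of a pair directly instead of reconstructing pairings from timestamps.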
| Challenge | Solution | Engineering Significance |
|---|---|---|
| Causal Inference | Paired trial design (identical behavior, varied identity) | Isolates identity as the only variable for valid causal claims |
| Bot Detection | Fingerprint randomization (UA, viewport, navigator properties) | Maintains measurement integrity by avoiding detection systems |
| Memory Safety | Semaphore-based concurrency control | Prevents host exhaustion during parallel browser execution |
| Identity Isolation | Proxy rotation with clean sessions | Ensures no cross-contamination between experimental trials |
| Reproducibility | Containerized runtime (Docker) | Enables reproducible results across different environments |
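The semaphore-based concurrency control listed in the table can be sketched with `asyncio.Semaphore`. This is a simplified stand-in rather than the code in `src/experiment.py`: `run_trial` and `run_all` are hypothetical names, and a short sleep takes the place of a real browser session.

```python
import asyncio


async def run_trial(trial_id, semaphore, stats):
    """Stand-in for one browser session; a real trial would launch a browser here."""
    async with semaphore:  # wait until one of the limited slots is free
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for page loads and ad capture
        stats["active"] -= 1
    return trial_id


async def run_all(n_trials, concurrency):
    """Run all trials while never holding more than `concurrency` sessions open."""
    semaphore = asyncio.Semaphore(concurrency)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_trial(i, semaphore, stats) for i in range(n_trials))
    )
    return results, stats["peak"]


# Mirrors the shape of `--trials 10 --concurrency 2`.
results, peak = asyncio.run(run_all(n_trials=10, concurrency=2))
```

All ten coroutines are created up front, but only two hold the semaphore at any moment, so peak memory scales with the concurrency limit rather than with the total number of trials.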
- Python 3.10+
- Docker (optional, for reproducible runs)
- Residential proxies (optional, for identity variation)
# Clone and setup
git clone https://github.com/dfcheckmate/ad_research_experiment.git
cd ad_research_experiment
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
# Configure (copy example and edit)
cp .env.example .env
# Run small experiment
bash quickstart.sh
python src/analysis.py --output out/results/
# Build image
docker build -t ad-research-experiment:latest .
# Run experiment
mkdir -p out
docker run --rm \
--shm-size=1gb \
-v "$PWD/out:/out" \
--env-file .env \
ad-research-experiment:latest \
src/experiment.py --trials 10 --concurrency 2
After running 200 trials across 3 proxy identities:
| Metric | ZIP A | ZIP B | ZIP C | p-value |
|---|---|---|---|---|
| Unique Ad Domains | 145 | 132 | 151 | 0.023* |
| Ad Network Diversity | 0.73 | 0.68 | 0.71 | 0.041* |
| Avg Ads per Session | 23.4 | 22.8 | 24.1 | 0.312 |
*Statistically significant at α=0.05
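The README does not state which test produced these p-values. One option that fits a paired design is a sign-flip permutation test on within-pair differences, sketched below for two identities. The function name `paired_sign_flip_test` and its parameters are hypothetical, and this is not necessarily what `src/analysis.py` does.

```python
import random
from statistics import mean


def paired_sign_flip_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test for paired samples.

    Under the null hypothesis that identity has no effect, each within-pair
    difference is equally likely to be positive or negative, so flipping
    signs at random generates the null distribution of the mean difference.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)
```

With three identities there are three pairwise comparisons per metric, so a multiple-comparison correction (for example Bonferroni) would be a reasonable addition before reading any single p-value as significant.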
ad-research-experiment/
├── src/
│ ├── agent.py # Browser automation with anti-detection
│ ├── experiment.py # Trial orchestration (concurrency control)
│ ├── analysis.py # Statistical analysis pipeline
│ ├── config.py # Configuration management
│ ├── db.py # Database layer (SQLite/Postgres)
│ ├── proxy_manager.py # Proxy rotation and identity isolation
│ └── literature.py # Literature review integration
├── tests/
│ ├── test_agent.py # Browser automation tests
│ ├── test_experiment.py # Concurrency and isolation tests
│ └── test_analysis.py # Statistical validation tests
├── scripts/
│ ├── enqueue_experiment.py # Task queue management
│ └── cleanup.py # Artifact cleanup
├── docs/ # Sphinx documentation
├── docker-compose.yml # Multi-service orchestration
├── Dockerfile # Reproducible runtime
└── README.md # This file
# Run test suite
pytest -q
# With coverage
pytest --cov=src --cov-report=html
# Run inside Docker (isolated environment)
docker run --rm ad-research-experiment:latest pytest -q
This tool is designed for research purposes only. Key principles:
- No personal data collection (only ad metadata)
- Respectful request rates (rate limiting built-in)
- Compliant with CFAA (authorized access only)
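As an illustration of the rate-limiting principle, the sketch below enforces a minimum interval between consecutive requests. `MinIntervalLimiter` is a hypothetical class written for this example; the project's built-in limiter may use a different strategy, such as a token bucket.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so tests can use a fake clock
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until the interval has elapsed; return the seconds slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval_s - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited
```

Calling `limiter.wait()` immediately before each page navigation keeps the request rate at or below one request per `min_interval_s` seconds for that limiter instance; sharing one instance per target site is one way to apply the limit across concurrent sessions.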
CC BY-NC 4.0 License — see LICENSE for details.
Q: Why did you build this?
A: I wanted to understand how algorithmic systems work in practice, not just in theory. This project let me explore causal inference, distributed systems, and measurement methodology.
Q: Can I use this for my own research?
A: Yes. Please cite the repository if you use it in published work.
Q: Is this production-ready?
A: It is research-grade code. It is tested, documented, and containerized, but designed for measurement workflows rather than production ad-serving environments.
Q: What technical skills did this require?
A: Causal inference design, adversarial engineering (bot detection bypass), memory-safe concurrency, proxy management, statistical analysis, and containerized deployment.
Keywords: algorithmic auditing, causal inference, ad delivery, measurement, distributed systems, Python, Playwright