A local, drop-in emulator for Google BigQuery.
DuckDB-backed, SQLGlot-powered, and tested against the real service. Point the official Google Cloud client libraries at it and run your BigQuery code on your laptop or in CI — no real project, no billing, no network.
Testing code against real BigQuery is slow (network + service latency), expensive (every query is billable), and dangerous (no rollback in shared environments). The alternatives — mocks, fakes, and shared sandboxes — drift from the real service the moment you stop chasing them.
bqemulator is a process you can run locally that speaks BigQuery's actual wire protocol (REST + gRPC), backs onto a real analytical SQL engine (DuckDB), and translates GoogleSQL → DuckDB SQL with a rule-based, ADR-grounded translator (SQLGlot). The official google-cloud-bigquery, @google-cloud/bigquery, cloud.google.com/go/bigquery, com.google.cloud:google-cloud-bigquery, and bq CLI clients all work against it unchanged — only the endpoint differs.
Three use cases, one binary:
- Ephemeral CI fixture: the `pytest` plugin starts an in-process emulator on a random port; `pip install bqemulator[testing]` is all the wiring you need.
- Long-running local dev server: `bqemulator start --data-dir ~/bqemu` persists state across runs; works with the official `bq` CLI, dbt, Airflow, PySpark, Beam, Scio.
- Offline replica of a real project: `bqemulator import --from-project <id>` clones schema (and optionally data) from real BigQuery into a local data directory.

- 🟢 Full REST + gRPC API parity: Datasets, Tables, Jobs, TableData, Routines, Row Access Policies, Authorized Views, plus Models CRUD metadata. Storage Read API (Arrow and Avro). Storage Write API (all four stream types, `DEFAULT`, `COMMITTED`, `PENDING`, and `BUFFERED`, with both proto and Arrow row formats).
- ⚡ Real SQL: GoogleSQL translated to DuckDB SQL via 93 SQLGlot rules + 24 rewriters; covers date/time, string, array, struct, range, geography, JSON, approximate-aggregate, statistical, regex, civil-time, and bit operations.
- 🧠 Features `goccy/bigquery-emulator` doesn't have: JavaScript UDFs (embedded V8 via `mini-racer`), procedural scripting (`DECLARE` / `BEGIN…END` / `IF` / `LOOP` / `EXCEPTION` / `BEGIN TRANSACTION`), time travel (`FOR SYSTEM_TIME AS OF`), table snapshots, table clones, materialized views with refresh dispatch, GEOGRAPHY (planar via DuckDB-spatial + S2 helpers), RANGE, INTERVAL, authorized views, row-access policies, `INFORMATION_SCHEMA`.
- 🔌 Five-client e2e matrix: every release is exercised against the official Python, Node.js, Go, and Java BigQuery client libraries plus Google's `bq` CLI in a live Docker container.
- 🧪 7-tier test pyramid: unit + property + integration + conformance + e2e + perf + chaos, plus mutation / fuzz / differential siblings. Combined coverage is gated at ≥90% line + branch.
- 📐 Conformance corpus: 1,244 fixtures recorded against real BigQuery. Drift between the emulator and the real service surfaces as a failing test; documented divergences are pinned with ADR references.
- 🐍 Native pytest plugin: `pip install bqemulator` registers a pytest plugin; the `bqemu_server` fixture starts an ephemeral in-process emulator on random free ports and sets `BIGQUERY_EMULATOR_HOST`. No `conftest.py` wiring required.
- 🐳 Multi-arch container: `ghcr.io/jjviscomi/bqemulator` builds for `linux/amd64` + `linux/arm64`, with cosign keyless signatures via GitHub OIDC.
- 🔭 Production-grade observability: `structlog` JSON logs, OpenTelemetry tracing (configurable OTLP exporter), Prometheus metrics endpoint.

pip install bqemulator

Optional extras:

pip install "bqemulator[testing]" # pytest, hypothesis, testcontainers, bigquery client
pip install "bqemulator[udf-js]" # JavaScript UDF support (embedded V8)
pip install "bqemulator[orc]" # ORC format for load jobs
pip install "bqemulator[compression]" # zstd + snappy for load/extract jobs
pip install "bqemulator[import]" # bqemulator import --from-project
pip install "bqemulator[all]" # all runtime extras (no testing extras)

Docker:

docker run --rm -p 9050:9050 -p 9060:9060 ghcr.io/jjviscomi/bqemulator:latest

The pip package and the published image bundle the same emulator. The image exposes REST on 9050 and gRPC on 9060 by default; see the configuration reference to change them.

Windows users: install Docker Desktop for Windows with the WSL2 backend (default since Docker Desktop 4.x); the published Linux image runs natively under WSL2 with no Windows-specific configuration. Native Windows-container variants of the image are explicitly out of scope for v1.0 — see docs/reference/out-of-scope.md#native-windows-containers for the rationale.
import os

from google.cloud import bigquery

# Either set BIGQUERY_EMULATOR_HOST (picked up by every Google Cloud library)
# or pass api_endpoint explicitly to the Client. Both work.
os.environ["BIGQUERY_EMULATOR_HOST"] = "localhost:9050"

client = bigquery.Client(project="my-test-project")
client.create_dataset("sales")

# bigquery.Table() needs a fully qualified ID (project.dataset.table);
# a bare "dataset.table" string raises ValueError in the constructor.
client.create_table(
    bigquery.Table(
        "my-test-project.sales.orders",
        schema=[
            bigquery.SchemaField("id", "INT64"),
            bigquery.SchemaField("amount", "NUMERIC"),
            bigquery.SchemaField("placed_at", "TIMESTAMP"),
        ],
    )
)

client.insert_rows_json(
    "sales.orders",
    [{"id": 1, "amount": "12.50", "placed_at": "2026-05-21T00:00:00Z"}],
)

for row in client.query("SELECT COUNT(*) AS n FROM sales.orders").result():
    print(row.n)  # 1

bqemulator ships a pytest plugin via the pytest11 entry point. Installing the package is all the wiring you need; your conftest.py stays empty.

from google.cloud import bigquery


def test_orders_table(bqemu_client: bigquery.Client) -> None:
    bqemu_client.create_dataset("sales")
    # ... your test ...

The bqemu_server fixture is session-scoped (one emulator per test session); the bqemu_client fixture is function-scoped and returns a pre-configured bigquery.Client. See the pytest fixture guide and the python/pytest-integration example for a complete Flask app with integration tests.

const { BigQuery } = require('@google-cloud/bigquery');

const bq = new BigQuery({
  projectId: 'my-test-project',
  apiEndpoint: 'http://localhost:9050',
  token: 'dummy', // emulator accepts any token
});

// CommonJS modules have no top-level await, so run the call inside an async function.
async function main() {
  await bq.createDataset('sales');
}

main().catch(console.error);

See the Node.js quickstart and the nodejs/nestjs-app example.

client, _ := bigquery.NewClient(
    ctx, "my-test-project",
    option.WithEndpoint("http://localhost:9050"),
    option.WithoutAuthentication(),
)

See the Go quickstart and the go/beam-pipeline example.

BigQuery bq = BigQueryOptions.newBuilder()
    .setProjectId("my-test-project")
    .setHost("http://localhost:9050")
    .setCredentials(NoCredentials.getInstance())
    .build()
    .getService();

See the Java quickstart and the java/spring-boot example.

bq --api=http://localhost:9050 \
  --project_id=my-test-project \
  query --use_legacy_sql=false 'SELECT 1 AS n'

See the bq CLI guide and the bq-cli-quickstart example.

services:
  bqemulator:
    image: ghcr.io/jjviscomi/bqemulator:latest
    ports: ["9050:9050", "9060:9060"]
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:9050/healthz"]
      interval: 2s
      retries: 30
  app:
    build: .
    environment:
      BIGQUERY_EMULATOR_HOST: bqemulator:9050
    depends_on:
      bqemulator: { condition: service_healthy }

See the docker-compose/full-stack example for app + emulator + Prometheus + Grafana.

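Outside Compose (for example, after a plain `docker run`), client code may start before the REST port answers. The sketch below mirrors the healthcheck in the Compose file above by polling the same `/healthz` path with the same 2-second interval and 30 attempts. The function names `http_ok` and `wait_for_emulator` are hypothetical and not part of bqemulator; the sketch also assumes the endpoint answers with a 2xx status once the emulator is ready.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses: connection refused,
        # timeouts, and non-2xx answers all count as "not ready yet".
        return False


def wait_for_emulator(
    base_url: str = "http://localhost:9050",
    attempts: int = 30,
    interval: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll <base_url>/healthz until it answers or the attempts run out."""
    url = base_url.rstrip("/") + "/healthz"
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

Calling `wait_for_emulator()` before constructing a client gives a test harness a clear failure when the container never comes up. The `probe` and `sleep` parameters exist only so the loop can be exercised without a network or real delays.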
bqemulator is at v1.1.2 — first minor on the production-stable
line. SemVer applies: breaking changes ship only in MAJOR,
deprecations live ≥2 MINOR or 6 months. The compatibility matrix is auto-generated from the conformance corpus on every CI run; the conformance coverage matrix breaks down support by surface item.
| Surface | Status |
|---|---|
| BigQuery REST: Datasets / Tables / Jobs / TableData / Routines / Row Access Policies / Authorized Views | ✅ |
| Multipart + resumable upload (`/upload/bigquery/v2/...`) | ✅ |
| `INFORMATION_SCHEMA` (`TABLES`, `COLUMNS`, `ROUTINES`, `VIEWS`, `JOBS`, `JOBS_BY_*`, `MATERIALIZED_VIEWS`, `PARTITIONS`, `TABLE_OPTIONS`, …) | ✅ |
| Storage Read API (Arrow + Avro) | ✅ |
| Storage Write API (all 4 stream types, proto + Arrow row formats) | ✅ |
| GoogleSQL function surface (date / time / string / array / struct / JSON / regex / aggregate / approx / civil-time / bit) | ✅ |
| Procedural scripting (`DECLARE`, `BEGIN…END`, `IF`, `LOOP`, `EXCEPTION`, `BEGIN TRANSACTION`) | ✅ |
| SQL / JavaScript / Table-valued UDFs | ✅ |
| Time travel (`FOR SYSTEM_TIME AS OF`), snapshots, clones, materialized views | ✅ |
| Authorized views + row access policies + caller identity | ✅ |
| GEOGRAPHY / RANGE / INTERVAL / NUMERIC / BIGNUMERIC types | ✅ |
| Load formats: CSV / JSON / Avro / ORC / Parquet | ✅ |
| Extract formats: CSV / JSON / Avro / Parquet | ✅ |
| BigQuery ML (`CREATE MODEL`, `ML.PREDICT`, …) | ❌ Out of scope (see docs/reference/out-of-scope.md) |
| BI Engine / slot reservations / Data Transfer Service / scheduled queries | ❌ Out of scope |

Conformance corpus depth (the conformance coverage matrix carries the live, auto-generated breakdown):
| Status | Surface items | % of deterministic surface |
|---|---|---|
| 🟢🟢 Deep (≥6 fixtures) | 98 | 24.3% |
| 🟢 Covered (3–5 fixtures) | 69 | 17.1% |
| 🟡 Sampled (1–2 fixtures) | 235 | 58.3% |
| 🔴 Uncovered (0 fixtures) | 1 | 0.2% |
| Total | 403 | 100.0% |
Plus 10 non-deterministic items (RAND, CURRENT_*, SESSION_USER, GENERATE_UUID, TABLESAMPLE, FOR SYSTEM_TIME AS OF <expression>) that are excluded from the conformance corpus by ADR 0022 and exercised in unit / property / integration tiers instead — bringing the full inventory to 413 surface items across 20 categories, backed by a 1,244-fixture conformance corpus (1,170 SQL + 48 HTTP + 26 gRPC) under tests/conformance/.
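The depth tiers in the table above follow directly from the fixture count per surface item. As a rough illustration (not the project's actual matrix generator), the sketch below applies the same thresholds and computes the percentage column. The names `tier`, `summarize`, and the sample item names are hypothetical.

```python
from collections import Counter

# Tier names in the order the table lists them.
TIERS = ("Deep", "Covered", "Sampled", "Uncovered")


def tier(fixture_count: int) -> str:
    """Map a fixture count to its depth tier using the table's thresholds."""
    if fixture_count < 0:
        raise ValueError("fixture_count must be non-negative")
    if fixture_count >= 6:
        return "Deep"
    if fixture_count >= 3:
        return "Covered"
    if fixture_count >= 1:
        return "Sampled"
    return "Uncovered"


def summarize(fixtures_per_item: dict[str, int]) -> dict[str, tuple[int, float]]:
    """Return {tier: (item count, percent of all items)} rounded to 0.1%."""
    counts = Counter(tier(n) for n in fixtures_per_item.values())
    total = len(fixtures_per_item)
    return {
        name: (counts[name], round(100 * counts[name] / total, 1) if total else 0.0)
        for name in TIERS
    }


# Hypothetical sample: four surface items with made-up fixture counts.
sample = {"DATE_ADD": 7, "REGEXP_EXTRACT": 4, "ST_AREA": 1, "SOME_NEW_FUNC": 0}
print(summarize(sample))
```

Applied to the published counts, the same rounding reproduces the table: 98 of 403 items is 24.3%, 69 is 17.1%, 235 is 58.3%, and 1 is 0.2%.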
We follow a no-deferral principle: features either ship complete or are excluded with documented rationale. There is no "TODO for v1.1." Scope boundaries are catalogued in docs/reference/out-of-scope.md.
The full documentation lives at jjviscomi.github.io/bqemulator. Key entry points:
- Getting started: your first ten minutes.
- Per-language quickstarts: Python · Node.js · Go · Java · pytest · docker-compose · Testcontainers.
- Guides: loading data, querying, streaming inserts, Storage API, UDFs, scripting, partitioning, time travel, materialized views, row access policies, dbt, Airflow, Spark, the `bq` CLI, observability, and more.
- Reference: configuration, CLI, REST coverage, SQL function mapping, compatibility matrix, conformance coverage matrix, out-of-scope catalogue, troubleshooting.
- Architecture: hexagonal architecture, storage model, SQL translation, jobs lifecycle, Storage Read/Write API design, scripting, UDFs, versioning, row access, specialized types, observability, testing strategy, conformance tier.
- ADRs: 42 Architecture Decision Records documenting every non-obvious design choice.

Every example under docs/examples/ is a complete, runnable project with its own make test validated by CI:
| Toolchain | Example | What it demonstrates |
|---|---|---|
| Python | `python/pytest-integration` | Flask app + auto-discovered `bqemu_client` fixture |
| Python | `python/dbt-local` | dbt build cycle via endpoint override |
| Python | `python/airflow-dag-test` | `BigQueryInsertJobOperator` DAG via offline `dag.test()` |
| Python | `python/pyspark-bigquery` | Storage Read → Arrow → Spark DataFrame |
| Node.js | `nodejs/nestjs-app` | NestJS + Jest + supertest e2e |
| Node.js | `nodejs/cloud-run-local` | Cloud Run-shaped Express + docker-compose |
| Go | `go/beam-pipeline` | Apache Beam Go SDK + Testcontainers |
| Go | `go/dataflow-local` | Stand-alone Go ETL binary |
| Java | `java/spring-boot` | Spring Boot + Testcontainers |
| Scala | `java/scio` | Spotify Scio (Scala-on-Beam) pipeline |
| Compose | `docker-compose/full-stack` | App + emulator + Prometheus + Grafana |
| CI | `ci-recipes/github-actions` | Service-container + Testcontainers patterns |
| CI | `ci-recipes/gitlab-ci` | `services:` alias on the CI network |
| CI | `ci-recipes/circleci` | Docker-secondary + machine executor |

bqemulator is at v1.1.2 — first minor on the production-stable
line. SemVer applies: breaking changes ship only in MAJOR
versions, preceded by ≥1 MINOR with deprecation warnings;
deprecated APIs remain for ≥2 MINOR versions or 6 months.
Maturity signals:
- ✅ 42 Architecture Decision Records covering every non-obvious design choice (`docs/adr/0001–0042`).
- ✅ ≥90% line + branch coverage gated by CI (`make verify`).
- ✅ 7 test tiers passing (unit + property + integration + conformance + e2e + perf + chaos).
- ✅ 5-client e2e matrix (Python · Node.js · Go · Java · `bq` CLI).
- ✅ Mutation-tier (`mutmut`) pilot landed on pure-domain modules.
- ✅ Fuzz-tier (`Atheris`) harnesses on the SQL translator, dynamic-protobuf decoder, and Arrow bridge.
- ✅ Differential-tier row-order perturbation of the entire conformance corpus passes.
- ✅ Performance baselines committed for `darwin-arm64`, with regression gates (`pytest-benchmark` `--benchmark-compare-fail=median:10%`).
- ✅ PyPI publish via Trusted Publishing (sigstore-attested wheels): `pip install bqemulator==1.1.2` resolves from PyPI.
- ✅ GHCR publish with keyless cosign signatures: `docker pull ghcr.io/jjviscomi/bqemulator:1.1.2` resolves and the image is cosign-verifiable.

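For readers unfamiliar with the row-order perturbation mentioned in the list above, the idea can be sketched in a few lines: load the same rows in a different order and check that a query returns the same multiset of result rows. This is an illustration only, not the project's differential harness. It uses the standard-library `sqlite3` module as a stand-in engine, and the table layout and function names are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(rows, sql):
    """Load `rows` into a fresh in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def order_insensitive(sql, rows):
    """True if the result multiset survives reversing the input row order."""
    baseline = run_query(list(rows), sql)
    perturbed = run_query(list(reversed(rows)), sql)
    # Counter compares results as multisets, so output order is ignored too.
    return Counter(baseline) == Counter(perturbed)


rows = [(1, 1250), (2, 300), (3, 4999), (4, 1)]

# An aggregate does not depend on the order in which rows were stored.
print(order_insensitive(
    "SELECT id % 2, SUM(amount_cents) FROM orders GROUP BY id % 2", rows))

# LIMIT without ORDER BY picks whichever row the engine scans first.
print(order_insensitive("SELECT id FROM orders LIMIT 1", rows))
```

Integer cents are used instead of floating-point amounts so that summation order cannot change the aggregate. A real harness would more likely shuffle rows randomly than reverse them; reversal keeps this sketch deterministic.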
See CHANGELOG.md for the complete release-by-release inventory.
We welcome contributions of all sizes. Start with CONTRIBUTING.md for the mechanics; AGENTS.md captures the project's day-to-day conventions; and docs/architecture/overview.md is the canonical architectural reference.
Pull requests are squash-merged into main with a Conventional Commits subject; commits carry a DCO sign-off (git commit -s). The full review policy lives in GOVERNANCE.md.
- 💬 GitHub Discussions — design questions, usage questions, and general help.
- 🐛 Issues — bug reports and feature requests. Please search existing issues first.
- 🔒 Security advisories — report vulnerabilities privately via the GitHub Security Advisory flow (see SECURITY.md for our disclosure policy).
- 📜 Code of Conduct — adapted from the Contributor Covenant 2.1.
bqemulator is released under the Apache License 2.0.
- `goccy/bigquery-emulator` for blazing the trail and providing a decade of issue reports that seeded our regression corpus.
- DuckDB, SQLGlot, FastAPI, Pydantic, Hatchling, and the Google Cloud client library teams whose work makes this project tractable.
- The Apache Beam, dbt, Airflow, PySpark, Spotify Scio, NestJS, and Spring Boot communities whose work the example projects compose with.