bqemulator

A local, drop-in emulator for Google BigQuery.

DuckDB-backed, SQLGlot-powered, and tested against the real service. Point the official Google Cloud client libraries at it and run your BigQuery code on your laptop or in CI — no real project, no billing, no network.


Documentation · Quickstart · Examples · Compatibility matrix · Changelog


Why bqemulator?

Testing code against real BigQuery is slow (network + service latency), expensive (every query is billable), and dangerous (no rollback in shared environments). The alternatives — mocks, fakes, and shared sandboxes — drift from the real service the moment you stop chasing them.

bqemulator is a process you can run locally that speaks BigQuery's actual wire protocol (REST + gRPC), backs onto a real analytical SQL engine (DuckDB), and translates GoogleSQL → DuckDB SQL with a rule-based, ADR-grounded translator (SQLGlot). The official google-cloud-bigquery, @google-cloud/bigquery, cloud.google.com/go/bigquery, com.google.cloud:google-cloud-bigquery, and bq CLI clients all work against it unchanged — only the endpoint differs.

Three use cases, one binary:

  • Ephemeral CI fixture: a pytest plugin starts an in-process emulator on a random port; pip install bqemulator[testing] is all the wiring you need.
  • Long-running local dev server: bqemulator start --data-dir ~/bqemu persists state across runs and works with the official bq CLI, dbt, Airflow, PySpark, Beam, and Scio.
  • Offline replica of a real project: bqemulator import --from-project <id> clones schema (and optionally data) from real BigQuery into a local data directory.

Highlights

  • 🟢 Full REST + gRPC API parity — Datasets, Tables, Jobs, TableData, Routines, Row Access Policies, Authorized Views, plus Models CRUD metadata. Storage Read API (Arrow and Avro). Storage Write API (all four stream types — DEFAULT, COMMITTED, PENDING, BUFFERED — with both proto and Arrow row formats).
  • Real SQL — GoogleSQL translated to DuckDB SQL via 93 SQLGlot rules + 24 rewriters; covers date/time, string, array, struct, range, geography, JSON, approximate-aggregate, statistical, regex, civil-time, and bit operations.
  • 🧠 Features goccy/bigquery-emulator doesn't have — JavaScript UDFs (embedded V8 via mini-racer), procedural scripting (DECLARE / BEGIN…END / IF / LOOP / EXCEPTION / BEGIN TRANSACTION), time travel (FOR SYSTEM_TIME AS OF), table snapshots, table clones, materialized views with refresh dispatch, GEOGRAPHY (planar via DuckDB-spatial + S2 helpers), RANGE, INTERVAL, authorized views, row-access policies, INFORMATION_SCHEMA.
  • 🔌 Five-client e2e matrix — every release is exercised against the official Python, Node.js, Go, and Java BigQuery client libraries plus Google's bq CLI in a live Docker container.
  • 🧪 7-tier test pyramid — unit + property + integration + conformance + e2e + perf + chaos, plus mutation / fuzz / differential siblings. Combined coverage is gated at ≥90% line + branch.
  • 📐 Conformance corpus — 1,244 fixtures recorded against real BigQuery. Drift between the emulator and the real service surfaces as a failing test; documented divergences are pinned with ADR references.
  • 🐍 Native pytest plugin: pip install bqemulator registers a pytest plugin; the bqemu_server fixture starts an ephemeral in-process emulator on random free ports and sets BIGQUERY_EMULATOR_HOST. No conftest.py wiring required.
  • 🐳 Multi-arch container: ghcr.io/jjviscomi/bqemulator builds for linux/amd64 + linux/arm64, with cosign keyless signatures via GitHub OIDC.
  • 🔭 Production-grade observability: structlog JSON logs, OpenTelemetry tracing (configurable OTLP exporter), and a Prometheus metrics endpoint.
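The four Storage Write API stream types differ mainly in when appended rows become readable. The sketch below is a toy in-memory model of the visibility rules BigQuery documents for them; ToyWriteStream and its methods are hypothetical, are not bqemulator's implementation, and do not mirror the gRPC request and response shapes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWriteStream:
    """Hypothetical in-memory model of Storage Write API row visibility.

    Encodes BigQuery's documented rules only; it is not bqemulator's code
    and does not mirror the gRPC request/response shapes.
    """

    kind: str  # "DEFAULT", "COMMITTED", "PENDING", or "BUFFERED"
    rows: list = field(default_factory=list)
    flushed_upto: int = 0  # BUFFERED: rows[:flushed_upto] are readable
    finalized: bool = False
    committed: bool = False  # PENDING: set by a batch commit

    def append(self, batch):
        # Appends are rejected once the stream has been finalized.
        if self.finalized:
            raise RuntimeError("stream is finalized")
        self.rows.extend(batch)

    def flush(self, offset):
        # BUFFERED only: make rows up to and including `offset` readable.
        if self.kind != "BUFFERED":
            raise ValueError("flush applies to BUFFERED streams only")
        self.flushed_upto = max(self.flushed_upto, min(offset + 1, len(self.rows)))

    def finalize(self):
        # Close the stream to further appends.
        self.finalized = True

    def commit(self):
        # PENDING only: a finalized stream's rows become readable all at once.
        if self.kind != "PENDING" or not self.finalized:
            raise ValueError("only a finalized PENDING stream can be committed")
        self.committed = True

    def visible_rows(self):
        if self.kind in ("DEFAULT", "COMMITTED"):
            return list(self.rows)  # readable as soon as the append succeeds
        if self.kind == "PENDING":
            return list(self.rows) if self.committed else []
        if self.kind == "BUFFERED":
            return self.rows[: self.flushed_upto]
        raise ValueError(f"unknown stream type: {self.kind}")


pending = ToyWriteStream("PENDING")
pending.append([{"id": 1}])
assert pending.visible_rows() == []  # nothing is readable before the commit
pending.finalize()
pending.commit()
assert pending.visible_rows() == [{"id": 1}]
```

The toy deliberately skips details such as offsets on append and the fact that the real DEFAULT stream cannot be finalized; it only captures when each stream type exposes its rows.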

Install

pip install bqemulator

Optional extras:

pip install "bqemulator[testing]"      # pytest, hypothesis, testcontainers, bigquery client
pip install "bqemulator[udf-js]"       # JavaScript UDF support (embedded V8)
pip install "bqemulator[orc]"          # ORC format for load jobs
pip install "bqemulator[compression]"  # zstd + snappy for load/extract jobs
pip install "bqemulator[import]"       # bqemulator import --from-project
pip install "bqemulator[all]"          # all runtime extras (no testing extras)

Docker:

docker run --rm -p 9050:9050 -p 9060:9060 ghcr.io/jjviscomi/bqemulator:latest

Both pip and the published image bundle the same emulator. The image exposes REST on 9050 and gRPC on 9060 by default; see the configuration reference to change them.

Windows users: install Docker Desktop for Windows with the WSL2 backend (default since Docker Desktop 4.x); the published Linux image runs natively under WSL2 with no Windows-specific configuration. Native Windows-container variants of the image are explicitly out of scope for v1.0 — see docs/reference/out-of-scope.md#native-windows-containers for the rationale.

Quickstart

Python

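import os

from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

# Either set BIGQUERY_EMULATOR_HOST or pass the endpoint through
# client_options={"api_endpoint": "http://localhost:9050"}. Both work. The Python
# client uses the value verbatim as its API endpoint, so include the http:// scheme.
os.environ["BIGQUERY_EMULATOR_HOST"] = "http://localhost:9050"

# The emulator does not need real credentials; AnonymousCredentials skips the
# Application Default Credentials lookup.
client = bigquery.Client(project="my-test-project", credentials=AnonymousCredentials())

client.create_dataset("sales")
client.create_table(
    bigquery.Table(
        # Table() needs a fully qualified "project.dataset.table" ID.
        f"{client.project}.sales.orders",
        schema=[
            bigquery.SchemaField("id", "INT64"),
            bigquery.SchemaField("amount", "NUMERIC"),
            bigquery.SchemaField("placed_at", "TIMESTAMP"),
        ],
    )
)
client.insert_rows_json(
    "sales.orders",
    [{"id": 1, "amount": "12.50", "placed_at": "2026-05-21T00:00:00Z"}],
)

for row in client.query("SELECT COUNT(*) AS n FROM sales.orders").result():
    print(row.n)  # 1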
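If your own code reads BIGQUERY_EMULATOR_HOST itself, for example the bare host:port value that the docker-compose example in this README sets, a small helper can normalize it into the http:// URL the clients in this README are given. emulator_endpoint is a hypothetical helper, not part of bqemulator or google-cloud-bigquery; it assumes the emulator serves plain HTTP, as every quickstart here does.

```python
import os

DEFAULT_EMULATOR_URL = "http://localhost:9050"  # default REST port from this README


def emulator_endpoint(env=None):
    """Hypothetical helper: turn BIGQUERY_EMULATOR_HOST into an http:// URL.

    Accepts either "host:port" or a full URL, and falls back to the default
    local REST endpoint when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    value = env.get("BIGQUERY_EMULATOR_HOST", "").strip()
    if not value:
        return DEFAULT_EMULATOR_URL
    if "://" not in value:
        value = "http://" + value  # bare host:port; the emulator speaks plain HTTP
    return value.rstrip("/")  # avoid a double slash when paths are appended
```

With the Python client, pass the result as client_options={"api_endpoint": emulator_endpoint()} to bigquery.Client.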

pytest

bqemulator ships a pytest plugin via the pytest11 entry point. Installing the package is all the wiring you need — your conftest.py stays empty.

from google.cloud import bigquery

def test_orders_table(bqemu_client: bigquery.Client) -> None:
    bqemu_client.create_dataset("sales")
    # ... your test ...

The bqemu_server fixture is session-scoped (one emulator per test session); the bqemu_client fixture is function-scoped and returns a pre-configured bigquery.Client. See the pytest fixture guide and the python/pytest-integration example for a complete Flask app with integration tests.
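How the bqemu_server fixture picks its "random free ports" is not documented here. One common approach for a test harness, sketched below as a hypothetical helper, is to bind sockets to port 0 and let the operating system choose; keeping every socket open until all ports are read guarantees that the REST and gRPC ports differ.

```python
import socket


def find_free_ports(count, host="127.0.0.1"):
    """Return `count` distinct TCP ports that were free at the time of the call.

    Each port comes from binding a socket to port 0, which tells the OS to pick
    an unused one. All sockets stay open until every port has been read, so the
    OS cannot hand out the same port twice; they are closed before returning, so
    another process could still claim a port before the server binds it.
    """
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)
            sock.bind((host, 0))
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        for sock in sockets:
            sock.close()


rest_port, grpc_port = find_free_ports(2)
```

Because of that small window between closing the probe socket and the server binding it, a robust harness retries startup on a bind failure.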

Node.js

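const { BigQuery } = require('@google-cloud/bigquery');

async function main() {
  const bq = new BigQuery({
    projectId: 'my-test-project',
    apiEndpoint: 'http://localhost:9050',
    token: 'dummy',  // emulator accepts any token
  });

  // CommonJS modules have no top-level await, so run inside an async function.
  await bq.createDataset('sales');
}

main().catch(console.error);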

See the Node.js quickstart and the nodejs/nestjs-app example.

Go

client, err := bigquery.NewClient(
    ctx, "my-test-project",
    option.WithEndpoint("http://localhost:9050"),
    option.WithoutAuthentication(),
)
if err != nil {
    log.Fatal(err)
}
defer client.Close()

See the Go quickstart and the go/beam-pipeline example.

Java

BigQuery bq = BigQueryOptions.newBuilder()
    .setProjectId("my-test-project")
    .setHost("http://localhost:9050")
    .setCredentials(NoCredentials.getInstance())
    .build()
    .getService();

See the Java quickstart and the java/spring-boot example.

bq CLI

bq --api=http://localhost:9050 \
   --project_id=my-test-project \
   query --use_legacy_sql=false 'SELECT 1 AS n'

See the bq CLI guide and the bq-cli-quickstart example.

docker-compose

services:
  bqemulator:
    image: ghcr.io/jjviscomi/bqemulator:latest
    ports: ["9050:9050", "9060:9060"]
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:9050/healthz"]
      interval: 2s
      retries: 30

  app:
    build: .
    environment:
      BIGQUERY_EMULATOR_HOST: bqemulator:9050
    depends_on:
      bqemulator: { condition: service_healthy }

See the docker-compose/full-stack example for app + emulator + Prometheus + Grafana.
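When a CI script has to wait for the emulator without curl, the compose healthcheck translates to a short standard-library poll of the same /healthz path. wait_for_emulator is a hypothetical helper; it assumes /healthz answers with a 2xx status once the emulator is ready, which is what curl -sf checks for, and its defaults mirror the 2-second interval and 30 retries above.

```python
import time
import urllib.error
import urllib.request


def wait_for_emulator(base_url="http://localhost:9050", timeout=60.0, interval=2.0):
    """Poll GET {base_url}/healthz until it returns HTTP 2xx or `timeout` expires.

    Returns True once the endpoint is healthy and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/healthz", timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            # Not ready yet: connection refused or reset, a socket timeout, or a
            # 4xx/5xx response (HTTPError is a subclass of URLError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script can then gate its test run on wait_for_emulator() returning True and fail fast otherwise.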

What works today

bqemulator is at v1.1.2, the first minor release on the production-stable line. SemVer applies: breaking changes ship only in MAJOR versions, and deprecated APIs remain for ≥2 MINOR versions or 6 months. The compatibility matrix is auto-generated from the conformance corpus on every CI run; the conformance coverage matrix breaks down support by surface item.

Surface Status
BigQuery REST: Datasets / Tables / Jobs / TableData / Routines / Row Access Policies / Authorized Views
Multipart + resumable upload (/upload/bigquery/v2/...)
INFORMATION_SCHEMA (TABLES, COLUMNS, ROUTINES, VIEWS, JOBS, JOBS_BY_*, MATERIALIZED_VIEWS, PARTITIONS, TABLE_OPTIONS, …)
Storage Read API (Arrow + Avro)
Storage Write API (all 4 stream types, proto + Arrow row formats)
GoogleSQL function surface (date / time / string / array / struct / JSON / regex / aggregate / approx / civil-time / bit)
Procedural scripting (DECLARE, BEGIN…END, IF, LOOP, EXCEPTION, BEGIN TRANSACTION)
SQL / JavaScript / Table-valued UDFs
Time travel (FOR SYSTEM_TIME AS OF), snapshots, clones, materialized views
Authorized views + row access policies + caller identity
GEOGRAPHY / RANGE / INTERVAL / NUMERIC / BIGNUMERIC types
Load formats: CSV / JSON / Avro / ORC / Parquet
Extract formats: CSV / JSON / Avro / Parquet
BigQuery ML (CREATE MODEL, ML.PREDICT, …) ❌ Out of scope — see docs/reference/out-of-scope.md
BI Engine / slot reservations / Data Transfer Service / scheduled queries ❌ Out of scope

Conformance corpus depth (the conformance coverage matrix carries the live, auto-generated breakdown):

Status Surface items % of deterministic surface
🟢🟢 Deep (≥6 fixtures) 98 24.3%
🟢 Covered (3–5 fixtures) 69 17.1%
🟡 Sampled (1–2 fixtures) 235 58.3%
🔴 Uncovered (0 fixtures) 1 0.2%
Total 403 100.0%

Plus 10 non-deterministic items (RAND, CURRENT_*, SESSION_USER, GENERATE_UUID, TABLESAMPLE, FOR SYSTEM_TIME AS OF <expression>) that are excluded from the conformance corpus by ADR 0022 and exercised in unit / property / integration tiers instead — bringing the full inventory to 413 surface items across 20 categories, backed by a 1,244-fixture conformance corpus (1,170 SQL + 48 HTTP + 26 gRPC) under tests/conformance/.
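The tier percentages and the inventory totals follow directly from the counts in the table and paragraph above; recomputing them is a quick consistency check.

```python
# Surface items per fixture-depth tier, copied from the table above.
tiers = {
    "Deep (>=6 fixtures)": 98,
    "Covered (3-5 fixtures)": 69,
    "Sampled (1-2 fixtures)": 235,
    "Uncovered (0 fixtures)": 1,
}
deterministic = sum(tiers.values())  # 403
# Share of the deterministic surface, rounded to one decimal as in the table.
shares = {name: round(100 * n / deterministic, 1) for name, n in tiers.items()}

non_deterministic = 10  # excluded from the corpus by ADR 0022
inventory = deterministic + non_deterministic  # 413

fixtures = {"SQL": 1170, "HTTP": 48, "gRPC": 26}
corpus_size = sum(fixtures.values())  # 1,244
```

The rounded shares add up to 99.9 rather than 100.0; that gap is rounding, not a missing tier.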

We follow a no-deferral principle: features either ship complete or are excluded with documented rationale. There are no "TODO for a later release" placeholders. Scope boundaries are catalogued in docs/reference/out-of-scope.md.

Documentation

The full documentation lives at jjviscomi.github.io/bqemulator. Key entry points:

  • Getting started — your first ten minutes.
  • Per-language quickstarts — Python · Node.js · Go · Java · pytest · docker-compose · Testcontainers.
  • Guides — loading data, querying, streaming inserts, Storage API, UDFs, scripting, partitioning, time travel, materialized views, row access policies, dbt, Airflow, Spark, the bq CLI, observability, and more.
  • Reference — configuration, CLI, REST coverage, SQL function mapping, compatibility matrix, conformance coverage matrix, out-of-scope catalogue, troubleshooting.
  • Architecture — hexagonal architecture, storage model, SQL translation, jobs lifecycle, Storage Read/Write API design, scripting, UDFs, versioning, row access, specialized types, observability, testing strategy, conformance tier.
  • ADRs — 42 Architecture Decision Records documenting every non-obvious design choice.

Examples

Every example under docs/examples/ is a complete, runnable project with its own make test validated by CI:

Toolchain Example What it demonstrates
Python python/pytest-integration Flask app + auto-discovered bqemu_client fixture
Python python/dbt-local dbt build cycle via endpoint override
Python python/airflow-dag-test BigQueryInsertJobOperator DAG via offline dag.test()
Python python/pyspark-bigquery Storage Read → Arrow → Spark DataFrame
Node.js nodejs/nestjs-app NestJS + Jest + supertest e2e
Node.js nodejs/cloud-run-local Cloud Run-shaped Express + docker-compose
Go go/beam-pipeline Apache Beam Go SDK + Testcontainers
Go go/dataflow-local Stand-alone Go ETL binary
Java java/spring-boot Spring Boot + Testcontainers
Scala java/scio Spotify Scio (Scala-on-Beam) pipeline
Compose docker-compose/full-stack App + emulator + Prometheus + Grafana
CI ci-recipes/github-actions Service-container + Testcontainers patterns
CI ci-recipes/gitlab-ci services: alias on the CI network
CI ci-recipes/circleci Docker-secondary + machine executor

Project status

bqemulator is at v1.1.2 — first minor on the production-stable line. SemVer applies: breaking changes ship only in MAJOR versions, preceded by ≥1 MINOR with deprecation warnings; deprecated APIs remain for ≥2 MINOR versions or 6 months.
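The rule that breaking changes ship only in MAJOR versions reduces to comparing major version numbers. A minimal sketch with hypothetical helper names, assuming plain MAJOR.MINOR.PATCH strings without pre-release or build tags; it does not model the deprecation windows.

```python
def parse_version(version):
    """Split "MAJOR.MINOR.PATCH" (an optional leading "v" is allowed) into ints."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def breaking_change_allowed(current, proposed):
    """True only when `proposed` bumps the MAJOR version past `current`."""
    return parse_version(proposed)[0] > parse_version(current)[0]


print(breaking_change_allowed("1.1.2", "1.2.0"))  # False: a MINOR bump
print(breaking_change_allowed("1.1.2", "2.0.0"))  # True: a MAJOR bump
```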

Maturity signals:

  • ✅ 42 Architecture Decision Records covering every non-obvious design choice (docs/adr/, 0001 through 0042).
  • ✅ ≥90% line + branch coverage gated by CI (make verify).
  • ✅ 7 test tiers passing (unit + property + integration + conformance + e2e + perf + chaos).
  • ✅ 5-client e2e matrix (Python · Node.js · Go · Java · bq CLI).
  • ✅ Mutation-tier (mutmut) pilot landed on pure-domain modules.
  • ✅ Fuzz-tier (Atheris) harnesses on the SQL translator, dynamic-protobuf decoder, and Arrow bridge.
  • ✅ Differential-tier row-order perturbation of the entire conformance corpus passes.
  • ✅ Performance baselines committed for darwin-arm64, with regression gates (pytest-benchmark --benchmark-compare-fail=median:10%).
  • ✅ PyPI publish via Trusted Publishing (sigstore-attested wheels) — pip install bqemulator==1.1.2 resolves from PyPI.
  • ✅ GHCR publish with keyless cosign signatures — docker pull ghcr.io/jjviscomi/bqemulator:1.1.2 resolves and the image is cosign-verifiable.

See CHANGELOG.md for the complete release-by-release inventory.

Contributing

We welcome contributions of all sizes. Start with CONTRIBUTING.md for the mechanics; AGENTS.md captures the project's day-to-day conventions; and docs/architecture/overview.md is the canonical architectural reference.

Pull requests are squash-merged into main with a Conventional Commits subject; commits carry a DCO sign-off (git commit -s). The full review policy lives in GOVERNANCE.md.

Community

  • 💬 GitHub Discussions — design questions, usage questions, and general help.
  • 🐛 Issues — bug reports and feature requests. Please search existing issues first.
  • 🔒 Security advisories — report vulnerabilities privately via the GitHub Security Advisory flow (see SECURITY.md for our disclosure policy).
  • 📜 Code of Conduct — adapted from the Contributor Covenant 2.1.

License

bqemulator is released under the Apache License 2.0.

Acknowledgements
