Add Continuous Materialized View to emulator by jakeheft · Pull Request #30 · bitly/little_bigtable

jakeheft · 2026-03-17T15:37:52Z

Description

This PR adds the ability to use a Continuous Materialized View (CMV) in the little bigtable emulator.
https://docs.cloud.google.com/bigtable/docs/continuous-materialized-views

CMVs are created via the standard CreateMaterializedView gRPC call, the same as production. On write, the emulator automatically creates the shadow table (if needed), transforms the row key per the SQL query, and syncs the row. Deletes propagate too.

JIRA

Resolves SQ-4524

Testing

Unit tests

go test -race ./bttest/... -v

Local end-to-end (using `--cmv-config`)

1. Build the emulator

go build -o little_bigtable .

2. Create a CMV config file

cat > /tmp/cmv_config.json << 'EOF'
[
  {
    "source_table": "events",
    "view_id": "events_by_account",
    "key_separator": "#",
    "key_mapping": [3, 4, 1, 2, 0],
    "append_source_key": true,
    "include_families": ["cf1"]
  }
]
EOF

3. Start the emulator

./little_bigtable --port 9000 --db-file /tmp/lbt.db --cmv-config /tmp/cmv_config.json
export BIGTABLE_EMULATOR_HOST=localhost:9000

4. Create the source table

cbt -project local -instance local createtable events
cbt -project local -instance local createfamily events cf1

5. Write a row to the source table

Source key format: item_id#ts#type#region#account_id

cbt -project local -instance local set events \
  "item-abc#9999999#type-x#region-a#account-42" cf1:data=test

6. Verify the row exists in the source table

cbt -project local -instance local read events

7. Verify the re-keyed row appears in the CMV shadow table

Expected key: region-a#account-42#9999999#type-x#item-abc#item-abc#9999999#type-x#region-a#account-42

cbt -project local -instance local read events_by_account

8. Verify delete propagation

# Delete the source row
cbt -project local -instance local deleterow events "item-abc#9999999#type-x#region-a#account-42"

# Confirm source row is gone
cbt -project local -instance local read events

# Confirm CMV row is also gone
cbt -project local -instance local read events_by_account
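
For reference, the re-keying that step 7 expects can be sketched in a few lines of Go. This is only an illustration of the transform implied by the config file in step 2, not the emulator's code: `rekey` is a hypothetical helper whose parameters mirror `key_separator`, `key_mapping`, and `append_source_key`, and it assumes every mapping index exists in the source key.

```go
package main

import (
	"fmt"
	"strings"
)

// rekey builds a shadow-table row key from a source row key.
// Hypothetical helper: sep, mapping and appendSource mirror the
// key_separator, key_mapping and append_source_key config fields.
// It assumes every index in mapping is valid for the split key and
// will panic otherwise.
func rekey(sourceKey, sep string, mapping []int, appendSource bool) string {
	parts := strings.Split(sourceKey, sep)
	out := make([]string, 0, len(mapping)+1)
	for _, idx := range mapping {
		out = append(out, parts[idx])
	}
	if appendSource {
		// Appending the full source key keeps shadow keys unique.
		out = append(out, sourceKey)
	}
	return strings.Join(out, sep)
}

func main() {
	src := "item-abc#9999999#type-x#region-a#account-42"
	fmt.Println(rekey(src, "#", []int{3, 4, 1, 2, 0}, true))
}
```

Running it with the source key from step 5 prints the expected key shown in step 7.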

Local end-to-end (using `CreateMaterializedView` gRPC)

The same flow above can be driven programmatically using the standard Go admin client
pointed at BIGTABLE_EMULATOR_HOST. This matches the production code path and exercises
the SQL parser. See CMV_SUPPORT.md for a Go snippet.

Deploy

squash
merge

jakeheft · 2026-03-18T22:38:16Z

bttest/inmem.go

+	var deletedKeys []string // populated for prefix case; nil for deleteAll
+
 	tbl.mu.Lock()
-	defer tbl.mu.Unlock()


I removed defer tbl.mu.Unlock() in functions that call CMV propagation helpers — those helpers acquire s.mu, and holding tbl.mu into them would invert the lock order and risk deadlock. Error paths now carry their own explicit unlocks as a result.

zainkai · 2026-03-19T18:42:33Z

bttest/inmem.go

+	s.mu.Lock()
+	sourceTbl := s.tables[fqSourceTable]
+	for _, cmv := range cmvs {
+		s.ensureCMVTable(cmv, sourceTbl)


It looks like we check whether the CMV table exists on all inserts and updates. Can that be moved into the NewServer func?

It can't be moved there because the source table (and its column families) may not exist yet at registration time. The check on writes is just a map lookup once the shadow table exists — it returns immediately after the first write creates it.

zainkai · 2026-03-19T18:47:41Z

bttest/inmem.go

+	for _, action := range cmvActions {
+		if action.deleted {
+			s.deleteCMVRow(req.TableName, action.key)
+		} else {
+			s.syncCMVRow(req.TableName, action.rowCopy)
+		}
+	}


Is it only possible to propagate CMV mutations if the primary table is unlocked? Ideally I'd like to keep the defer tbl.mu.Unlock() line on 986 where it is.

The lock ordering throughout the file is s.mu → tbl.mu. Things like ReadRows hold s.mu and then acquire tbl.mu, so holding tbl.mu while syncCMVRow tries to grab s.mu would deadlock.

We could pre-resolve the view tables under s.mu before the defer to restore it, but it adds a pre-pass on every mutation. Happy to make that change if you prefer, but the lock ordering constraint itself can't be avoided.
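
To make the ordering constraint concrete, here is a minimal, self-contained sketch of the pattern described above: mutate under tbl.mu, copy what propagation needs, release tbl.mu explicitly, and only then call a helper that takes s.mu followed by tbl.mu. The `table` and `server` types and the `mutate` and `syncShadow` names are simplified stand-ins invented for this illustration, not the types in bttest/inmem.go.

```go
package main

import (
	"fmt"
	"sync"
)

// table and server are simplified, hypothetical stand-ins.
type table struct {
	mu   sync.Mutex
	rows map[string]string
}

type server struct {
	mu     sync.Mutex
	tables map[string]*table
}

// mutate writes to the source table, then propagates to the shadow table.
// tbl.mu is released before syncShadow takes s.mu, so no goroutine ever
// waits for s.mu while holding a table lock.
func (s *server) mutate(source, shadow, key, value string) {
	s.mu.Lock()
	tbl := s.tables[source]
	s.mu.Unlock()

	tbl.mu.Lock()
	tbl.rows[key] = value
	rowCopy := value // copy what propagation needs while still locked
	tbl.mu.Unlock()  // explicit unlock instead of defer

	s.syncShadow(shadow, key, rowCopy)
}

// syncShadow follows the s.mu -> tbl.mu order.
func (s *server) syncShadow(shadow, key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	tbl, ok := s.tables[shadow]
	if !ok {
		tbl = &table{rows: map[string]string{}}
		s.tables[shadow] = tbl
	}
	tbl.mu.Lock()
	defer tbl.mu.Unlock()
	tbl.rows[key] = value
}

func main() {
	s := &server{tables: map[string]*table{"events": {rows: map[string]string{}}}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.mutate("events", "events_by_account", fmt.Sprintf("k%d", i), "v")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(s.tables["events_by_account"].rows))
}
```

If `mutate` instead kept `defer tbl.mu.Unlock()` and called `syncShadow` while still holding tbl.mu, a concurrent caller that already holds s.mu and wants the same tbl.mu could deadlock with it.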

zainkai · 2026-03-19T19:01:15Z

bttest/cmv.go

+func ParseCMVConfigFromSQL(viewID, query string) (*CMVConfig, error) {
+	cfg := &CMVConfig{
+		ViewID:       viewID,
+		KeySeparator: "#",


Should this use the key separator in the CMVConfig?

The "#" default doesn't actually matter here because ParseCMVConfigFromSQL derives the separator from the SQL itself and the default is never actually used. I'll remove it.

zainkai · 2026-03-19T19:03:08Z

bttest/cmv_test.go

+			SourceTable:     "events",
+			ViewID:          "events_by_account",
+			KeySeparator:    "#",
+			KeyMapping:      []int{3, 4, 1, 2, 0},


What happens if KeyMapping is short a key?

This is by design, as the key mapping doesn't need to produce a 1:1 mapping with the source table. In attributions we actually won't pull all keys from attributions_conversion_events into attributions_conversion_events_by_client.

But that's a good callout, as it makes me wonder about the key mapping having too many elements. I think it will just map empty strings, which we don't want. I'll go ahead and add a guard for this.
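
A guard along these lines might look like the sketch below. `checkKeyMapping` is a hypothetical name and signature used only for illustration; the actual guard in the PR may differ. A mapping shorter than the key is allowed, while any index with no matching key component is rejected.

```go
package main

import (
	"fmt"
	"strings"
)

// checkKeyMapping returns an error when a mapping index has no matching
// component in sourceKey. Hypothetical helper for illustration only.
func checkKeyMapping(sourceKey, sep string, mapping []int) error {
	n := len(strings.Split(sourceKey, sep))
	for pos, idx := range mapping {
		if idx < 0 || idx >= n {
			return fmt.Errorf("key mapping[%d]=%d is out of range: key %q has %d components",
				pos, idx, sourceKey, n)
		}
	}
	return nil
}

func main() {
	// A mapping that uses fewer components than the key has is fine.
	fmt.Println(checkKeyMapping("a#b#c", "#", []int{2, 0}))
	// Index 5 does not exist in a three-component key.
	fmt.Println(checkKeyMapping("a#b#c", "#", []int{2, 0, 5}))
}
```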

jehiah · 2026-03-21T01:47:11Z

bttest/cmv.go

+	cfg.SourceTable = fromMatch[1]
+
+	// Extract column aliases and their SAFE_OFFSET indices from SELECT.
+	offsetRe := regexp.MustCompile(`SPLIT\(_key,\s*'([^']+)'\)\[SAFE_OFFSET\((\d+)\)\].*?\bAS\s+\w+\)\s+AS\s+(\w+)|SPLIT\(_key,\s*'([^']+)'\)\[SAFE_OFFSET\((\d+)\)\]\s+AS\s+(\w+)`)


Are there other options to parse the SQL with a parser instead of regex? It feels like this is very over-fitted to key constructs?

I've been struggling to find something that can parse this specific SQL, since it uses the BigQuery-specific syntax SPLIT(_key, '#')[SAFE_OFFSET(n)]. I did find https://github.com/goccy/go-zetasql, which uses CGO (as does the sqlite package), but the dependency has some downsides: it is a very large build dependency, it is not actively maintained and has only 2 contributors, and it doesn't necessarily simplify the parsing logic we need to write.

So while the regex isn't great, I don't think I've come across any options I like better

I also had Cursor whip up a string-splitting approach, but it adds a good deal of complexity and I don't think it makes it any more readable. I'm happy to push up a commit if you want to take a look and compare against the regex approach.

Let's align on examples (test cases) of SQL strings we should be able to parse, and let that drive us.

Do you think we would benefit from splitting this parsing (ParseCMVConfigFromSQL) off into a sql_parse.go to make it easier to review what's happening and test it specifically?

Here are the SQL patterns the parser should handle, based on the Bigtable CMV query docs.

Supported (ORDER BY secondary index with SPLIT-based re-keying):

SPLIT(_key, '#')[SAFE_OFFSET(n)] AS alias — plain key component extraction

_key AS alias in SELECT + ORDER BY — appends full source key

family AS family — column family inclusion

Custom separators (not just #)

I took another crack at a parser, got something I feel good about, and pushed that up. You can see the 4 SQL patterns we accept in the tests. The parser will return a clear error for any SQL it can't handle, so unsupported patterns fail loudly at CreateMaterializedView time rather than silently producing wrong results.
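
As a rough illustration of the "fail loudly" idea for the first supported pattern, the sketch below parses a single SELECT item of the form SPLIT(_key, '<sep>')[SAFE_OFFSET(<n>)] AS <alias> with plain string operations and returns an error for anything else. It is not the parser pushed in this PR: `keyPart` and `parseKeyPart` are hypothetical names, and the sketch only accepts the exact spacing shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// keyPart describes one key-extraction SELECT item (hypothetical type).
type keyPart struct {
	Sep    string // separator passed to SPLIT
	Offset int    // index passed to SAFE_OFFSET
	Alias  string // column alias after AS
}

// parseKeyPart accepts exactly: SPLIT(_key, '<sep>')[SAFE_OFFSET(<n>)] AS <alias>
// and returns an error for any other input.
func parseKeyPart(item string) (keyPart, error) {
	const (
		head = "SPLIT(_key, '"
		mid  = "')[SAFE_OFFSET("
		tail = ")] AS "
	)
	fail := func() (keyPart, error) {
		return keyPart{}, fmt.Errorf("unsupported SELECT item %q", item)
	}
	rest, ok := strings.CutPrefix(strings.TrimSpace(item), head)
	if !ok {
		return fail()
	}
	sep, rest, ok := strings.Cut(rest, mid)
	if !ok || sep == "" {
		return fail()
	}
	num, alias, ok := strings.Cut(rest, tail)
	if !ok || alias == "" || strings.ContainsAny(alias, " \t(),") {
		return fail()
	}
	n, err := strconv.Atoi(num)
	if err != nil || n < 0 {
		return fail()
	}
	return keyPart{Sep: sep, Offset: n, Alias: alias}, nil
}

func main() {
	for _, item := range []string{
		"SPLIT(_key, '#')[SAFE_OFFSET(3)] AS region",
		"COUNT(*) AS n",
	} {
		kp, err := parseKeyPart(item)
		fmt.Printf("%+v, err=%v\n", kp, err)
	}
}
```

Because the separator is read from the SQL itself, custom separators fall out of the same code path.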

@jakeheft I took the liberty of pushing commits that rewrite the tests as a table-driven test and add the following example query from Google's integration test, which should parse but doesn't.

https://github.com/googleapis/google-cloud-go/blob/main/bigtable/integration_test.go#L4831

@jehiah Thanks for updating tests! Regarding the sql statements that won't parse, I left a comment about that in the code here:

// Only ORDER BY (secondary index) queries are supported. GROUP BY
// (aggregation) queries require maintaining running aggregates and are
// not implemented by the emulator.

ORDER BY CMVs are pure key re-mapping — every write produces a deterministic output row, so the emulator just transforms the key on the fly. GROUP BY aggregation CMVs require maintaining running state, meaning the emulator would need to re-aggregate across all existing rows on every write. I opted not to add that as it's a fundamentally different execution model, not just a parser extension.

I added a commit that sends an error back for any GROUP BY queries and updated the test cases accordingly.

Let me know if you disagree with this approach, but I think if we want to bring in the ability to account for GROUP BY, that blows up our scope here and it should be moved to its own ticket (and probably needs architecture/design discussions).

Ok - I think I can be convinced to add minimal CMV support for 1:1 mappings, but we should support all the functions that are allowed in a CMV. I am ok if we also disallow subqueries.
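
For illustration, rejecting aggregation queries up front could be as small as the sketch below. `rejectGroupBy` and the error text are hypothetical, and the regular expression is deliberately naive (it would also match the words GROUP BY inside a string literal); the sample queries in `main` are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// groupByRe matches the GROUP BY keyword pair, case-insensitively.
var groupByRe = regexp.MustCompile(`(?i)\bGROUP\s+BY\b`)

// errGroupByUnsupported is a hypothetical error value for this sketch.
var errGroupByUnsupported = errors.New(
	"GROUP BY (aggregation) materialized views are not supported; only ORDER BY queries are")

// rejectGroupBy returns an error for queries that contain GROUP BY.
func rejectGroupBy(query string) error {
	if groupByRe.MatchString(query) {
		return errGroupByUnsupported
	}
	return nil
}

func main() {
	queries := []string{
		"SELECT _key AS k FROM `events` ORDER BY k",
		"SELECT family, COUNT(*) AS n FROM `events` GROUP BY family",
	}
	for _, q := range queries {
		fmt.Printf("%v\n", rejectGroupBy(q))
	}
}
```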

zoemccormick · 2026-03-26T17:51:58Z

bttest/inmem.go

+// syncCMVRow writes/updates the CMV shadow row for a given source row mutation.
+// Must NOT be called with s.mu held (acquires its own locks).
+func (s *server) syncCMVRow(fqSourceTable string, sourceRow *row) {
+	cmvs := s.cmvs.cmvsForTable(fqSourceTable)


These calls need to be within s.mu.Lock()/Unlock() to avoid a concurrent map read/write. Not sure it would actually happen in a dev scenario, but it matters when something is writing to the table while a new CMV is being created.

Good call. Rather than wrapping in s.mu (which would deadlock since syncCMVRow acquires s.mu internally), I added a dedicated RWMutex directly on cmvRegistry so reads and writes to the config map are independently safe. Same outcome, avoids the lock ordering issue.

Let me know if you disagree with that approach
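
A registry guarded by its own RWMutex, as described above, could look roughly like this. The `newCMVRegistry` and `cmvsForTable` names come from the diff; the struct fields, the trimmed-down `CMVConfig`, and the `register` method are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// CMVConfig is reduced to two fields for this sketch.
type CMVConfig struct {
	SourceTable string
	ViewID      string
}

// cmvRegistry guards its map with a dedicated lock, independent of the
// server and table locks, so it imposes no ordering on them.
type cmvRegistry struct {
	mu      sync.RWMutex
	byTable map[string][]CMVConfig // hypothetical field name
}

func newCMVRegistry() *cmvRegistry {
	return &cmvRegistry{byTable: make(map[string][]CMVConfig)}
}

// register adds a view config under the write lock (hypothetical method).
func (r *cmvRegistry) register(cfg CMVConfig) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byTable[cfg.SourceTable] = append(r.byTable[cfg.SourceTable], cfg)
}

// cmvsForTable returns a copy under the read lock, so callers can iterate
// without holding any registry lock.
func (r *cmvRegistry) cmvsForTable(table string) []CMVConfig {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return append([]CMVConfig(nil), r.byTable[table]...)
}

func main() {
	reg := newCMVRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func(i int) { // writer: a new CMV being created
			defer wg.Done()
			reg.register(CMVConfig{SourceTable: "events", ViewID: fmt.Sprintf("view_%d", i)})
		}(i)
		go func() { // reader: a row write looking up its views
			defer wg.Done()
			_ = reg.cmvsForTable("events")
		}()
	}
	wg.Wait()
	fmt.Println(len(reg.cmvsForTable("events")))
}
```

Returning a copy means the registry lock is never held while the caller takes s.mu or tbl.mu.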

zoemccormick

a couple of small comments. also as a note, I don't see the ability to pass in --cmv-config as described in the PR comment, not sure if that was by design after iterating but it would be nice to be able to test this locally without needing to create another program to call CreateMaterializedView if possible (or, if not possible, maybe we could include a small program that we can run locally to do this). interesting learning cmvs by reviewing this!

jakeheft · 2026-03-26T21:48:38Z

I don't see the ability to pass in --cmv-config as described in the PR comment, not sure if that was by design after iterating but it would be nice to be able to test this locally without needing to create another program to call CreateMaterializedView if possible (or, if not possible, maybe we could include a small program that we can run locally to do this). interesting learning cmvs by reviewing this!

@zoemccormick yes, this was removed during development in order to match how production runs. But a good callout on testing so I've added it back in.

jehiah · 2026-03-27T01:10:34Z

But a good callout on testing so I've added it back in.

I think we should not add a CLI argument to configure CMVs; we should only support gRPC, as we do with all table configs, etc. It's good to avoid multiple ways to do something: this is an open source project and we want to avoid space for backwards-incompatible changes, which is likely wherever we specify our own config format. We want to enforce that we are matching the Google APIs for configuration. (Presumably this will eventually be exposed in cbt as well.)

If we have tests we want to exercise, we should just commit them as Go code that sets up the state we want with the Google APIs.

jehiah · 2026-03-27T01:14:30Z

bttest/inmem.go

+			materializedViews: make(map[string]*btapb.MaterializedView),
+			db:                db,
+			tableBackend:      NewSqlTables(db),
+			cmvs:              newCMVRegistry(),


newCMVRegistry is currently in-memory; the CMVs need to be persisted in the db. little_bigtable's design promise (its whole point originally) is that it persists, where the Google-provided emulator is in-memory.

I know this is a change because initially this implementation was framed around reading in a bespoke config instead of supporting the provisioning endpoints.

jakeheft force-pushed the add_cmv_to_emulator branch 2 times, most recently from 76994ff to 5e7c996 Compare March 18, 2026 22:20

jakeheft force-pushed the add_cmv_to_emulator branch from ea23f26 to a30ef55 Compare March 18, 2026 23:15

zainkai reviewed Mar 19, 2026

jakeheft force-pushed the add_cmv_to_emulator branch 2 times, most recently from 1f1064c to 87dd05b Compare March 19, 2026 21:09

jehiah reviewed Mar 21, 2026

jakeheft force-pushed the add_cmv_to_emulator branch from 0eb2613 to 02e3325 Compare March 24, 2026 20:12

zoemccormick reviewed Mar 26, 2026

jakeheft and others added 8 commits March 26, 2026 21:33

add CMV to little bigtable emulator

2ded797

add sql parser

21bd8ec

add deletion protection functionality

61f4d7a

TestParseCMVConfigFromSQL: table driven test

9c8d704

google-cloud-go integration test

19ed778

expand test cases

c18b800

error on group by

45e0b61

address comments

e7e837c

jakeheft force-pushed the add_cmv_to_emulator branch from 68e9b2b to e7e837c Compare March 26, 2026 21:34

jehiah reviewed Mar 27, 2026
