feat(transform): add merge_subwatcher_fields — general subwatcher field enrichment#138
Conversation
… enrichment Adds a new backend transform `merge_subwatcher_fields(base, sub, keys)` that enriches base window events with extra fields from a subwatcher (editor, browser) by finding the longest-overlapping subwatcher event per base event and copying the named keys into the base event's data dict. Unlike the concat workaround used in aw-webui#851, this approach: - Preserves timestamps, durations, and event count exactly (no phantom events) - Keeps app/title/duration aggregations correct by construction - Lives in the backend so every client (webui, native UIs, exporters) benefits - Handles both editor (project/file/language) and browser (url/$domain) fields through one mechanism, closing both #1305 and #352 Exposes as q2_merge_subwatcher_fields in the query2 function registry. Adds 6 tests covering: basic enrichment, no-overlap passthrough, base_wins and sub_wins conflict modes, attach-longest N:1 overlap selection, and empty inputs. Closes ActivityWatch/activitywatch#1305, ActivityWatch/activitywatch#352
Greptile SummaryThis PR adds
Confidence Score: 5/5Safe to merge; the transform is additive and does not touch timestamps, durations, or event counts. The core algorithm is correct: the break/continue boundary conditions are consistent with the rest of the codebase, the conflict-resolution guard surfaces invalid values immediately, and every result event is a deep copy of the original base. The two findings are both defensive-programming suggestions — one around deep-copying mutable subwatcher values (harmless for the typical string data AW stores today), and one around algorithm scalability for very large datasets. Neither introduces incorrect behaviour for the described use cases. aw_transform/merge_subwatcher_fields.py — the value-copy and inner-scan patterns are worth a second look before this is used on large date ranges. Important Files Changed
Sequence DiagramsequenceDiagram
participant Q as Query Pipeline
participant FPI as filter_period_intersect
participant MSF as merge_subwatcher_fields
participant CAT as categorize
Q->>FPI: subwatcher_events, base_events
FPI-->>Q: clipped subwatcher_events
Q->>MSF: base_events, clipped subwatcher_events, keys
note over MSF: For each base event:<br/>find longest-overlapping sub event<br/>(attach-longest strategy)
MSF-->>Q: enriched base_events (timestamps/durations unchanged)
Q->>CAT: enriched base_events, classes
CAT-->>Q: categorized events with $category
Reviews (2): Last reviewed commit: "fix(transform): address Greptile P1/P2 f..." | Re-trigger Greptile |
| if not subwatcher_events or not keys: | ||
| return base_events |
There was a problem hiding this comment.
The early-return path returns the original
base_events list by reference rather than a defensive copy. Every other code path through this function returns a list of deepcopy-ed events, so callers reasonably expect the returned value to be independent of the input. With the early return, any in-place mutation of the result silently corrupts the original list, which is especially dangerous in a query pipeline where the same events variable is reused.
| if not subwatcher_events or not keys: | |
| return base_events | |
| if not subwatcher_events or not keys: | |
| return [deepcopy(e) for e in base_events] |
There was a problem hiding this comment.
Fixed in e78081b: early return now returns [deepcopy(e) for e in base_events] instead of the original list reference, matching the contract documented in the docstring ('Returns: A new list of base events'). Test updated to assert result is not base.
| if not subwatcher_events or not keys: | ||
| return base_events |
There was a problem hiding this comment.
The
conflict parameter is compared only against "base_wins", so any other string — including typos like "sub_win" — silently behaves as "sub_wins". An explicit validation guard surfaces mistakes early.
| if not subwatcher_events or not keys: | |
| return base_events | |
| if conflict not in ("base_wins", "sub_wins"): | |
| raise ValueError(f"conflict must be 'base_wins' or 'sub_wins', got {conflict!r}") | |
| if not subwatcher_events or not keys: | |
| return base_events |
There was a problem hiding this comment.
Fixed in e78081b: added if conflict not in ("base_wins", "sub_wins"): raise ValueError(...) guard before the early-return check, so invalid values (including typos like 'sub_win') surface immediately. Test test_merge_subwatcher_fields_invalid_conflict covers this.
| @q2_function(merge_subwatcher_fields) | ||
| @q2_typecheck | ||
| def q2_merge_subwatcher_fields( | ||
| base_events: list, subwatcher_events: list, keys: list | ||
| ) -> List[Event]: | ||
| return merge_subwatcher_fields(base_events, subwatcher_events, keys) |
There was a problem hiding this comment.
The
conflict parameter from merge_subwatcher_fields is not forwarded through the query2 wrapper, so query2 callers are permanently limited to "base_wins" with no way to opt into "sub_wins". Since q2_typecheck skips optional parameters, adding conflict: str = "base_wins" and forwarding it is a safe, non-breaking change.
| @q2_function(merge_subwatcher_fields) | |
| @q2_typecheck | |
| def q2_merge_subwatcher_fields( | |
| base_events: list, subwatcher_events: list, keys: list | |
| ) -> List[Event]: | |
| return merge_subwatcher_fields(base_events, subwatcher_events, keys) | |
| @q2_function(merge_subwatcher_fields) | |
| @q2_typecheck | |
| def q2_merge_subwatcher_fields( | |
| base_events: list, subwatcher_events: list, keys: list, conflict: str = "base_wins" | |
| ) -> List[Event]: | |
| return merge_subwatcher_fields(base_events, subwatcher_events, keys, conflict=conflict) |
There was a problem hiding this comment.
Fixed in e78081b: q2_merge_subwatcher_fields now accepts conflict: str = "base_wins" and forwards it to merge_subwatcher_fields. Query2 callers can now opt into sub_wins via the standard QL2 function call syntax.
…ields - P1: early return now returns deepcopy list instead of original reference, matching the contract 'Returns: A new list of base events' - P2a: validate conflict param before early return; invalid values raise ValueError rather than silently behaving as 'sub_wins' - P2b: forward conflict param through q2_merge_subwatcher_fields wrapper so query2 callers can opt into 'sub_wins' instead of being locked to default - test: extend empty-inputs test to assert result is not the same list object - test: add test_merge_subwatcher_fields_invalid_conflict for P2a validation
|
@greptileai review |
|
Addressed all Greptile findings in e78081b:
CI running. Greptile re-review triggered. |
|
CI is green on all platforms (ubuntu, macOS, Windows, lint) and Greptile re-review came back 5/5 after the P1/P2 fixes in e78081b. This PR is ready for maintainer review and merge. Happy to address any additional feedback. |
|
Ready to merge — CI green, Greptile 5/5. Erik approved on ActivityWatch/activitywatch#1305 at 2026-05-31T12:53:08Z. Waiting for a maintainer with merge permissions. |
Problem
ActivityWatch has several subwatchers (browser, editor) that observe finer-grained
context while a parent app is active. Users want to categorize time by these fields
(
url/$domainfor browser,project/file/languagefor editor), but todaythey never reach the
categorize()pipeline:urlfield is never merged into the categorized window streamconcat, but fabricates phantom events that break app/title/duration aggregationsThe root cause is that
aw_queryhas no field-enrichment-by-overlap primitive — soclient-side code had to express enrichment as
concat(new events) instead of amerge (new fields on existing events).
Solution
Add
merge_subwatcher_fields(base, sub, keys)toaw_transform:the named keys into the base event's
datadictapp/title/duration aggregations stay correct for free
q2_merge_subwatcher_fieldsin the query2 function registryUsage (in a query)
Same pattern for browser url enrichment (closes #352):
Tests
6 new tests covering:
conflict="base_wins"(default): base keys not overwrittenconflict="sub_wins": subwatcher fields winsub/ emptykeysshort-circuitAll 164 existing tests continue to pass.
Related