fix(experimental): NewScopedClient, AddContext, OverwriteContextByKind emit usage events#310
Conversation
keelerm84
left a comment
There was a problem hiding this comment.
Did you check with the event team to make sure they weren't considered about the volume of events?
(I don't speak for the event ingestion team - cc @rengawm @karayusuf - but I think the following is correct) I think the incremental volume is not a concern because this doesn't add any event emission to existing code paths, only to this new usage pattern, so only customers who
We should keep an eye on adoption (using this mechanism!) and if this does get serious adoption then we should prioritize those optimizations you mention. I have a few other thoughts not directly related to volume, coming in subsequent comments. |
| // not use it. You have been warned. | ||
| func NewScopedClient(client *LDClient, contexts ...ldcontext.Context) *LDScopedClient { | ||
| func NewScopedClient(client *LDClient, context ldcontext.Context) *LDScopedClient { | ||
| _ = client.TrackData("$ld:scoped:usage", context, ldvalue.String("new")) |
There was a problem hiding this comment.
If we send this telemetry using Track* (i.e. as custom events) then they will be visible to the customer - is that intended and desirable? e.g. they will show up in the customer's Event Explorer and Live Events.
Besides showing our laundry, I'd worry that the proliferation of these "meta" events might clutter those views and get in the way of customers seeing the actual events they are sending.
I don't think that's a deal breaker since it'll only affect customers who start using this, but it might be something we could improve on for v2.
As an alternative did you look at all about extending the "diagnostic events" that the SDK already sends (that I believe already has a summarization mechanism)? We'd need to check with Data Platform (@karayusuf and @rengawm again) but I think I heard they had plans to (or already were?) ingest diagnostic events into Clickhouse, where it would be straightforward to query them for this data.
There was a problem hiding this comment.
I agree I'm not a fan of the showing our laundry here. @wchieng mentioned this is reasonable for now and in line with the metrics team's vision, and in the future we can build filters so that the laundry isn't shown, thanks to the hardcoded keys here. This follows a precedent set in a similar AI usage event.
Some other alternatives I considered:
- Diagnostic events - this would be ideal end result because my use case lines up with what those were made for, but it would take a lot of refactoring to plumb this info through to the part of the SDK that sends the diagnostic events
- Summary events - I could fairly easily abuse the part of the SDK that sends
summaryevents, to send some summary events about scoped client usage by recording an "evaluation" with a "flag key" that's actually a special key like$ld:scoped:usage. Main concern with this are that it's pretty ugly & unusual, and I'm not totally sure about our clickhouse capability to use summary events as internal diagnostics like this.
There was a problem hiding this comment.
Although the ingest volume is probably not concerning (per other comment), one issue I could see is if a customer could clog up their SDK event queue with these "meta" events. That'd be especially unfortunate since the event queue doesn't have a sense of priority, so we could end up dropping important events like RG/experiment measurements.
It looks like the number of events is roughly one per "scope" (= per request, in most cases?) per context kind, which isn't excessive, but I believe is more than what a server-side SDK would currently be sending - 1 index event per multi-context rather than per context kind, and per dedupe interval rather than per request. So a server-side SDK handling a bunch of requests concurrently might hit the max event queue size sooner than it would without this.
Again, not a deal breaker since this is limited to usage of this new feature.
There was a problem hiding this comment.
As you pointed out, both of the alternatives I mentioned have the advantage of summarizing the data instead of having distinct events to send about this usage, which is all I really need for basic usage tracking. However, the downsides felt significant enough to go either route.
There was a problem hiding this comment.
The summary event idea is interesting. I think it would work in Clickhouse if done as you describe (fake flag key).
I think that it would be hard for customers to encounter the fake flag key, since they can't (e.g.) navigate to the flag page for the fake flag in order to see the fake evals chart.
A downside of that way would be you wouldn't get the granular events per added context. If you don't need those anyway, maybe you could do your own mini summarization in the method and just send one event per call to AddContext (with ctx being the current state of the context being built?) and a value of the number of contexts added. That would cut down the event amplification factor a bit, although they could still just call AddContext a bunch of times.
| continue | ||
| } | ||
| c.addIndividualContext(ctx) | ||
| _ = c.client.TrackData("$ld:scoped:usage", ctx, ldvalue.String("add")) |
There was a problem hiding this comment.
How are you planning to use the data value (this string)? If you're just going to query these events in Clickhouse/Athena this is fine. I think our "metric filters" project might only work for map-valued data (e.g. {"operation":"add"} rather than just "add"), but that would only be if we were creating a customer-visible metric for some of these events, which presumably we don't want to do.
There was a problem hiding this comment.
I stole the idea of the single string from launchdarkly/python-server-sdk-ai#60. I figured this only needs to be distinct enough that any hand-rolled queries in Clickhouse/Athena can use it, but yeah, as you said, I'm not targeting our actual product ways of diving into this data.
There was a problem hiding this comment.
It would be pretty neat to extend diagnostic events to allow some external events into them. The idea with these is just to do them until we have an idea that customers are actually getting value from things. Then we should prune them.
So it would probably be good to make a task to come back and prune this once we are happy with it.
samstokes
left a comment
There was a problem hiding this comment.
Given that this is a temporary mechanism we don't need to leave in the SDK long term, and this is important functionality to get in customers' hands and get feedback on, I think this is fine to move forward with.
bbd0c91 to
7edbef5
Compare
🤖 I have created a release *beep* *boop* --- ## [7.13.3](v7.13.2...v7.13.3) (2025-08-20) ### Bug Fixes * **experimental:** NewScopedClient can now only take one context argument ([#309](#309)) ([b149429](b149429)) * **experimental:** NewScopedClient, AddContext, OverwriteContextByKind emit usage events ([#310](#310)) ([4a36949](4a36949)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Emit custom events
$ld:scoped:usagewheneverNewScopedClient,AddContext,OverwriteContextByKindare called.This will give us insight into usage patterns of this tool.
A future optimization should be to either sample these events or use a summarizer, depending on the demand for ScopedClient in super-high-volume applications.