
Use PostHog SDK for self-hosted telemetry #145

Open

suguanYang wants to merge 5 commits into main from fix/wangbinqi/posthog-python-sdk

suguanYang (Contributor) commented Jun 11, 2026

Summary

  • replace the raw PostHog /batch HTTP transport with the official posthog Python SDK behind the existing TelemetryClient wrapper
  • keep Knowhere's bounded async queue, event-name allowlist, property sanitizer, stable per-installation distinct_id, $process_person_profile=false, and disable_geoip=True
  • add anonymous self-hosted aggregate telemetry for usage, retrieval, worker, API, and provider activity:
    • self_hosted_usage_aggregate
    • self_hosted_retrieval_aggregate
    • self_hosted_worker_aggregate
    • self_hosted_api_aggregate
    • self_hosted_provider_aggregate
  • collect only aggregate counters/latency buckets and keep raw user IDs, document names, queries, prompts, responses, keys, and model names out of event payloads
  • add bounded process-local API request metrics and DB-backed aggregate collectors that continue to fail closed if telemetry collection fails
  • add SDK and aggregate-focused contract tests using a fake PostHog client instead of asserting raw HTTP payloads
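A minimal sketch of how a wrapper like the one described above can sit in front of the SDK. Only the five event names, the stable per-installation distinct_id, and the $process_person_profile=false property come from the summary. The class layout, the file-based load_installation_id helper, the numeric-only sanitizer, and the synchronous drain() loop are assumptions for illustration, not Knowhere's actual TelemetryClient. The sketch also assumes the SDK client's capture() accepts event, distinct_id, and properties as keyword arguments; check that against the pinned posthog version. In the service itself, the sdk argument would be the posthog client configured with disable_geoip=True, as the summary states.

```python
import queue
import uuid
from pathlib import Path
from typing import Any, Protocol

# Event names from the PR summary; anything else is dropped before sending.
ALLOWED_EVENTS = frozenset({
    "self_hosted_usage_aggregate",
    "self_hosted_retrieval_aggregate",
    "self_hosted_worker_aggregate",
    "self_hosted_api_aggregate",
    "self_hosted_provider_aggregate",
})


class CaptureClient(Protocol):
    """The one SDK method this sketch relies on (assumed keyword interface)."""

    def capture(self, *, event: str, distinct_id: str, properties: dict[str, Any]) -> Any: ...


def load_installation_id(path: Path) -> str:
    """Hypothetical helper: create one random ID per installation and reuse it,
    so events are keyed by install rather than by any user."""
    if path.exists():
        return path.read_text().strip()
    installation_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(installation_id)
    return installation_id


def sanitize(properties: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sanitizer: keep only numeric/boolean values, so free text
    (queries, prompts, document names, IDs) cannot reach the payload."""
    return {k: v for k, v in properties.items() if isinstance(v, (bool, int, float))}


class TelemetryClient:
    """Sketch of a wrapper in front of the SDK client (not the real class)."""

    def __init__(self, sdk: CaptureClient, distinct_id: str, max_queue: int = 1000) -> None:
        self._sdk = sdk
        self._distinct_id = distinct_id
        self._queue: queue.Queue[tuple[str, dict[str, Any]]] = queue.Queue(maxsize=max_queue)
        self.dropped = 0

    def capture(self, event: str, properties: dict[str, Any]) -> None:
        """Validate and enqueue without ever blocking the caller."""
        if event not in ALLOWED_EVENTS:
            return
        props = sanitize(properties)
        props["$process_person_profile"] = False  # ask PostHog not to build a person profile
        try:
            self._queue.put_nowait((event, props))
        except queue.Full:
            self.dropped += 1  # bounded queue: drop rather than grow or block

    def drain(self) -> int:
        """Forward queued events to the SDK and return how many were accepted.
        Simplified: a real service would run this on a background task."""
        sent = 0
        while True:
            try:
                event, props = self._queue.get_nowait()
            except queue.Empty:
                return sent
            try:
                self._sdk.capture(event=event, distinct_id=self._distinct_id, properties=props)
                sent += 1
            except Exception:
                continue  # fail closed: a telemetry error never reaches the app
```

Keeping the allowlist and sanitizer in the wrapper, rather than at each call site, means a new caller cannot accidentally send an unlisted event or a free-text property.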
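The "aggregate counters/latency buckets" and "bounded process-local API request metrics" items can be illustrated with a collector whose key space is fixed in advance. This is a sketch under assumptions: the bucket boundaries, label names, and the ApiRequestMetrics class are hypothetical, since the summary does not give the real buckets or the collector's interface. Once per aggregate interval, the snapshot would become the properties of a self_hosted_api_aggregate event.

```python
import bisect
import threading
from collections import Counter

# Hypothetical latency bucket upper bounds in milliseconds (not from the PR).
LATENCY_BUCKETS_MS = (50, 100, 250, 500, 1000, 2500)


def latency_bucket(latency_ms: float) -> str:
    """Map a latency to a fixed label, e.g. 180 ms -> 'latency_le_250ms'."""
    i = bisect.bisect_left(LATENCY_BUCKETS_MS, latency_ms)
    if i == len(LATENCY_BUCKETS_MS):
        return f"latency_gt_{LATENCY_BUCKETS_MS[-1]}ms"
    return f"latency_le_{LATENCY_BUCKETS_MS[i]}ms"


def status_class(status_code: int) -> str:
    """Collapse a status code to one of a fixed set of labels."""
    if 100 <= status_code < 600:
        return f"status_{status_code // 100}xx"
    return "status_other"


class ApiRequestMetrics:
    """Process-local request counters with a fixed key space (sketch).

    Only status classes and latency buckets are counted, never paths or
    user identifiers, so memory stays bounded whatever traffic arrives.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()  # requests may be recorded from many threads
        self._counts: Counter[str] = Counter()

    def record(self, status_code: int, latency_ms: float) -> None:
        with self._lock:
            self._counts["requests_total"] += 1
            self._counts[status_class(status_code)] += 1
            self._counts[latency_bucket(latency_ms)] += 1

    def snapshot_and_reset(self) -> dict[str, int]:
        """Return this interval's counters and start a fresh interval."""
        with self._lock:
            snapshot = dict(self._counts)
            self._counts.clear()
        return snapshot


metrics = ApiRequestMetrics()
metrics.record(200, 42.0)
metrics.record(201, 180.0)
metrics.record(503, 3000.0)
print(metrics.snapshot_and_reset())
# {'requests_total': 3, 'status_2xx': 2, 'latency_le_50ms': 1, 'latency_le_250ms': 1, 'status_5xx': 1, 'latency_gt_2500ms': 1}
```

Resetting on each snapshot keeps every event a per-interval delta, so no single payload can grow into a long-lived per-process history.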

Verification

  • uv run pytest apps/api/tests/contract/test_self_hosted_telemetry_contract.py -q
  • uv run pytest apps/api/tests/contract/test_s3_event_contract.py apps/api/tests/contract/test_self_hosted_telemetry_contract.py -q
  • make lint
  • make typecheck
  • git diff --check
  • end-to-end check of the self-hosted stack:
    • built the knowhere-self-hosted:telemetry-aggregates image from this runtime branch plus the self-hosted PR source bundle
    • started the compose stack and verified /health and the dashboard /login page
    • registered a guest and created/uploaded a text job through LocalStack SNS
    • waited for the job to complete, then for the next aggregate interval
    • stopped the app to flush telemetry
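The contract tests listed above swap a fake client in for the SDK and assert on the recorded capture() calls instead of raw /batch HTTP payloads. A minimal pytest-style sketch of that pattern follows; emit_aggregate, FakePostHog, and FORBIDDEN_KEYS are hypothetical names, not the contents of test_self_hosted_telemetry_contract.py, and the keyword-style capture() call is an assumption about how the wrapper invokes the SDK.

```python
from typing import Any

# Hypothetical property names that must never reach PostHog.
FORBIDDEN_KEYS = frozenset(
    {"user_id", "document_name", "query", "prompt", "response", "api_key", "model_name"}
)


def emit_aggregate(client: Any, distinct_id: str, event: str, counters: dict[str, int]) -> None:
    """Hypothetical code under test: send counters as one anonymous event."""
    properties: dict[str, Any] = {k: v for k, v in counters.items() if k not in FORBIDDEN_KEYS}
    properties["$process_person_profile"] = False
    client.capture(event=event, distinct_id=distinct_id, properties=properties)


class FakePostHog:
    """Stands in for the SDK client and records every capture() call."""

    def __init__(self) -> None:
        self.captured: list[dict[str, Any]] = []

    def capture(self, **kwargs: Any) -> None:
        self.captured.append(kwargs)


def test_aggregate_event_is_anonymous() -> None:
    fake = FakePostHog()
    emit_aggregate(
        fake, "install-123", "self_hosted_usage_aggregate",
        {"jobs_completed": 3, "user_id": 99},
    )
    assert len(fake.captured) == 1
    call = fake.captured[0]
    assert call["event"] == "self_hosted_usage_aggregate"
    assert call["distinct_id"] == "install-123"
    assert call["properties"]["$process_person_profile"] is False
    assert FORBIDDEN_KEYS.isdisjoint(call["properties"])
```

Asserting on the arguments handed to the SDK keeps these tests independent of the SDK's own batching and wire format, which is the part this PR stops hand-rolling.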

Comment threads marked Fixed: packages/shared-python/shared/services/telemetry/client.py (3), packages/shared-python/shared/services/telemetry/aggregates.py (1)