Add docs on person processing #14512

robbie-c · 2026-01-22T13:36:26Z

Changes

Adds a section to the engineering handbook about person processing.

As this is an area that touches capture, ingestion, ClickHouse, HogQL, and queries, no single team owns the whole system. To that end, I thought it would be useful to provide a high-level picture of how the pieces fit together.

Triggered by https://posthog.slack.com/archives/C08JQTX5RRP/p1767878236120559

Checklist

- Words are spelled using American English
- PostHog product names are in title case. It's "Product Analytics" not "Product analytics". If talking about a category of product, use sentence case e.g. "There are a lot of product analytics tools, but PostHog's Product Analytics is the best"
- Titles are in sentence case
- Feature names are in sentence case. It's "Click here to create a trend insight" not "... create a Trend Insight" and so on.
- Use relative URLs for internal links
- If I moved a page, I added a redirect in vercel.json
- Remove this template if you're not going to fill it out!

Article checklist

- I've added (at least) 3-5 internal links to this new article
- I've added keywords for this page to the rank tracker in Ahrefs
- I've checked the preview build of the article
- The date on the article is today's date
- I've added this to the relevant "Tutorials and guides" docs page (if applicable)

vercel · 2026-01-22T13:36:32Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| posthog | Error | | Jan 23, 2026 11:38am |

contents/handbook/engineering/person-processing.md

pauldambra

amazing

contents/handbook/engineering/person-processing.md

Co-authored-by: Paul D'Ambra <paul@posthog.com>

vdekrijger

Amazing, great read and also helped me better understand the PoE thing you want to look into 🙌 !

contents/handbook/engineering/person-processing.md

aspicer · 2026-01-22T23:16:08Z

contents/handbook/engineering/person-processing.md

+
+A `distinct_id` is an identifier attached to every event. It's how we know which person an event belongs to. A person can have multiple distinct IDs (e.g., an anonymous session ID and a logged-in user ID).
+
+Some example Distinct ID formats are: the user's email address, a UUID randomly generated by a client SDK, the primary key id in the customer's `User` table in their database, a Stripe `cus_xxx` ID.


Suggested change

Some example Distinct ID formats are: the user's email address, a UUID randomly generated by a client SDK, the primary key id in the customer's `User` table in their database, a Stripe `cus_xxx` ID.

Some commonly used Distinct ID formats are: the user's email address, a UUID randomly generated by a client SDK, the primary key id in the customer's `User` table in their database, a Stripe `cus_xxx` ID.

gesh

Awesome! Thank you for putting all the knowledge in one place!

gesh · 2026-01-23T09:53:09Z

contents/handbook/engineering/person-processing.md

+(Cookieless events use a placeholder distinct ID, which is replaced later with a privacy-preserving hash. The placeholder is not suitable as a partitioning key, as it is always the same value for every cookieless event, so the IP address is used instead.)
+
+**Implications**:
+- Events with the **same** distinct_id go to the **same** Kafka partition → ordering preserved
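
To make the implication above concrete, here is a minimal Python sketch of key-based partitioning. It is an illustration under assumptions, not PostHog's capture code: `COOKIELESS_PLACEHOLDER`, `NUM_PARTITIONS`, `partition_key`, and `partition_for` are hypothetical names, and SHA-256 stands in for whatever hash the real Kafka producer uses.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical partition count
COOKIELESS_PLACEHOLDER = "<cookieless-placeholder>"  # hypothetical sentinel value


def partition_key(distinct_id: str, ip: str) -> str:
    """Pick the key used for partitioning.

    Cookieless events all share one placeholder distinct ID, so keying on it
    would send every cookieless event to a single partition. Per the text
    above, the IP address is used for those events.
    """
    if distinct_id == COOKIELESS_PLACEHOLDER:
        return ip
    return distinct_id


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (SHA-256 as a stand-in)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same distinct_id always lands on the same partition, whatever the IP.
p1 = partition_for(partition_key("user-123", "203.0.113.7"))
p2 = partition_for(partition_key("user-123", "198.51.100.9"))
print(p1 == p2)
```

Because the mapping depends only on the key, two events with the same `distinct_id` are appended to the same partition log, which is what makes per-person ordering possible downstream.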


Can the order of events change before they are inserted into Kafka?
For example:

1. We have Event A and Event B (in this order).
2. They are sent in two separate calls to the /capture endpoint.
3. Event A is processed slowly by one Rust process.
4. In parallel, Event B is processed faster by another Rust process.
5. Event B is ingested into the Kafka topic.
6. Event A is ingested into the Kafka topic.

If that's true, and we have $identify -> customEvent but the customEvent is processed first, will we set the correct person_id on it? The customEvent has the identified user's UUID, which differs from the anonymous user's UUID, and we haven't created a person for it yet.

in a previous job we processed data with a sliding window to re-order it
but it was very expensive

i think the reason we have the confusing squash/override/etc is to keep ingestion cheap

(mostly commenting so i get a notification when the correct answer appears)

It makes sense to me that this could happen, but I would probably defer to @PostHog/team-ingestion

Do we detect that this happened when event A is processed, and spit out an override?
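
The scenario in this thread can be sketched as a small simulation. This is a toy model under stated assumptions, not PostHog's capture or ingestion code: `Event`, `partition_log`, and the latency numbers are all hypothetical, and the only rule modeled is that a partition keeps events in the order they were appended, not the order the client sent them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    distinct_id: str
    sent_at_ms: int          # when the client sent the request
    capture_latency_ms: int  # hypothetical time spent in the capture process

    @property
    def appended_at_ms(self) -> int:
        # The event reaches the partition only once capture has finished.
        return self.sent_at_ms + self.capture_latency_ms


def partition_log(events: list[Event]) -> list[Event]:
    """Return events in the order they would be appended to one partition."""
    return sorted(events, key=lambda e: e.appended_at_ms)


# Event A is sent first but handled by a slow process;
# Event B is sent later but handled by a fast one.
event_a = Event("$identify", "user-123", sent_at_ms=0, capture_latency_ms=50)
event_b = Event("customEvent", "user-123", sent_at_ms=10, capture_latency_ms=5)

log = partition_log([event_a, event_b])
print([e.name for e in log])  # ['customEvent', '$identify']
```

Both events share a `distinct_id`, so they land on the same partition, yet the log order is the reverse of the send order. The sketch only shows that the reordering is possible; how ingestion handles it afterwards is the open question for @PostHog/team-ingestion.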

gesh · 2026-01-23T09:56:07Z

contents/handbook/engineering/person-processing.md

+
+---
+
+## System overview


Add docs on person processing

9cc1574

tidy overrides section

0a97114

robbie-c requested review from a team January 22, 2026 13:43

pauldambra reviewed Jan 22, 2026

contents/handbook/engineering/person-processing.md Outdated

pauldambra approved these changes Jan 22, 2026

vercel bot deployed to Preview January 22, 2026 14:02

robbie-c commented Jan 22, 2026

contents/handbook/engineering/person-processing.md Outdated

contents/handbook/engineering/person-processing.md

robbie-c and others added 3 commits January 22, 2026 14:19

Add note on cookieless events

4af709f

Fix code locations

f6bf912

Update contents/handbook/engineering/person-processing.md

4ea9531

Co-authored-by: Paul D'Ambra <paul@posthog.com>

vercel bot deployed to Preview January 22, 2026 14:35

vdekrijger approved these changes Jan 22, 2026

andyzzhao reviewed Jan 22, 2026

contents/handbook/engineering/person-processing.md Outdated

robbie-c commented Jan 22, 2026

contents/handbook/engineering/person-processing.md Outdated

Apply suggestion from @robbie-c

ad0cb95

robbie-c commented Jan 22, 2026

contents/handbook/engineering/person-processing.md

Apply suggestion from @robbie-c

0cda21c

vercel bot deployed to Preview January 22, 2026 15:21

aspicer reviewed Jan 22, 2026

contents/handbook/engineering/person-processing.md

aspicer reviewed Jan 22, 2026

gesh approved these changes Jan 23, 2026

proposed changes (#14526)

459edc2

* edits * progress * processing

vercel bot had a problem deploying to Preview January 23, 2026 11:38 Failure


7 participants
