diff --git a/lessons/02-changing-data/03-upsert/lesson.mdx b/lessons/02-changing-data/03-upsert/lesson.mdx
new file mode 100644
index 0000000..de326ba
--- /dev/null
+++ b/lessons/02-changing-data/03-upsert/lesson.mdx
@@ -0,0 +1,126 @@
+Sooner or later you'll write the same buggy code twice: a `SELECT` to check if a row exists, then an `INSERT` or `UPDATE` based on the result. Between those two statements, another connection can do its own check and you get a duplicate-key error — or worse, when no unique constraint guards the table, a duplicate row.
+
+Postgres' answer is `INSERT … ON CONFLICT`, often called **upsert**. One statement, atomic, race-free.
+
+The seed has a `page_views` table with `UNIQUE (page, user_email)` and a `tags` table with `UNIQUE (name)`. Both are realistic shapes for upsert.
+
+## The problem upsert solves
+
+"Record a page view: insert a new row at views = 1, or bump the counter if (page, user_email) already exists."
+
+The naive way is a check-then-write:
+
+```sql
+SELECT views FROM page_views WHERE page = '/home' AND user_email = 'ada@example.com';
+-- If found: UPDATE. If not: INSERT.
+```
+
+Two round trips, and any concurrent writer can slip between them. `INSERT … ON CONFLICT` collapses both branches into one atomic statement.
+
+## `ON CONFLICT (...) DO UPDATE`
+
+The shape:
+
+```sql
+INSERT INTO tbl (cols...) VALUES (...)
+ON CONFLICT (conflict_target) DO UPDATE
+SET col = expr;
+```
+
+The `conflict_target` is a column or set of columns covered by a `UNIQUE` constraint or primary key, because Postgres needs a unique index to detect the conflict. Here it's the `(page, user_email)` unique constraint.
+
+
+INSERT INTO page_views (page, user_email, views, last_seen)
+VALUES ('/home', 'ada@example.com', 1, now())
+ON CONFLICT (page, user_email) DO UPDATE
+SET views = page_views.views + 1,
+    last_seen = EXCLUDED.last_seen;
+
+
+
+Ada already had one view on `/home`. After the upsert, she has two.
+
+
+Two new pieces of syntax:
+
+- **`page_views.views`** — the *existing* row's value. Qualify it with the table name: a bare `views` is ambiguous between the existing row and the incoming one, and Postgres rejects it.
+- **`EXCLUDED.col`** — the value from the row you tried to `INSERT`. It's a pseudo-table (think "the row that was excluded by the conflict") and it's the bridge between the INSERT side and the UPDATE side.
+
+So `EXCLUDED.last_seen` says "use the timestamp we just tried to insert" — useful when the new value comes from the caller, not from a computation on the old row.
+
+## Insert path: same statement, new row
+
+The same statement also handles the case where no conflict exists — the row just gets inserted.
+
+
+INSERT INTO page_views (page, user_email, views, last_seen)
+VALUES ('/pricing', 'newbie@example.com', 1, now())
+ON CONFLICT (page, user_email) DO UPDATE
+SET views = page_views.views + 1,
+    last_seen = EXCLUDED.last_seen;
+
+
+
+There was no `(/pricing, newbie@example.com)` row — the upsert inserted it with views = 1.
+
+
+One statement, two branches. The race condition is gone.
+
+## `ON CONFLICT (...) DO NOTHING`
+
+Sometimes the "update" branch is "ignore it, you're done". Use `DO NOTHING`.
+
+
+INSERT INTO tags (name) VALUES ('postgres'), ('sql')
+ON CONFLICT (name) DO NOTHING;
+
+
+
+Both tags already existed. `DO NOTHING` quietly skipped them, so `postgres` still appears exactly once in the table.
+
+
+`DO NOTHING` is great for **idempotent inserts** — re-run the same script and you don't get errors. Common uses:
+
+- Seeding lookup tables ("ensure these tags exist").
+- Event ingestion with a unique event id ("if we've already processed this id, skip").
+- Migrating data where the source might be replayed.
+
+If you want to know whether the row was actually inserted, combine `DO NOTHING` with `RETURNING` — only inserted rows come back.
+
+
+INSERT INTO tags (name) VALUES ('postgres'), ('graphql')
+ON CONFLICT (name) DO NOTHING
+RETURNING id, name;
+
+
+Only `graphql` comes back — `postgres` already existed and was skipped.
+
+## `WHERE` on the UPDATE branch
+
+`DO UPDATE` accepts a `WHERE` that filters which conflicting rows actually get updated. "Update only if the incoming value is newer":
+
+```sql
+INSERT INTO page_views (page, user_email, views, last_seen)
+VALUES ('/home', 'ada@example.com', 1, '2024-05-01 09:00:00+00')
+ON CONFLICT (page, user_email) DO UPDATE
+SET last_seen = EXCLUDED.last_seen
+WHERE page_views.last_seen < EXCLUDED.last_seen;
+```
+
+If the incoming `last_seen` is not newer than the stored one, the conflict still matches but the `WHERE` drops the update and the row is left alone. Postgres doesn't raise an error in that case; the statement simply reports zero rows affected.
+
+## Common pitfalls
+
+- **No matching UNIQUE constraint.** Postgres needs a unique index to detect the conflict. `ON CONFLICT (foo)` fails at planning time if `foo` isn't covered by a primary key or unique index (a partial unique index counts too: `ON CONFLICT (foo) WHERE predicate` matches one).
+- **`EXCLUDED` vs the table name.** `EXCLUDED.x` is the incoming row, `tbl.x` is the existing row. Swap them and you'll be writing the old value back over itself.
+- **`BEFORE INSERT` triggers fire on both paths.** A row-level `BEFORE INSERT` trigger runs for every proposed row, even when the statement ends up on the UPDATE path, and its changes are visible in `EXCLUDED`. `BEFORE UPDATE` and `AFTER UPDATE` triggers then fire for rows that are updated; `AFTER INSERT` triggers fire only for rows that are actually inserted. If you rely on `updated_at` triggers, check they handle both.
+
+## What you learned
+
+- `INSERT … ON CONFLICT (...) DO UPDATE` collapses check-then-write into one atomic statement.
+- The conflict target must match a unique constraint, unique index, or primary key.
+- `EXCLUDED.col` references the incoming row in the UPDATE branch; `tbl.col` is the existing row.
+- `ON CONFLICT DO NOTHING` makes inserts idempotent; pair with `RETURNING` to learn which rows actually landed.
+- A `WHERE` on the UPDATE branch lets you conditionally update on conflict.
+
+Up next: wrapping a chunk of work in `BEGIN ... COMMIT` so it either all happens or none of it does.
diff --git a/lessons/02-changing-data/03-upsert/lesson.yaml b/lessons/02-changing-data/03-upsert/lesson.yaml
new file mode 100644
index 0000000..025b9c3
--- /dev/null
+++ b/lessons/02-changing-data/03-upsert/lesson.yaml
@@ -0,0 +1,34 @@
+title: Upsert with ON CONFLICT
+summary: Insert-or-update in a single statement — INSERT … ON CONFLICT DO UPDATE / DO NOTHING, and how to use EXCLUDED.
+estimatedMinutes: 12
+tags:
+  - insert
+  - on-conflict
+  - upsert
+  - excluded
+  - dml
+authors:
+  - exekias
+seed: seed.sql
+checks:
+  - id: ada-page-views-incremented
+    type: query-returns
+    description: Upserting Ada's page view brings her count to 2.
+    sql: SELECT views FROM page_views WHERE page = '/home' AND user_email = 'ada@example.com'
+    expect:
+      rowCount: 1
+      rows: [[2]]
+  - id: new-page-view-inserted
+    type: query-returns
+    description: Upserting a brand-new (page, user_email) pair inserts it with views = 1.
+    sql: SELECT views FROM page_views WHERE page = '/pricing' AND user_email = 'newbie@example.com'
+    expect:
+      rowCount: 1
+      rows: [[1]]
+  - id: dedup-do-nothing
+    type: query-returns
+    description: DO NOTHING swallowed the duplicate — 'postgres' still appears exactly once.
+    sql: SELECT count(*)::int FROM tags WHERE name = 'postgres'
+    expect:
+      rowCount: 1
+      rows: [[1]]
diff --git a/lessons/02-changing-data/03-upsert/seed.sql b/lessons/02-changing-data/03-upsert/seed.sql
new file mode 100644
index 0000000..9f57112
--- /dev/null
+++ b/lessons/02-changing-data/03-upsert/seed.sql
@@ -0,0 +1,30 @@
+-- Seed for "03-upsert": two tables that benefit from upsert.
+-- page_views is the "increment a counter, creating it if needed" example —
+-- it has a UNIQUE (page, user_email) so the conflict target is meaningful.
+-- tags is a tiny dedup table for DO NOTHING.
+
+CREATE TABLE page_views (
+  id         serial PRIMARY KEY,
+  page       text NOT NULL,
+  user_email text NOT NULL,
+  views      int NOT NULL DEFAULT 1,
+  last_seen  timestamptz NOT NULL DEFAULT now(),
+  UNIQUE (page, user_email)
+);
+
+INSERT INTO page_views (page, user_email, views, last_seen) VALUES
+  ('/home', 'ada@example.com', 1, '2024-05-01 09:00:00+00'),
+  ('/home', 'grace@example.com', 4, '2024-05-02 11:30:00+00'),
+  ('/pricing', 'grace@example.com', 2, '2024-05-03 12:00:00+00'),
+  ('/blog', 'linus@example.com', 7, '2024-05-04 18:45:00+00');
+
+CREATE TABLE tags (
+  id   serial PRIMARY KEY,
+  name text NOT NULL UNIQUE
+);
+
+INSERT INTO tags (name) VALUES
+  ('postgres'),
+  ('sql'),
+  ('database'),
+  ('tutorial');