@@ -103,4 +103,4 @@ ORDER BY signed_up_at;
Mark this lesson done — we'll just confirm the sandbox is healthy.
</Check>

-Up next: aggregating rows together with `GROUP BY`.
+Up next: sorting results predictably and paging through them without falling into the `OFFSET` trap.
129 changes: 129 additions & 0 deletions lessons/01-query-fundamentals/03-sorting-and-pagination/lesson.mdx
@@ -0,0 +1,129 @@
Lesson 01 introduced `ORDER BY` and `LIMIT`. This lesson goes deeper: stable sorts, dropping duplicates, and the two ways to page through a result set — including why one of them quietly breaks in production.

The seed loaded an `articles` table: 30 rows from 12 authors, with an intentional tie on `published_at` and two articles that have no `published_at` at all.

## `ORDER BY`: pick a direction

`ORDER BY col` sorts ascending (small to large, oldest first). Add `DESC` to flip it.

<Run>
SELECT title, views
FROM articles
ORDER BY views DESC
LIMIT 5;
</Run>

The top-5 most-viewed posts. Without `LIMIT 5`, you'd get all 30, sorted.
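To make the semantics concrete, here is a minimal Python sketch of what `ORDER BY views DESC LIMIT 5` computes. It is a model of the result, not of how Postgres executes the query. `top_by_views` is a hypothetical helper, and the `(title, views)` pairs are a subset copied from the seed.

```python
# A handful of (title, views) pairs copied from the seed's articles table.
articles = [
    ("Indexing for humans", 1200),
    ("Why your query is slow", 4300),
    ("Reading EXPLAIN ANALYZE", 4710),
    ("Pagination, the LIMIT/OFFSET trap", 5120),
    ("Pagination, the keyset way", 4870),
    ("Window functions in anger", 3380),
    ("Vacuum and bloat", 430),
]


def top_by_views(rows, limit):
    """Hypothetical helper modelling ORDER BY views DESC LIMIT <limit>."""
    # reverse=True flips the default ascending order, like DESC.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    # LIMIT keeps only the first `limit` rows of the sorted result.
    return ordered[:limit]


for title, views in top_by_views(articles, 5):
    print(f"{views:>5}  {title}")
```

Dropping the slice is the Python equivalent of dropping `LIMIT`: you get every row back, still sorted.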

## Tie-breakers

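What happens when two rows have the same sort key? Postgres returns them in *some* order, but that order isn't guaranteed. It can change between runs, for example after an `UPDATE` moves a row or when the planner picks a different plan. If the order matters, spell out a tie-breaker.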

<Run>
SELECT title, author, published_at
FROM articles
ORDER BY published_at, id;
</Run>

`published_at` has a tie: Ada Lovelace's "Indexing for humans" and Alan Turing's "The case for MVCC" were both published at `2024-01-05 09:00` (and the two undated articles tie with each other too). Adding `id` as a second sort key makes the result *deterministic*: same query, same order, every time. For pagination this isn't optional. It's a correctness requirement, as we'll see in a minute.
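A small Python model of why the tie-breaker matters. It is only a sketch: Python's `sorted()` is stable, so tied rows keep their *input* order. That makes "some order" visible as "whatever order the rows arrived in". Postgres doesn't even promise that much. The helper names are hypothetical, and the three rows are copied from the seed.

```python
# (id, title, published_at) for the two tied seed rows plus their neighbour.
rows = [
    (1, "Indexing for humans", "2024-01-05 09:00"),
    (2, "The case for MVCC", "2024-01-05 09:00"),
    (3, "Why your query is slow", "2024-01-12 14:00"),
]


def order_by_published(rows):
    # ORDER BY published_at only: tied rows come out in arrival order here,
    # because Python's sort is stable. Postgres promises nothing at all.
    return sorted(rows, key=lambda r: r[2])


def order_by_published_then_id(rows):
    # ORDER BY published_at, id: the primary key settles every tie.
    return sorted(rows, key=lambda r: (r[2], r[0]))


arrived_differently = list(reversed(rows))
print([r[0] for r in order_by_published(rows)])                         # [1, 2, 3]
print([r[0] for r in order_by_published(arrived_differently)])          # [2, 1, 3]
print([r[0] for r in order_by_published_then_id(arrived_differently)])  # [1, 2, 3]
```

The same rows arriving in a different order change the first result but not the second. That difference is exactly what the extra `id` key buys you.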

## `NULLS FIRST` / `NULLS LAST`

Postgres puts `NULL`s **last** in ascending sorts and **first** in descending sorts. Override with `NULLS FIRST` or `NULLS LAST` when you want the opposite — most often when you want "newest first, but missing dates at the bottom".

<Run>
SELECT title, published_at
FROM articles
ORDER BY published_at DESC NULLS LAST;
</Run>

The seed includes two articles with a `NULL` `published_at`, so you can see this directly: with `NULLS LAST` they drop to the bottom instead of leading the list.
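Here is a minimal Python model of `ORDER BY published_at DESC` with and without `NULLS LAST`. It assumes naive `datetime` values stand in for `timestamptz`, and `order_desc` is a hypothetical helper. The undated rows are kept in input order here, whereas Postgres leaves their relative order unspecified.

```python
from datetime import datetime

# (id, title, published_at) for four seed rows, two of them undated.
rows = [
    (15, "A small note on COLLATE", None),
    (29, "Locking, lightly", datetime(2024, 7, 15, 12, 30)),
    (30, "How autovacuum keeps you sane", None),
    (28, "Foreign keys revisited", datetime(2024, 7, 8, 17, 0)),
]


def order_desc(rows, nulls_last):
    """Model of ORDER BY published_at DESC [NULLS FIRST | NULLS LAST]."""
    dated = sorted((r for r in rows if r[2] is not None), key=lambda r: r[2], reverse=True)
    # Kept in input order here; Postgres leaves their relative order unspecified.
    undated = [r for r in rows if r[2] is None]
    # Postgres's default for DESC is NULLS FIRST.
    return dated + undated if nulls_last else undated + dated


print([r[0] for r in order_desc(rows, nulls_last=False)])  # [15, 30, 29, 28]
print([r[0] for r in order_desc(rows, nulls_last=True)])   # [29, 28, 15, 30]
```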

## `DISTINCT`: drop duplicate rows

`DISTINCT` removes duplicate rows from the result. It's a post-processing step on whatever the `SELECT` list produced.

<Run>
SELECT DISTINCT author
FROM articles
ORDER BY author;
</Run>

The 30 articles come from just 12 authors, most of whom wrote more than one, and `DISTINCT` collapses them to 12 rows. Note that `DISTINCT` applies *across all selected columns*, not just one: `SELECT DISTINCT author, published_at` would keep two rows from the same author on different days.
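A minimal Python model of that rule. It projects each row onto the selected columns and then dedupes the *whole projected tuple*, which is why adding a column can bring rows back. `select_distinct` is a hypothetical helper, and the five rows are copied from the seed.

```python
AUTHOR, PUBLISHED_AT = 0, 1

# (author, published_at) for five seed rows.
rows = [
    ("Ada Lovelace", "2024-01-05 09:00"),
    ("Ada Lovelace", "2024-02-04 16:45"),
    ("Alan Turing", "2024-01-05 09:00"),
    ("Grace Hopper", "2024-01-12 14:00"),
    ("Grace Hopper", "2024-01-20 10:30"),
]


def select_distinct(rows, *columns):
    """Model of SELECT DISTINCT <columns> ... ORDER BY <columns>."""
    # Project each row onto the chosen columns, then dedupe whole projected rows.
    projected = {tuple(row[c] for c in columns) for row in rows}
    return sorted(projected)


print(select_distinct(rows, AUTHOR))                # 3 rows, one per author
print(select_distinct(rows, AUTHOR, PUBLISHED_AT))  # 5 rows, no pair repeats
```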

### `DISTINCT ON (...)`: one row per group, Postgres-flavored

`DISTINCT ON (col)` is a Postgres extension: "one row per distinct value of `col`, and you pick which one with `ORDER BY`". Handy for "the latest article per author":

<Run>
SELECT DISTINCT ON (author) author, title, published_at
FROM articles
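ORDER BY author, published_at DESC NULLS LAST;
</Run>

The first column(s) in the `ORDER BY` must match the `DISTINCT ON` list. That's the rule that lets Postgres pick "the first row per group". Inside each author, `published_at DESC NULLS LAST` chooses the newest dated article. Without `NULLS LAST`, the descending sort would put `NULL`s first, and the undated articles would win for Bjarne Stroustrup and Margaret Hamilton.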
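A minimal Python model of `DISTINCT ON (author)`: sort the rows, then keep the first one in each author group. It sorts twice, relying on Python's stable sort to get a two-key order. `latest_per_author` is a hypothetical helper, and the rows are every seed row by two authors. The `nulls_last` flag shows what happens to the undated articles, because Postgres sorts `NULL`s first in a descending sort.

```python
from datetime import datetime

# (author, title, published_at) for every seed row by two authors.
rows = [
    ("Bjarne Stroustrup", "A small note on COLLATE", None),
    ("Bjarne Stroustrup", "Time zones, again", datetime(2024, 6, 24, 8, 30)),
    ("Margaret Hamilton", "Window functions in anger", datetime(2024, 3, 18, 13, 20)),
    ("Margaret Hamilton", "Idempotent INSERTs with ON CONFLICT", datetime(2024, 5, 13, 11, 45)),
    ("Margaret Hamilton", "How autovacuum keeps you sane", None),
]


def latest_per_author(rows, nulls_last=True):
    """Model of DISTINCT ON (author) ... ORDER BY author, published_at DESC [NULLS LAST]."""
    # Secondary key first: published_at DESC, with NULLs first unless nulls_last.
    dated = sorted((r for r in rows if r[2] is not None), key=lambda r: r[2], reverse=True)
    undated = [r for r in rows if r[2] is None]
    by_date = dated + undated if nulls_last else undated + dated
    # Primary key second: a stable sort on author keeps the date order inside each group.
    ordered = sorted(by_date, key=lambda r: r[0])
    # DISTINCT ON keeps only the first row of each author group.
    first_per_author, seen = [], set()
    for row in ordered:
        if row[0] not in seen:
            seen.add(row[0])
            first_per_author.append(row)
    return first_per_author


for author, title, _ in latest_per_author(rows):
    print(f"{author}: {title}")
```

With `nulls_last=False` the same function returns "A small note on COLLATE" and "How autovacuum keeps you sane", the two undated articles.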

## `LIMIT` and `OFFSET`: the obvious way to paginate

`LIMIT N OFFSET M` says "skip M rows, then return N". The classic query for page 2 at 10 rows per page:

<Run>
SELECT id, title, published_at
FROM articles
ORDER BY published_at DESC NULLS LAST, id DESC
LIMIT 10 OFFSET 10;
</Run>

That's page 2 (rows 11–20). Page 3 would be `OFFSET 20`. Simple, and the right tool for small result sets.
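The page arithmetic generalises: for a 1-based page `p` of size `n`, the query is `LIMIT n OFFSET (p - 1) * n`. Here is a minimal Python sketch of that mapping, using hypothetical helpers and a plain list of ids standing in for already-sorted rows.

```python
def page_bounds(page, page_size):
    """Hypothetical helper: (LIMIT, OFFSET) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must both be at least 1")
    return page_size, (page - 1) * page_size


def offset_page(sorted_rows, page, page_size):
    limit, offset = page_bounds(page, page_size)
    # OFFSET skips `offset` rows; LIMIT keeps the next `limit` of them.
    return sorted_rows[offset:offset + limit]


ids = list(range(1, 31))  # stand-ins for 30 rows already in ORDER BY order
print(page_bounds(2, 10))       # (10, 10)
print(offset_page(ids, 2, 10))  # rows 11 through 20
print(offset_page(ids, 4, 10))  # [] past the last page
```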

## The `OFFSET` trap

`OFFSET M` makes Postgres produce *and then discard* M rows before returning anything. On page 1 that costs nothing extra. On the last page of a million-row feed at 10 rows per page (page 100,000, so `OFFSET 999990`), Postgres reads all million rows just to throw away 999,990 of them.
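The arithmetic behind that cost is shown below as a tiny Python helper. It is a sketch that assumes rows come out already in order (for example from an index) and that the feed has at least that many rows. `offset_cost` is a hypothetical name.

```python
def offset_cost(page, page_size):
    """Hypothetical helper: (rows read, rows discarded) for one OFFSET page."""
    offset = (page - 1) * page_size
    # Every skipped row is still produced before it is thrown away.
    rows_read = offset + page_size
    return rows_read, offset


for page in (1, 1_000, 100_000):
    read, discarded = offset_cost(page, 10)
    print(f"page {page:>7,}: read {read:>9,} rows, discarded {discarded:>9,}")
```

At 10 rows per page, page 1,000 reads 10,000 rows to return 10, and page 100,000 reads 1,000,000 rows and discards 999,990. The work grows linearly with page depth.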

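There's a subtler bug too: the result set isn't stable across requests. If a new article is inserted at the top of a newest-first feed between the page-1 and page-2 requests, every row shifts down by one and page 2 repeats the last row of page 1. A deletion does the opposite: rows shift up, and one of them is never shown.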
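A minimal Python simulation of that drift, treating the feed as a list of ids in newest-first order. The new article id 31 is hypothetical, and the deletion case is included for comparison.

```python
def offset_page(feed, page, page_size):
    offset = (page - 1) * page_size
    return feed[offset:offset + page_size]


# Article ids in newest-first order, as ORDER BY published_at DESC would list them.
feed = [29, 28, 27, 26, 25, 24, 23, 22, 21, 20]

# Case 1: a new article (hypothetical id 31) lands at the top between requests.
page1 = offset_page(feed, 1, 5)                      # [29, 28, 27, 26, 25]
page2_after_insert = offset_page([31] + feed, 2, 5)  # [25, 24, 23, 22, 21]
print(set(page1) & set(page2_after_insert))          # {25} is shown twice

# Case 2: an article on page 1 (id 28) is deleted between requests.
page2_after_delete = offset_page([i for i in feed if i != 28], 2, 5)  # [23, 22, 21, 20]
print(set(feed) - set(page1) - set(page2_after_delete))              # {24} is never shown
```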

For small admin tables, `OFFSET` is fine. For user-facing feeds, infinite scroll, or anything that paginates deeply, reach for keyset pagination.

## Keyset pagination: page by `WHERE`

Idea: instead of "skip 10,000 rows", remember the *last row you saw* and ask for "rows after that one". With a deterministic `ORDER BY`, that's just a `WHERE` clause.

Page 1:

<Run>
SELECT id, title, published_at
FROM articles
ORDER BY published_at DESC NULLS LAST, id DESC
LIMIT 5;
</Run>

Note the last row's `published_at` and `id`. To get the next page, plug them into a `WHERE` filter that asks for everything strictly after that key:

<Run>
SELECT id, title, published_at
FROM articles
WHERE (published_at, id) < ('2024-06-17 08:30:00+00', 25)
ORDER BY published_at DESC NULLS LAST, id DESC
LIMIT 5;
</Run>

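Three important details:

1. **The tuple comparison `(a, b) < (x, y)`** does lexicographic ordering: `a < x`, OR `a = x AND b < y`. That's exactly the tie-breaker logic we wrote into `ORDER BY`, and the two have to match. Because both keys sort `DESC`, "after the last row seen" means "less than" on both.
2. **No `OFFSET`**. Each page is a fresh `WHERE` lookup. An index whose order matches the `ORDER BY`, such as one on `(published_at DESC NULLS LAST, id DESC)`, lets Postgres start each page with a single index descent, so page 1,000 costs about the same as page 1.
3. **`NULL` keys drop out.** A row comparison involving `NULL` is never true, so the keyset `WHERE` never reaches the two undated articles that `NULLS LAST` puts at the end of the feed. Either declare the sort key `NOT NULL` or fetch the `NULL` rows in a separate final query.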
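The same walk can be modelled in Python as a minimal sketch. It uses hypothetical helpers `keyset_page` and `walk`, and a hypothetical new article with id 31; it does not reflect how Postgres executes the query. Python compares tuples lexicographically, so `(published_at, id) < after` stands in for the SQL row comparison directly. The sketch also shows that a new article arriving between requests does not shift page 2, and that the undated article (id 30) never appears once the `WHERE` is in play, because a comparison against `NULL` is never true.

```python
from datetime import datetime

# (id, published_at) for the newest seed rows, plus one undated article.
rows = [
    (20, datetime(2024, 5, 13, 11, 45)),
    (21, datetime(2024, 5, 20, 9, 0)),
    (22, datetime(2024, 5, 27, 14, 30)),
    (23, datetime(2024, 6, 3, 14, 30)),
    (24, datetime(2024, 6, 10, 14, 30)),
    (25, datetime(2024, 6, 17, 8, 30)),
    (26, datetime(2024, 6, 24, 8, 30)),
    (27, datetime(2024, 7, 1, 17, 0)),
    (28, datetime(2024, 7, 8, 17, 0)),
    (29, datetime(2024, 7, 15, 12, 30)),
    (30, None),
]


def keyset_page(rows, page_size, after=None):
    """Model of ORDER BY published_at DESC NULLS LAST, id DESC LIMIT page_size,
    plus WHERE (published_at, id) < after when `after` is given."""
    dated = sorted((r for r in rows if r[1] is not None),
                   key=lambda r: (r[1], r[0]), reverse=True)
    undated = sorted((r for r in rows if r[1] is None), key=lambda r: r[0], reverse=True)
    if after is None:
        candidates = dated + undated  # NULLS LAST
    elif after[0] is None:
        candidates = []  # comparing against a NULL key matches no row in SQL
    else:
        # Undated rows make the SQL comparison NULL, so only dated rows can match.
        candidates = [r for r in dated if (r[1], r[0]) < after]
    return candidates[:page_size]


def walk(rows, page_size):
    """Fetch pages until one comes back empty, remembering only the last key."""
    pages, after = [], None
    while True:
        page = keyset_page(rows, page_size, after)
        if not page:
            return pages
        pages.append([row_id for row_id, _ in page])
        last_id, last_published = page[-1]
        after = (last_published, last_id)


print(walk(rows, 5))  # [[29, 28, 27, 26, 25], [24, 23, 22, 21, 20]]; id 30 never appears

# A new article (hypothetical id 31) arriving between requests does not disturb page 2.
page1 = keyset_page(rows, 5)
last_id, last_published = page1[-1]
newer = rows + [(31, datetime(2024, 7, 22, 9, 0))]
page2 = keyset_page(newer, 5, after=(last_published, last_id))
print([row_id for row_id, _ in page2])  # [24, 23, 22, 21, 20]
```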

The downside: you can't jump to "page 42" — you walk forward one page at a time. For feeds and infinite scroll that's fine; for an admin grid with a page picker, `OFFSET` is the easier fit.

## What you learned

- `ORDER BY` sorts; add `DESC` and `NULLS FIRST`/`LAST` as needed.
- Always include a tie-breaker (typically the primary key) for deterministic order.
- `DISTINCT` drops duplicate rows; `DISTINCT ON (col)` picks one row per group, chosen by `ORDER BY`.
- `LIMIT N OFFSET M` is the obvious way to paginate — and gets slow and unstable on deep pages.
- Keyset pagination (`WHERE (key) < (last_seen)`) costs about the same no matter how deep you page, and it survives concurrent inserts.

<Check id="seed-loaded">
Mark this lesson done — we'll just confirm the sandbox is healthy.
</Check>

Up next: collapsing rows into summaries with aggregations and `GROUP BY`.
@@ -0,0 +1,19 @@
title: Sorting and pagination
summary: Order results predictably, drop duplicates with DISTINCT, and page through with LIMIT/OFFSET — and why keyset pagination is the better default.
estimatedMinutes: 12
tags:
- order-by
- distinct
- limit
- offset
- pagination
authors:
- exekias
seed: seed.sql
checks:
- id: seed-loaded
  type: row-count
  description: The seeded articles table has 30 rows — click to mark this lesson done.
  table: articles
  expect:
    rowCount: 30
43 changes: 43 additions & 0 deletions lessons/01-query-fundamentals/03-sorting-and-pagination/seed.sql
@@ -0,0 +1,43 @@
-- Seed for "03-sorting-and-pagination": a small articles feed with a few
-- intentional ties on published_at so ORDER BY tie-breakers are visible, and
-- some duplicate authors so DISTINCT has something to do.

CREATE TABLE articles (
  id serial PRIMARY KEY,
  title text NOT NULL,
  author text NOT NULL,
  views int NOT NULL,
  published_at timestamptz
);

INSERT INTO articles (title, author, views, published_at) VALUES
('Indexing for humans', 'Ada Lovelace', 1200, '2024-01-05 09:00:00+00'),
('The case for MVCC', 'Alan Turing', 890, '2024-01-05 09:00:00+00'),
('Why your query is slow', 'Grace Hopper', 4300, '2024-01-12 14:00:00+00'),
('EXPLAIN, explained', 'Grace Hopper', 3100, '2024-01-20 10:30:00+00'),
('B-trees from first principles','Donald Knuth', 2750, '2024-01-28 08:15:00+00'),
('Postgres tips, vol 1', 'Ada Lovelace', 640, '2024-02-04 16:45:00+00'),
('Postgres tips, vol 2', 'Ada Lovelace', 720, '2024-02-11 16:45:00+00'),
('Joins by example', 'Linus Torvalds', 1810, '2024-02-19 12:00:00+00'),
('LATERAL is fine, actually', 'Barbara Liskov', 980, '2024-02-26 11:00:00+00'),
('When to use JSONB', 'Guido van Rossum', 2210, '2024-03-04 09:30:00+00'),
('When not to use JSONB', 'Guido van Rossum', 1560, '2024-03-11 09:30:00+00'),
('Window functions in anger', 'Margaret Hamilton', 3380, '2024-03-18 13:20:00+00'),
('Reading EXPLAIN ANALYZE', 'Grace Hopper', 4710, '2024-03-25 13:20:00+00'),
('Vacuum and bloat', 'Dennis Ritchie', 430, '2024-04-01 07:00:00+00'),
('A small note on COLLATE', 'Bjarne Stroustrup', 210, NULL),
('GIN vs GiST', 'Donald Knuth', 1990, '2024-04-15 10:00:00+00'),
('Trigram search basics', 'Ken Thompson', 870, '2024-04-22 10:00:00+00'),
('CTEs are not optimization fences anymore','Linus Torvalds', 2540, '2024-04-29 15:15:00+00'),
('Schema migrations without tears','Barbara Liskov', 3050, '2024-05-06 11:45:00+00'),
('Idempotent INSERTs with ON CONFLICT','Margaret Hamilton', 2890, '2024-05-13 11:45:00+00'),
('Three flavors of UUID', 'Edsger Dijkstra', 760, '2024-05-20 09:00:00+00'),
('Counting is harder than it looks','Ada Lovelace', 1680, '2024-05-27 14:30:00+00'),
('Pagination, the LIMIT/OFFSET trap','Grace Hopper', 5120, '2024-06-03 14:30:00+00'),
('Pagination, the keyset way', 'Grace Hopper', 4870, '2024-06-10 14:30:00+00'),
('Date math in Postgres', 'Guido van Rossum', 930, '2024-06-17 08:30:00+00'),
('Time zones, again', 'Bjarne Stroustrup', 410, '2024-06-24 08:30:00+00'),
('Generated columns: hidden gems','Dennis Ritchie', 1240, '2024-07-01 17:00:00+00'),
('Foreign keys revisited', 'Ken Thompson', 1080, '2024-07-08 17:00:00+00'),
('Locking, lightly', 'Edsger Dijkstra', 1340, '2024-07-15 12:30:00+00'),
('How autovacuum keeps you sane','Margaret Hamilton', 980, NULL);