
Extract @meta and object_provides from idx JSONB into dedicated columns #98

@jensens

Description

Problem

The idx JSONB column carries significant overhead from keys that are never used for index queries but bloat every row. Measurements on a production instance (AAF, 2.6M rows, 137k cataloged):

| Key | avg size | total est. | % of idx |
|---|---|---|---|
| @meta (mainly image_scales) | 2,810 B | 368 MB | 87% |
| object_provides | 1,634 B | 214 MB | 50% |
| Everything else | ~400 B | ~50 MB | 12% |

The total idx size (TOAST-compressed) is 424 MB. Extracting these keys would reduce it to ~50–60 MB (~85% reduction), bringing the average row size under the TOAST threshold (~2 KB).

Why these three?

  • @meta: Contains non-JSON-native metadata (pickled via zodb_json_codec). The dominant entry is image_scales (~315 MB). This data is never queried via SQL — it's only read by brains for metadata access. It doesn't belong in the indexed JSONB.

  • object_provides: A KeywordIndex stored as a JSON array. Queried on almost every catalog search (interface filtering). A native TEXT[] column with GIN index is more efficient than jsonb->'key' ?| on the JSONB GIN.

  • allowedRolesAndUsers: Already partially extracted — allowed_roles TEXT[] column and GIN index exist, and queries already use them. But the key is currently copied to the column, not popped from idx, so it's stored twice. This plan subsumes the existing hardcoded handling into the generic mechanism and removes the redundancy.
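The two query shapes being compared for object_provides can be sketched as WHERE-clause fragments. This is a hedged illustration: the function name and the psycopg-style placeholder are invented here, but the operators are standard PostgreSQL (`&&` is array overlap, `?|` is JSONB "any key exists").

```python
# Illustrative sketch, not actual query.py code: the two WHERE-clause
# shapes for an "any of these interfaces" KeywordIndex query.

def keyword_any_clause(use_dedicated_column: bool) -> str:
    """Build the WHERE fragment for an 'any of these interfaces' query."""
    if use_dedicated_column:
        # TEXT[] overlap, served by the dedicated GIN index on object_provides
        return "object_provides && %(values)s"
    # JSONB "any key exists" check against the generic idx GIN index
    return "idx -> 'object_provides' ?| %(values)s"
```

The dedicated-column form lets the planner use a small, type-specific GIN index instead of the much larger index over the whole idx JSONB.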

Proposed Solution

Generic ExtraIdxColumn mechanism

Instead of hardcoding per-field extraction, introduce a declarative registry:

import dataclasses

@dataclasses.dataclass(frozen=True)
class ExtraIdxColumn:
    idx_key: str        # key in idx dict (e.g. "@meta", "object_provides")
    column_name: str    # PG column name (e.g. "meta", "object_provides")
    column_type: str    # PG type (e.g. "JSONB", "TEXT[]")
    value_expr: str     # SQL value expression for psycopg
    gin_index: bool = False

Registered columns are:

  1. Popped from idx at write time (in CatalogStateProcessor.process() and indexing.py)
  2. Written to dedicated PG columns via existing ExtraColumn infrastructure in zodb-pgjsonb
  3. Queried directly: _handle_keyword() checks for dedicated columns (replaces the hardcoded allowedRolesAndUsers check)
  4. Read by brains: _resolve_from_idx() checks the meta column first, falling back to idx["@meta"] for pre-migration data
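Steps 1–2 can be sketched in Python. This is illustrative only: extract_extra_columns is an invented name, not the actual processor.py API, and the real implementation maps values through each column's value_expr.

```python
# Hedged sketch of the write-time split: registered idx keys are popped
# out of the idx mapping (mutating it) and returned as per-column values.

def extract_extra_columns(idx: dict, registered_keys: tuple[str, ...]) -> dict:
    """Pop registered keys from idx and return them keyed for column writes."""
    extracted = {}
    for key in registered_keys:
        if key in idx:
            extracted[key] = idx.pop(key)
    return extracted
```

After this runs, the slimmed idx is what gets stored as JSONB, so extracted keys are no longer duplicated in the row.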

Default registrations

ExtraIdxColumn(idx_key="@meta", column_name="meta", column_type="JSONB", ...)
ExtraIdxColumn(idx_key="object_provides", column_name="object_provides", column_type="TEXT[]", gin_index=True)
ExtraIdxColumn(idx_key="allowedRolesAndUsers", column_name="allowed_roles", column_type="TEXT[]", gin_index=True)

DDL

ALTER TABLE object_state ADD COLUMN IF NOT EXISTS meta JSONB;
ALTER TABLE object_state ADD COLUMN IF NOT EXISTS object_provides TEXT[];
CREATE INDEX IF NOT EXISTS idx_os_object_provides
    ON object_state USING gin (object_provides) WHERE object_provides IS NOT NULL;
-- allowed_roles column and idx_os_allowed_roles GIN index already exist
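With the registry in place, DDL like the above could be generated per entry instead of hand-written. A hedged sketch; ddl_for is an invented helper, not existing schema.py code:

```python
# Illustrative DDL generator for one extracted column. Uses the same
# idempotent IF NOT EXISTS / partial-index pattern as the DDL above.

def ddl_for(column_name: str, column_type: str, gin_index: bool) -> list[str]:
    """Generate idempotent DDL statements for one extracted idx column."""
    stmts = [
        f"ALTER TABLE object_state ADD COLUMN IF NOT EXISTS "
        f"{column_name} {column_type};"
    ]
    if gin_index:
        # Partial index skips NULL rows (pre-migration or key absent)
        stmts.append(
            f"CREATE INDEX IF NOT EXISTS idx_os_{column_name} "
            f"ON object_state USING gin ({column_name}) "
            f"WHERE {column_name} IS NOT NULL;"
        )
    return stmts
```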

Affected Code Paths

| File | Change |
|---|---|
| columns.py | ExtraIdxColumn dataclass + registry + default registrations (incl. allowedRolesAndUsers) |
| schema.py | DDL for new columns + GIN index (allowed_roles already exists) |
| processor.py | get_extra_columns() declares the new ExtraColumns; process() pops keys from idx (replaces hardcoded allowed_roles extraction) |
| query.py | _handle_keyword() uses generic column lookup (replaces hardcoded idx_key == "allowedRolesAndUsers" check) |
| brain.py | _resolve_from_idx() checks the meta column first, falls back to idx["@meta"] |
| search.py | Include meta in SELECT columns |
| indexing.py | Replace hardcoded allowed_roles extraction with generic _extract_extra_columns() |
| startup.py | Remove _backfill_allowed_roles() (superseded by generic mechanism + clear_and_rebuild) |

Migration

  • DDL uses IF NOT EXISTS — safe for rolling deploys
  • Pre-migration rows: meta = NULL, object_provides = NULL, data still in idx
  • allowed_roles likely already populated by old backfill mechanism
  • Brain fallback handles both old and new layout transparently
  • After deploy: run clear_and_rebuild to populate new columns for all objects and pop keys from idx
  • The generic mechanism makes future extractions trivial
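The transparent brain fallback from the points above can be sketched as follows. The resolve_meta name and the row-dict shape are assumptions for illustration, not the actual brain.py API:

```python
# Hedged sketch of the read-side fallback: new rows carry metadata in the
# dedicated meta column; pre-migration rows still have it inside idx.

def resolve_meta(row: dict) -> dict:
    """Return brain metadata, preferring the dedicated column."""
    if row.get("meta") is not None:
        # Post-migration layout: meta column populated, key popped from idx
        return row["meta"]
    # Pre-migration layout: @meta still embedded in the idx JSONB
    return (row.get("idx") or {}).get("@meta", {})
```

Because the fallback is purely read-side, old and new rows can coexist until clear_and_rebuild has repopulated everything.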

Detailed Implementation Plan

A full task-by-task plan with tests and code is available at:
docs/plans/2026-04-09-pgcatalog-idx-column-extraction.md

Expected Impact

| Metric | Before | After |
|---|---|---|
| idx size per row (avg) | 3,239 B | ~400 B (under TOAST threshold) |
| idx total (compressed) | 424 MB | ~50–60 MB |
| object_provides query | JSONB GIN (`idx->'key' ?\|`) | TEXT[] GIN (native array operators) |
| allowedRolesAndUsers in idx | Copied (redundant) | Popped (no duplication) |
| Brain metadata access | Decode from idx JSONB | Direct from meta column (same codec) |
| Code complexity | 3 hardcoded special cases | 1 generic mechanism |
