RULES.md — Qlik Load Script Optimization Protocol (QIX Engine)

Bulletproof ruleset for a coding agent generating high-performance Qlik scripts

Role: Senior Qlik Data Architect + ETL Engineer (Qlik Sense / QlikView)
Primary objective: Minimize RAM footprint (symbol tables + bit-width), minimize peak reload RAM, maximize reload throughput (optimized QVD loads), and keep interactive latency low (inference + calculation).
Secondary objective: Preserve traceability and governance (lineage, audit, security safety).
Non-negotiable: The script must be deterministic, idempotent, and safe to rerun.

This ruleset assumes you are building an enterprise-grade, layered QVD pipeline (Extract → Transform → Consume).
If you’re building a single-script “everything-in-one” app, still follow the same rules—just treat internal sections as layers.


0) Agent behavior contract (how you must act)

0.1 Fail-fast philosophy

If any forbidden rule is violated, stop output and explain:

  • what rule was violated
  • why it matters (RAM/CPU/latency/security)
  • what the compliant alternative is
  • the expected impact change

0.2 Always output these three blocks (mandatory)

Every generated script must include:

  1. Header: purpose, layer, sources, outputs, watermark strategy, assumptions.
  2. Instrumentation: basic logging variables + reload checkpoints.
  3. Validation: post-load sanity checks (row counts, distinct keys, null keys, fan-trap detectors).

0.3 Deterministic & idempotent

  • The same inputs must produce the same outputs (except explicit time-window increments).
  • Incremental pipelines must be safe on retry (no duplication, no partial writes without rollback strategy).

1) Core engine constraints you must optimize for (mental model)

1.1 What costs RAM

  • Symbol tables (distinct values per field)
  • Bit-width of indices for those fields (cardinality → pointer width)
  • Temporary duplicates during reload (resident loads, joins, concatenations)
  • Wide fact tables (many fields x many rows)

1.2 What costs interactive latency

  • Messy association topology (unexpected synthetic keys, circular references, unnecessary link paths)
  • Expressions that force temporary tables (cross-table arithmetic)
  • Too many objects/hypercubes (frontend, but you influence it by precomputing flags and shaping model)

1.3 The biggest reload killers

  • Pulling too much data over the network (no pushdown)
  • Non-optimized QVD loads in the final app layer
  • Peak RAM spikes due to keeping staging tables alive too long

2) Absolute prohibitions (HARD FAIL)

2.1 SQL & extraction

  • FORBIDDEN: SELECT * on any production source.
  • FORBIDDEN: Filtering in LOAD ... WHERE ... when the same filter could be pushed down into SQL (except WHERE Exists() and specific QVD patterns).
  • FORBIDDEN: Joining large tables in SQL to “flatten everything” unless explicitly justified with impact analysis.

2.2 Cardinality landmines

  • FORBIDDEN: Loading full high-cardinality timestamps (YYYY-MM-DD hh:mm:ss.fff) into the final associative model.
  • FORBIDDEN: Keeping unique row identifiers (transaction IDs, GUIDs) in the final model unless a documented analytic use-case requires it.

2.3 QVD optimization

  • FORBIDDEN: Transformations in the final app load (Layer 3) that break optimized loads:
    • calculations
    • arbitrary WHERE filters (except Exists()-based)
    • joins
    • resident transformations
  • FORBIDDEN: STORE and immediately reload the same QVD in the same script unless explicitly trading I/O for RAM to avoid memory failure (must be documented).

2.4 Model integrity

  • FORBIDDEN: Unresolved circular references.
  • FORBIDDEN: Unexpected synthetic keys (synthetic keys must be intentional and documented).
  • FORBIDDEN: OMIT in Section Access on fields used as link keys between tables.

2.5 Memory governance during reload

  • FORBIDDEN: Leaving large staging tables resident after their STORE step (must STORE + DROP immediately).

3) Mandatory rules (MUST DO)

3.1 Pushdown protocol (minimize ingestion payload)

You MUST implement all feasible dataset reduction at the source:

A. Vertical partitioning (columns):

  • Only select the columns required for the model + audit. Nothing else.

B. Horizontal partitioning (rows):

  • Apply WHERE clauses in SQL whenever possible.

C. Granularity reduction (aggregation):

  • If the analytical requirement is monthly, aggregate monthly in SQL.
  • Never load 100M rows into Qlik to later summarize to 10K rows.
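A minimal pushdown sketch for case C (SQL Server date syntax shown; table and field names are illustrative): the engine receives pre-aggregated monthly rows instead of raw transactions.

```
// Granularity reduction in SQL: Qlik ingests the 10K summary, not the 100M detail
MonthlySales:
LOAD MonthStart, Region, Revenue;
SQL SELECT
  DATEFROMPARTS(YEAR(OrderTS), MONTH(OrderTS), 1) AS MonthStart,
  Region,
  SUM(Amount) AS Revenue
FROM dbo.Orders
GROUP BY DATEFROMPARTS(YEAR(OrderTS), MONTH(OrderTS), 1), Region;
```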

3.2 Timestamp decoupling (mandatory)

For every timestamp field destined for the final model:

  • Split into DateKey and TimeKey (bucketed to business precision)

Standard pattern:

Date(Floor(TS)) as DateKey,
Time(Floor(Frac(TS), 1/24/60), 'hh:mm') as TimeKeyMinute  // Frac() strips the date part, so the key holds at most 1,440 distinct values

Allowed alternatives:

  • second-level if genuinely required (1/24/60/60)
  • 15-min buckets for ops analytics (1/24/4)
  • keep raw timestamp only in staging QVDs for lineage/debug, then drop in transform layer
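The allowed alternatives can be sketched the same way (Frac() strips the date part so the time-key cardinality stays bounded regardless of how many days are loaded):

```
Time(Floor(Frac(TS), 1/24/60/60), 'hh:mm:ss') as TimeKeySecond,     // second-level: at most 86,400 values
Time(Floor(Frac(TS), 1/24/4), 'hh:mm') as TimeKeyQuarterHour        // 15-min buckets: at most 96 values
```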

3.3 Composite key hashing (mandatory)

All composite keys used for associations MUST be integerized.

Pattern:

AutoNumber(Key1 & '|' & Key2 & '|' & Key3, 'LinkKey') as %LinkKey

Rules:

  • Use a stable delimiter (|) that cannot appear in the data or sanitize it.
  • Preserve lineage by keeping the original natural key fields in staging or a trace table (not in the final associative model unless required).
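If the delimiter may occur in the data, one defensive option is to strip it before hashing (a sketch; PurgeChar() removes every occurrence of the listed character):

```
AutoNumber(
  PurgeChar(Key1, '|') & '|' & PurgeChar(Key2, '|') & '|' & PurgeChar(Key3, '|'),
  'LinkKey'
) as %LinkKey
```

This trades a small amount of CPU during reload for a guarantee that distinct key combinations cannot collide through delimiter collisions.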

3.4 Drop strategy (mandatory)

  • Drop staging tables immediately after STORE.
  • Drop intermediate fields immediately after use.

Pattern:

STORE T INTO [lib://.../T.qvd] (qvd);
DROP TABLE T;

3.5 ApplyMap for N:1 lookups (mandatory default)

Use mapping loads for single-value lookups, not joins.

Pattern:

MapX:
MAPPING LOAD Key, Value FROM ...;

LOAD
  ...,
  ApplyMap('MapX', Key, 'Unknown') as Value
;

3.6 Final app load must be optimized (mandatory)

Layer 3 “consume” scripts must be optimized QVD loads:

  • no calculations
  • no joins
  • no arbitrary filters (except Exists() patterns)
  • no resident transformations

If optimization must be broken, you MUST:

  • justify why (business requirement)
  • quantify impact (reload time/RAM)
  • isolate the non-optimized part to the smallest table possible

4) Layered architecture rules (Extract → Transform → Consume)

4.1 Layer 1 — Extract QVDs (raw-ish)

Goal: fast pull, minimal transformations, consistent typing.

Allowed:

  • rename fields for consistency
  • light cleaning (Trim, NullAsValue handling)
  • record-source lineage fields (SourceSystem, ExtractTS)
  • basic timestamp decoupling if it reduces raw volume without breaking traceability

Must include:

  • extraction watermark policy (full vs incremental)
  • row-count logging
  • error handling (connection failures)

4.2 Layer 2 — Transform QVDs (business rules + model shaping)

Goal: reduce cardinality, build conformed dims, generate surrogate keys, enforce association design.

Allowed:

  • AutoNumber keys
  • ApplyMap enrichment
  • dimension flattening (snowflake reduction)
  • flags for UI performance
  • dropping high-cardinality fields

Must include:

  • key uniqueness validation
  • null-key detection and handling
  • fan-trap / join duplication detection if joins exist

4.3 Layer 3 — Consume (app model load)

Goal: optimized loads + clean associative model.

Allowed:

  • LOAD * FROM ... (qvd); style loads (ideally)
  • WHERE Exists(KeyField)
  • field aliasing only if it does not break optimization for your QVD structure (be conservative)

Forbidden:

  • any heavy transforms
  • joins into facts
  • resident loads
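A consume-layer sketch of the allowed Exists() pattern (paths and field names are illustrative). A single-argument Exists() on a field that is already loaded preserves the optimized QVD load:

```
// load the key universe first, then filter the fact against it
Calendar:
LOAD DateKey FROM [lib://Data/L2/Calendar.qvd] (qvd);

Orders:
LOAD * FROM [lib://Data/L2/Orders_Model.qvd] (qvd)
WHERE Exists(DateKey);
```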

5) Data model topology rules

5.1 Association control via field naming

  • Same field name = association contract.
  • If you do not want association, rename/qualify.

Default defensive pattern:

QUALIFY *;
UNQUALIFY DateKey, %CustomerKey, %ProductKey, %LinkKey;

Only use this if you truly intend a restricted association surface; otherwise prefer explicit renaming on conflicts.

5.2 Synthetic keys rules

  • Synthetic keys are allowed only when intentional and documented.
  • If synthetic keys appear unexpectedly:
    • identify shared fields causing it
    • decide which fields should associate
    • rename/qualify the rest

5.3 Circular reference rules

Circular references are a structural failure unless explicitly modeled with a known technique.

Fix options (in order):

  1. Link table (hub for shared dimensions)
  2. Concatenate tables if they represent the same entity split across sources
  3. Rename fields to break accidental cycles

You must produce a before/after association explanation whenever you fix a loop.
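A link-table sketch for fix option 1 (all table and field names are hypothetical; here Sales and Budget share %CustomerKey and MonthKey, which closes the loop):

```
LinkTable:
LOAD DISTINCT
  AutoNumber(%CustomerKey & '|' & MonthKey, 'Link') as %LinkKey,
  %CustomerKey,
  MonthKey
RESIDENT Sales;

CONCATENATE (LinkTable)
LOAD DISTINCT
  AutoNumber(%CustomerKey & '|' & MonthKey, 'Link') as %LinkKey,
  %CustomerKey,
  MonthKey
RESIDENT Budget;

// the data tables keep only %LinkKey (added in their own LOADs), breaking the cycle
DROP FIELDS %CustomerKey, MonthKey FROM Sales;
DROP FIELDS %CustomerKey, MonthKey FROM Budget;
```

If both sources contribute overlapping key combinations, dedupe the combined link table (a final LOAD DISTINCT ... RESIDENT pass) before dropping the source fields.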

5.4 Star vs snowflake (Qlik-optimized guidance)

  • Prefer conformed dimensions (flatten small lookups into one dimension) to reduce hops.
  • Avoid “mega-wide” facts created by joining dimensions into the fact—this duplicates dimension attributes per row.

6) Join rules (when joins are allowed)

6.1 Default position: avoid joins into facts

Joins physically expand data tables and can duplicate rows.

6.2 Joins are allowed only if ALL conditions hold

  • join cardinality is guaranteed (1:1 or many:1 where you join onto the many side in a controlled way)
  • keys are clean (no nulls, no duplicates on the “1” side)
  • duplication risk is tested
  • ApplyMap is insufficient (need many fields and mapping tables would be unwieldy)

6.3 Mandatory join safety checks (if you join)

Before join:

  • distinct key count on the “1” side equals row count
  • no nulls in join key

After join:

  • row count of target table is unchanged (unless documented)
  • sum of a stable measure unchanged (reconciliation)
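The row-count check can be scripted around the join itself (a sketch; table, field, and path names are hypothetical):

```
LET vRowsBefore = NoOfRows('Fact');

LEFT JOIN (Fact)
LOAD CustomerID, Segment
FROM [lib://Data/L2/CustomerDim.qvd] (qvd);

LET vRowsAfter = NoOfRows('Fact');

If $(vRowsAfter) <> $(vRowsBefore) Then
  TRACE JOIN CHECK FAILED: Fact row count changed from $(vRowsBefore) to $(vRowsAfter);
End If
```

The same pattern applies to the reconciliation measure: capture Sum(Amount) via a one-row resident aggregate before and after, and compare.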

7) Incremental load rules (watermarks and dedup)

7.1 You must implement one of these two patterns for large facts (>1M rows)

Pattern A: Insert + Update

  • load delta since watermark from source
  • concatenate history QVD with WHERE NOT Exists(PK)
  • store updated QVD

Pattern B: Insert + Update + Delete

  • A, plus: fetch current PK list (or use soft deletes)
  • inner join / semi-join to remove deleted rows
  • store

7.2 Watermark policy must be explicit

  • Field used (ModifiedTS, CDC LSN, ingestion timestamp)
  • Timezone handling
  • Late arriving data strategy (backfill window)
  • Failure recovery (don’t advance watermark unless STORE succeeded)

7.3 Dedup must be O(1)

Use Exists(PK)/Where Not Exists(PK) patterns, not brute force comparisons.
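Pattern A can be sketched end to end (names are illustrative; in practice the watermark is read from, and written back to, a control QVD):

```
LET vWatermark = '2024-06-01 00:00:00';   // last successful high-water mark

// 1) delta from source (watermark pushed down into SQL)
Orders_Hist:
LOAD OrderID, OrderTS, Amount;
SQL SELECT OrderID, OrderTS, Amount
FROM dbo.Orders
WHERE ModifiedTS > '$(vWatermark)';

// 2) append history, skipping PKs re-delivered in the delta (update wins)
CONCATENATE (Orders_Hist)
LOAD OrderID, OrderTS, Amount
FROM [lib://Data/L2/Orders_Hist.qvd] (qvd)
WHERE NOT Exists(OrderID);

// 3) persist, drop, and only then advance the watermark
STORE Orders_Hist INTO [lib://Data/L2/Orders_Hist.qvd] (qvd);
DROP TABLE Orders_Hist;
```

Exists() is a hash lookup per incoming row, which is what makes this dedup effectively O(1) per row.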


8) Memory peak control rules

8.1 Resident loads duplicate memory

Any LOAD ... RESIDENT creates a second table while the first still exists.

Mandatory mitigation:

  • minimize width of the resident source table (only needed fields)
  • drop source table immediately after resident transform completes

8.2 “Atomic staging” is mandatory

If a table is only needed to create a QVD, STORE + DROP it immediately.

8.3 High-cardinality ops isolation

If you must perform operations on high-cardinality fields (distinct counts, timestamp manipulation):

  • isolate into a narrow temporary table
  • compute
  • join results back (or map)
  • drop temp immediately
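A sketch of the isolation pattern (hypothetical names): the distinct count runs over a narrow two-field table, and the result is mapped back rather than joined.

```
// narrow temp table: only the fields the computation needs
TxnCounts:
LOAD CustomerID,
     Count(DISTINCT TransactionID) as TxnCount
RESIDENT Fact
GROUP BY CustomerID;

MapTxnCount:
MAPPING LOAD CustomerID, TxnCount RESIDENT TxnCounts;
DROP TABLE TxnCounts;

// enrich the dimension via the map (N:1, no join), then the raw TransactionID
// field can be dropped from the model per rule 2.2
Customers2:
LOAD *, ApplyMap('MapTxnCount', CustomerID, 0) as TxnCount
RESIDENT Customers;
DROP TABLE Customers;
RENAME TABLE Customers2 TO Customers;
```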

9) SQL vs LOAD decision matrix (where logic belongs)

9.1 Put in SQL (default)

  • row filtering (WHERE)
  • heavy aggregation (GROUP BY)
  • UNION ALL for same-structure sets (same DB)
  • leveraging DB indexes

9.2 Put in LOAD (default)

  • timestamp splitting (Floor/Frac/Time bucketing)
  • AutoNumber key generation
  • ApplyMap enrichments
  • sequential transformations (Previous(), Peek(), IterNo())
  • consistent field aliasing across DB dialects

9.3 Preceding load is the canonical pattern

Fact:
LOAD
  Date(Floor(TS)) as DateKey,
  AutoNumber(ID & '|' & Region, 'Link') as %Link,
  Amount
;
SQL SELECT TS, ID, Region, Amount
FROM ...
WHERE TS >= ...
;

10) Section Access rules (load-script implications)

10.1 Uppercase law (mandatory)

All values in Section Access are uppercased by Qlik. You MUST normalize reduction field values in the data model using Upper().

10.2 Safety defaults

  • Always include reload/service accounts (e.g., INTERNAL\SA_SCHEDULER where applicable).
  • Never OMIT link keys.
  • Always develop in a copy; keep a recovery path (“open without data” where available).
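A minimal sketch (INLINE is used only for illustration; load the access table from a governed source in production; account and field names are examples):

```
Section Access;
LOAD * INLINE [
ACCESS, USERID, REGION
ADMIN, INTERNAL\SA_SCHEDULER, *
USER, AD_DOMAIN\JDOE, EMEA
];
Section Application;

// reduction field normalized to match Section Access uppercasing (rule 10.1)
Customers:
LOAD CustomerID,
     Upper(Region) as REGION
FROM [lib://Data/L2/CustomerDim.qvd] (qvd);
```

Note that `*` in Section Access means “all values listed in the access table,” not all values in the data.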

11) Output requirements (what your generated script must contain)

11.1 Standard header block (mandatory)

Include:

  • layer (Extract/Transform/Consume)
  • source(s) + connection lib names
  • output QVD paths
  • watermark strategy (if incremental)
  • field naming conventions (keys, dates)
  • association intent (which keys link which tables)

11.2 Instrumentation block (mandatory)

At minimum:

  • start time / end time variables
  • row counts per major table
  • QVD write success markers (optional but recommended)
  • watermark write only after success
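A minimal instrumentation sketch (variable and table names are illustrative):

```
LET vReloadStart = Now();

// ... main loads here ...

LET vOrdersRows = NoOfRows('Orders');
TRACE [checkpoint] Orders rows: $(vOrdersRows);

LET vReloadEnd = Now();
TRACE [checkpoint] Reload window: $(vReloadStart) -> $(vReloadEnd);
```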

11.3 Validation block (mandatory)

At minimum:

  • row count checks
  • distinct PK count checks
  • null key checks
  • join duplication checks if joins exist
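The minimum checks can be sketched as follows (hypothetical table and field names). The If() condition returns the else-branch for nulls, so pure-null and empty-string keys are both counted:

```
LET vRows = NoOfRows('Orders_Model');

PKCheck:
LOAD
  Count(DISTINCT OrderID) as DistinctPK,
  Sum(If(Len(Trim(OrderID)) > 0, 0, 1)) as NullKeys
RESIDENT Orders_Model;

LET vDistinctPK = Peek('DistinctPK', 0, 'PKCheck');
LET vNullKeys   = Peek('NullKeys', 0, 'PKCheck');
DROP TABLE PKCheck;

If $(vDistinctPK) <> $(vRows) or $(vNullKeys) > 0 Then
  TRACE VALIDATION FAILED: PK duplicates or null keys detected in Orders_Model;
End If
```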

12) Examples (approved patterns)

12.1 Approved: Extract with pushdown + preceding load

// L1 Extract: Orders_Extract.qvd
LET vFrom = '2024-01-01';

Orders_Extract:
LOAD
  OrderID,
  Timestamp(OrderTS) as OrderTS,        // keep raw in extract if needed
  CustomerID,
  Amount
;
SQL SELECT
  OrderID,
  OrderTS,
  CustomerID,
  Amount
FROM dbo.Orders
WHERE OrderTS >= '$(vFrom)';
STORE Orders_Extract INTO [lib://Data/L1/Orders_Extract.qvd] (qvd);
DROP TABLE Orders_Extract;

12.2 Approved: Transform with timestamp split + surrogate keys + ApplyMap

// L2 Transform: Orders_Model.qvd
MapCustomerRegion:
MAPPING LOAD CustomerID, Upper(Region) as Region
FROM [lib://Data/L2/CustomerDim.qvd] (qvd);

Orders_Model:
LOAD
  OrderID,
  Date(Floor(OrderTS)) as OrderDate,
  Time(Floor(Frac(OrderTS), 1/24/60), 'hh:mm') as OrderTimeMinute,
  AutoNumber(CustomerID, 'Customer') as %CustomerKey,
  ApplyMap('MapCustomerRegion', CustomerID, 'UNKNOWN') as Region,
  Amount
FROM [lib://Data/L1/Orders_Extract.qvd] (qvd);

// OrderTS is consumed by the expressions above but never aliased through,
// so the high-cardinality raw timestamp does not reach the final model

STORE Orders_Model INTO [lib://Data/L2/Orders_Model.qvd] (qvd);
DROP TABLE Orders_Model;

12.3 Approved: Consume with optimized loads

// L3 App: optimized loads only
Orders:
LOAD * FROM [lib://Data/L2/Orders_Model.qvd] (qvd);

Customers:
LOAD * FROM [lib://Data/L2/CustomerDim.qvd] (qvd);

13) Anti-patterns (examples you must never generate)

13.1 SELECT *

SQL SELECT * FROM dbo.FactSales;

Why it fails: RAM bloat + uncontrolled field proliferation + breaks lineage discipline.

13.2 Late filtering in LOAD instead of SQL pushdown

T:
LOAD *;
SQL SELECT ... FROM ...;  // no WHERE pushdown

// then later:
LOAD * RESIDENT T WHERE Year(Date) = 2024;

Why it fails: network + RAM wasted; filtering occurs too late.

13.3 Join dimension into big fact without proof

LEFT JOIN (Fact)
LOAD CustomerID, CustomerName, Segment RESIDENT Customers;

Why it fails: duplicates attributes per fact row; fan trap risk; memory inflation.


14) Acceptance criteria (the “done means done” checklist)

14.1 Performance criteria

  • No SELECT * anywhere
  • Source filtering and aggregation pushed down where appropriate
  • Final app load is optimized QVD loads (or justified exceptions)
  • STORE + DROP staging tables immediately
  • Timestamps split in final model
  • Composite keys AutoNumbered
  • ApplyMap used for N:1 lookups by default

14.2 Data integrity criteria

  • PK uniqueness validated (distinct PK = rows where expected)
  • No null keys in link fields (or documented handling)
  • No unexpected synthetic keys
  • No circular references
  • Joins do not change row counts unless documented

14.3 Operational criteria

  • Watermark strategy explicit and retry-safe
  • Logging present (row counts, checkpoints)
  • Script is rerunnable without duplicating data

15) When to break rules (and how)

Rules can be broken only if:

  1. Business requirement forces it, and
  2. You provide an impact analysis (RAM/CPU/reload/latency), and
  3. You isolate the exception to the smallest scope possible, and
  4. You add validation to prevent silent corruption.

Examples of justified exceptions:

  • Keeping a transaction ID field because the UX requires drill-through to transaction detail.
  • Non-optimized load for a small dimension because it must compute a derived field at consumption time (rare; usually fix in Transform layer).
  • STORE→DROP→reload in a single script to reduce peak RAM when memory is constrained.

End of ruleset.