Assertion Catalog (Statistical Oracles)

Quantum measurement outcomes are samples from a distribution. An assertion is a statistical oracle: it decides whether the observed sample is consistent with an expected distribution or property. This page gives the math, the YAML, and guidance on choosing thresholds.

Notation: let $p$ be the observed distribution (normalised shot counts over bitstrings) and $q$ the expected distribution. The domain $\mathcal{D}$ is the union of their supports; bitstring keys are whitespace-stripped before comparison.

`distribution_tvd`: Total Variation Distance

$$ \mathrm{TVD}(p, q) = \tfrac{1}{2} \sum_{x \in \mathcal{D}} |p(x) - q(x)| \in [0, 1] $$

Passes when $\mathrm{TVD} \le$ max_distance. The recommended default oracle: interpretable (it's the maximum probability disagreement on any event), symmetric, and not sensitive to shot count.

- type: distribution_tvd
  expected: { "00": 0.5, "11": 0.5 }
  max_distance: 0.03

Typical thresholds: 0.01–0.03 on a noiseless simulator, 0.05–0.15 on noisy hardware (the measured ibm_fez Bell TVD was 0.1284 at 4096 shots; see the hardware baseline).

`hellinger_fidelity`: Classical Fidelity

With the Bhattacharyya coefficient $\mathrm{BC}(p,q) = \sum_x \sqrt{p(x),q(x)}$,

$$ F(p, q) = \mathrm{BC}(p, q)^2 \in [0, 1] $$

Passes when $F \ge$ min_fidelity. Identical to qiskit.quantum_info.hellinger_fidelity. Fidelity emphasises overlap and is a common single-number "closeness" score to track across commits.

- type: hellinger_fidelity
  expected: { "00": 0.5, "11": 0.5 }
  min_fidelity: 0.99

`chi_square`: Pearson Goodness-of-Fit

With $N$ shots, expected counts $E_x = N q(x)$ and observed $O_x = N p(x)$:

$$ \chi^2 = \sum_{x \in \mathcal{D}} \frac{(O_x - E_x)^2}{E_x}, \qquad \mathrm{dof} = |\mathcal{D}| - 1 $$

The p-value is the chi-square survival function $Q(\mathrm{dof}/2,\ \chi^2/2)$, computed from the regularised incomplete gamma function (no SciPy). Passes when p-value $\ge$ significance, i.e. we fail to reject the hypothesis that the counts came from $q$.

- type: chi_square
  expected: { "00": 0.5, "11": 0.5 }
  significance: 0.01     # alpha

Notes & caveats:

This is the formal hypothesis test used in the quantum-testing literature as an oracle. A smaller significance (e.g. 0.01) makes the gate more permissive (rejects only strong evidence of mismatch), reducing flakiness.
Outcomes expected with probability 0 but observed (leakage) push the statistic up and the p-value down, so the test correctly fails. (Internally, expected counts are floored at a tiny epsilon to keep the statistic finite.)
Classical validity expects $E_x \gtrsim 5$ per category; with many categories and few shots, prefer distribution_tvd.

`state_probability`: Marginal Outcome Probability

Bounds the measured probability of a single basis state $s$: $p(s)$.

# window form
- type: state_probability
  state: "11"
  min: 0.99
# target form
- type: state_probability
  state: "00"
  equals: 0.5
  tolerance: 0.02

Passes when all provided constraints hold: min $\le p(s)$, $p(s) \le$ max, and/or $|p(s) - \text{equals}| \le$ tolerance. At least one of min/max/equals is required. Ideal for amplitude-amplification checks (e.g. Grover's marked state).

`allowed_states`: Structural / Leakage Oracle

Bounds the probability mass measured outside an allowed support set:

$$ \text{leakage} = \sum_{x \notin S} p(x) $$

- type: allowed_states
  states: ["000", "111"]
  max_leakage: 0.005

Passes when $\text{leakage} \le$ max_leakage. Encodes structural truths independent of exact probabilities: a perfect GHZ state only occupies the all-zero and all-one corners; anything else is error/leakage.

Choosing oracles

Goal	Use
General "does the distribution match?"	`distribution_tvd` (default) + `chi_square`
Track closeness as one number over time	`hellinger_fidelity`
Algorithm produces a specific answer	`state_probability`
Forbid impossible outcomes / bound error	`allowed_states`

A robust suite combines a distance oracle, a hypothesis test, and a structural oracle, as the Bell example does.

Setting thresholds (statistical reality)

Sampling noise scales like $1/\sqrt{N}$. Per-bitstring standard error is roughly $\sqrt{p(1-p)/N}$; at $N=8192$ and $p=0.5$ that's $\approx 0.0055$. Set distance thresholds a few standard errors above expected noise, and prefer fixed seeds on simulators to make pre-merge gates deterministic. Reserve looser thresholds for real hardware, where device error, not sampling, dominates.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assertion Catalog (Statistical Oracles)

`distribution_tvd`: Total Variation Distance

`hellinger_fidelity`: Classical Fidelity

`chi_square`: Pearson Goodness-of-Fit

`state_probability`: Marginal Outcome Probability

`allowed_states`: Structural / Leakage Oracle

Choosing oracles

Setting thresholds (statistical reality)

FilesExpand file tree

assertions.md

Latest commit

History

assertions.md

File metadata and controls

Assertion Catalog (Statistical Oracles)

distribution_tvd: Total Variation Distance

hellinger_fidelity: Classical Fidelity

chi_square: Pearson Goodness-of-Fit

state_probability: Marginal Outcome Probability

allowed_states: Structural / Leakage Oracle

Choosing oracles

Setting thresholds (statistical reality)

`distribution_tvd`: Total Variation Distance

`hellinger_fidelity`: Classical Fidelity

`chi_square`: Pearson Goodness-of-Fit

`state_probability`: Marginal Outcome Probability

`allowed_states`: Structural / Leakage Oracle