
Distance metrics compare 1-D marginals only: cross-world coupling (PN/PS) is invisible #7

Description

@fabio-rovai

A complete, tested branch implementing the proposal below is ready at https://github.com/fabio-rovai/causal-perception-implementation/tree/pn-ps-identification-bounds (15 passing tests, additive and opt-in). I can open it as a PR whenever you prefer.

distances.py compares marginals, so cross-world counterfactual coupling is invisible: add PN/PS + Fréchet bounds (opt-in)?

Hi, and thanks for open-sourcing this. I have been reading through the causal
perception implementation and I think I have spotted a subtle but important
identification gap, and I would like to check whether you would welcome a small
opt-in PR before I send one.

What I think is happening

distances.py (W2, KL, TV) takes two 1D sample arrays, and
perception.run_perception feeds it the per-individual outcome-probability
vectors as 1D marginals. The cross-world joint P(Y_0, Y_1) of the binary outcome
is never formed. On top of that, LinearANM.abduct explicitly does no noise
abduction for Y (the comment says the counterfactual probability is computed from
the classifier on counterfactual parents). So the cross-world coupling of the
binary outcome is not pinned down by anything in the pipeline, and any two SCMs
that share the two interventional marginals but differ in how they couple the two
worlds will look identical to W2/KL/TV.

A small witness

I ran a quick check with two binary potential-outcome models that share their
marginals exactly (R0 = 0.5, R1 = 0.7):

  • monotone coupling, p11 = P(Y_0=1, Y_1=1) = 0.50 -> PN = P(Y_0=0 | Y_1=1) = 0.286
  • independent outcomes, p11 = 0.35 -> PN = 0.500

Comparing the two models with compute_all_distances on the 1D marginals gives ~0
for W2, KL and TV (the marginals are identical), but the probability of necessity
separates the two models by about 0.214. The marginal distances are blind to
exactly that.
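The witness can be reproduced without touching the repository. The sketch below is standalone and hypothetical (joint_from_p11, marginals and prob_necessity are illustrative names, not functions from the codebase); it builds the two 2x2 joints from the numbers above, checks that their marginals agree, and computes PN for each.

```python
import math


def joint_from_p11(r0, r1, p11):
    """2x2 joint over (Y_0, Y_1) with P(Y_0=1)=r0, P(Y_1=1)=r1, P(Y_0=1, Y_1=1)=p11."""
    return {
        (1, 1): p11,
        (1, 0): r0 - p11,
        (0, 1): r1 - p11,
        (0, 0): 1.0 - r0 - r1 + p11,
    }


def marginals(joint):
    """Recover (P(Y_0=1), P(Y_1=1)) from the joint table."""
    return joint[(1, 1)] + joint[(1, 0)], joint[(1, 1)] + joint[(0, 1)]


def prob_necessity(joint):
    """PN = P(Y_0=0 | Y_1=1), as defined in the witness."""
    return joint[(0, 1)] / (joint[(0, 1)] + joint[(1, 1)])


r0, r1 = 0.5, 0.7
monotone = joint_from_p11(r0, r1, min(r0, r1))   # p11 = 0.50
independent = joint_from_p11(r0, r1, r0 * r1)    # p11 = 0.35

# Both joints have the same two marginals, so any distance computed
# from the marginals alone cannot tell them apart.
for a, b in zip(marginals(monotone), marginals(independent)):
    assert math.isclose(a, b, abs_tol=1e-12)

pn_monotone = prob_necessity(monotone)
pn_independent = prob_necessity(independent)
gap = pn_independent - pn_monotone

print(f"PN monotone    = {pn_monotone:.3f}")     # 0.286
print(f"PN independent = {pn_independent:.3f}")  # 0.500
print(f"gap            = {gap:.3f}")             # 0.214
```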

This is not a bug in the distances; it is an identification fact. With only the
two marginals and no abducted outcome noise, P(Y_0, Y_1) is identified only up to
its Fréchet bounds. In the fair-credit framing this matters, because a point
counterfactual on a protected attribute quietly hides an interval.
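For reference, the interval can be written out from the Fréchet inequalities. This is a sketch in the notation of the witness above, with R_0 = P(Y_0=1), R_1 = P(Y_1=1), p_11 = P(Y_0=1, Y_1=1), and assuming R_1 > 0 so that the conditional is defined:

```latex
\max(0,\, R_0 + R_1 - 1) \;\le\; p_{11} \;\le\; \min(R_0,\, R_1),
\qquad
\mathrm{PN} \;=\; P(Y_0 = 0 \mid Y_1 = 1) \;=\; \frac{R_1 - p_{11}}{R_1}
\;\in\;
\left[\, \frac{\max(0,\, R_1 - R_0)}{R_1},\;\; \frac{\min(R_1,\, 1 - R_0)}{R_1} \,\right].
```

With R_0 = 0.5 and R_1 = 0.7 this gives p_11 in [0.2, 0.5] and PN in [0.286, 0.714]; the monotone coupling sits at the lower PN end and the independent coupling (PN = 0.500) in the interior.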

Proposal

Would you welcome a small, additive, opt-in PR that:

  • reports PN and PS alongside their sharp Fréchet identification bounds from the
    two marginals (with the assumption stated explicitly in the docstrings),
  • names two reference couplings as point estimates within the bounds (monotone,
    which attains the lower PN/PS bound, and independent, which lies in the
    interior), and
  • adds a run_* script and tests, including the witness above,

with zero change to any existing module, output or default? I would keep it
entirely separate from the current distance pipeline so nothing you rely on
moves.
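To make the scope concrete, here is a minimal sketch of what the bounds helper could look like. It is illustrative only: pn_ps_bounds and Interval are hypothetical names, not the API of the branch or of any existing module, and the only assumption is that some joint over (Y_0, Y_1) with the given marginals exists. PN is P(Y_0=0 | Y_1=1) as above, and PS is taken to be P(Y_1=1 | Y_0=0).

```python
from typing import Dict, NamedTuple


class Interval(NamedTuple):
    lower: float
    upper: float


def pn_ps_bounds(r0: float, r1: float) -> Dict[str, Interval]:
    """Fréchet bounds on PN and PS from the two marginals alone.

    r0 = P(Y_0 = 1), r1 = P(Y_1 = 1).
    PN = P(Y_0 = 0 | Y_1 = 1), PS = P(Y_1 = 1 | Y_0 = 0).
    Assumption: nothing is known about the cross-world coupling.
    """
    if not (0.0 <= r0 < 1.0 and 0.0 < r1 <= 1.0):
        raise ValueError("need 0 <= r0 < 1 and 0 < r1 <= 1 so both conditionals exist")

    # Fréchet bounds on p11 = P(Y_0=1, Y_1=1) are
    #   max(0, r0 + r1 - 1) <= p11 <= min(r0, r1).
    # Since P(Y_0=0, Y_1=1) = r1 - p11, its bounds are:
    p01_lower = max(0.0, r1 - r0)    # attained by the monotone coupling
    p01_upper = min(r1, 1.0 - r0)    # attained by the counter-monotone coupling

    return {
        "PN": Interval(p01_lower / r1, p01_upper / r1),
        "PS": Interval(p01_lower / (1.0 - r0), p01_upper / (1.0 - r0)),
    }


bounds = pn_ps_bounds(0.5, 0.7)
print(f"PN in [{bounds['PN'].lower:.3f}, {bounds['PN'].upper:.3f}]")  # [0.286, 0.714]
print(f"PS in [{bounds['PS'].lower:.3f}, {bounds['PS'].upper:.3f}]")  # [0.400, 1.000]
```

Returning intervals rather than a single number keeps the identification assumption visible to the caller; point estimates under a named coupling could then be reported next to the interval rather than in place of it.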

Happy to sign the CLA. If this is useful I will open the PR; if you would rather
shape it differently first, I am glad to discuss here.
