IndependentConfig synthesis raises "Cliques must be unique." #2

@gghatano

Summary

Calling dpsynth.generate(...) with discrete_config=IndependentConfig() fails
with a ValueError from mbi.CliqueVector.expand. MST and AIM work on the same
input; only INDEPENDENT is affected.

Environment

  • Linux, Python 3.12, current mbi (git+https://github.com/ryan112358/mbi.git)

Reproduction

import pandas as pd
import dpsynth
from dpsynth import domain, discrete_mechanisms as dm

df = pd.DataFrame({
    "a": ["x", "y", "x", "y"] * 50,
    "b": [0, 1, 1, 0] * 50,
})
domains = {
    "a": domain.CategoricalAttribute(possible_values=["x", "y"]),
    "b": domain.CategoricalAttribute(possible_values=[0, 1]),
}
dpsynth.generate(df, domains, epsilon=1.0, delta=1e-5,
                 discrete_config=dm.IndependentConfig())

This raises:

ValueError: Cliques must be unique.
  File ".../dpsynth/discrete_mechanisms/independent.py", line 73, in run_mechanism
    potentials = potentials.expand([m.clique for m in measurements])
  File ".../mbi/clique_vector.py", line 55, in __attrs_post_init__
    raise ValueError("Cliques must be unique.")

Cause

In independent.run_mechanism, measurements starts from
initial_measurements — the one-way marginals that
data_generation_v2.generate always passes in — and then the loop appends a
freshly measured one-way marginal for every attribute. So measurements
contains each one-way clique twice.

When initial_potentials is not None (e.g. the empty CliqueVector returned by
constraints.get_initial_parameters even when there are no cross-attribute
constraints), line 73 calls
potentials.expand([m.clique for m in measurements]) with that duplicated
clique list. Current mbi requires unique cliques in expand, so it raises.
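
The duplication described above can be illustrated without dpsynth or mbi. This is only a sketch: plain tuples stand in for the cliques of the real measurement objects, and the variable names are made up for illustration.

```python
from collections import Counter

attributes = ["a", "b"]

# Stand-in for the cliques of initial_measurements: one one-way
# marginal per attribute, already present before the loop runs.
initial_cliques = [(attr,) for attr in attributes]

# Stand-in for the loop in run_mechanism: it appends another
# one-way marginal for every attribute.
measured_cliques = list(initial_cliques)
for attr in attributes:
    measured_cliques.append((attr,))

# Every one-way clique now appears twice.
duplicates = [c for c, n in Counter(measured_cliques).items() if n > 1]
print(measured_cliques)
print(duplicates)
```

A list in this shape is what reaches expand on line 73, which is why the uniqueness check fails.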

Suggested fix

De-duplicate the clique list (order-preserving) before calling expand. The
measurements passed to mirror_descent are left unchanged, so estimation and
privacy accounting are unaffected. PR attached.
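
A minimal sketch of the order-preserving de-duplication, assuming cliques are hashable tuples of attribute names. The helper name unique_cliques is hypothetical and is not part of dpsynth or mbi; the attached PR may implement this differently.

```python
def unique_cliques(cliques):
    """Drop repeated cliques, keeping the first occurrence of each.

    Hypothetical helper: relies on dicts preserving insertion order
    (guaranteed since Python 3.7).
    """
    return list(dict.fromkeys(tuple(c) for c in cliques))


# Stand-in for [m.clique for m in measurements] in the failing case.
cliques = [("a",), ("b",), ("a",), ("b",)]
deduped = unique_cliques(cliques)
print(deduped)
```

Only the list handed to expand would go through such a helper; the measurements list itself stays as it is, matching the note above that estimation and privacy accounting are unaffected.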
