Summary
Calling dpsynth.generate(...) with discrete_config=IndependentConfig() fails
with a ValueError from mbi.CliqueVector.expand. MST and AIM work on the same
input; only INDEPENDENT is affected.
Environment
- Linux, Python 3.12, current mbi (git+https://github.com/ryan112358/mbi.git)
Reproduction
import pandas as pd
import dpsynth
from dpsynth import domain, discrete_mechanisms as dm
df = pd.DataFrame({
    "a": ["x", "y", "x", "y"] * 50,
    "b": [0, 1, 1, 0] * 50,
})
domains = {
    "a": domain.CategoricalAttribute(possible_values=["x", "y"]),
    "b": domain.CategoricalAttribute(possible_values=[0, 1]),
}
dpsynth.generate(df, domains, epsilon=1.0, delta=1e-5,
                 discrete_config=dm.IndependentConfig())
ValueError: Cliques must be unique.
File ".../dpsynth/discrete_mechanisms/independent.py", line 73, in run_mechanism
potentials = potentials.expand([m.clique for m in measurements])
File ".../mbi/clique_vector.py", line 55, in __attrs_post_init__
raise ValueError("Cliques must be unique.")
Cause
In independent.run_mechanism, measurements starts from
initial_measurements — the one-way marginals that
data_generation_v2.generate always passes in — and then the loop appends a
freshly measured one-way marginal for every attribute. So measurements
contains each one-way clique twice.
When initial_potentials is not None (e.g. the empty CliqueVector returned by
constraints.get_initial_parameters even when there are no cross-attribute
constraints), line 73 calls
potentials.expand([m.clique for m in measurements]) with that duplicated
clique list. Current mbi requires unique cliques in expand, so it raises.
Suggested fix
De-duplicate the clique list (order-preserving) before calling expand. The
measurements passed to mirror_descent are left unchanged, so estimation and
privacy accounting are unaffected. PR attached.
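For illustration, a minimal sketch of an order-preserving de-duplication. It assumes cliques are hashable tuples of attribute names. The helper name unique_cliques and the sample list are hypothetical stand-ins, not code taken from the PR or from mbi.

```python
def unique_cliques(cliques):
    """Return the cliques with duplicates dropped, keeping first-seen order."""
    # dict preserves insertion order (Python 3.7+), so dict.fromkeys removes
    # repeated hashable items such as tuples without reordering the rest.
    return list(dict.fromkeys(cliques))


# Hypothetical stand-in for [m.clique for m in measurements]: each one-way
# clique appears twice, once from initial_measurements and once from the loop.
cliques = [("a",), ("b",), ("a",), ("b",)]
deduped = unique_cliques(cliques)
print(deduped)
```

Under that assumption, the call on line 73 would receive the de-duplicated list, while the measurements list itself stays as it is.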