Fix INDEPENDENT mechanism crash on duplicate cliques#4

Open
gghatano wants to merge 1 commit into google:main from gghatano:fix-independent-dedup-cliques

@gghatano gghatano commented Jun 3, 2026

Fixes the `ValueError: Cliques must be unique.` raised when running
`dpsynth.generate(..., discrete_config=IndependentConfig())` (see issue:
"IndependentConfig synthesis raises 'Cliques must be unique.'").

Cause

`independent.run_mechanism` builds `measurements` from `initial_measurements`
(the one-way marginals that `data_generation_v2.generate` always provides) and
then appends a freshly measured one-way marginal for every attribute, so the
list contains each one-way clique twice. When `initial_potentials` is not
`None` (e.g. the empty `CliqueVector` from `constraints.get_initial_parameters`
in the no-constraints case), `potentials.expand([m.clique for m in measurements])`
is called with duplicate cliques, which current `mbi` rejects.
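
To make the failure mode concrete, here is a minimal sketch of how the duplicate cliques arise. It does not use `dpsynth` or `mbi`: `Measurement` and `expand` are hypothetical stand-ins that model only the behaviour described above (a `clique` attribute and a uniqueness check), and the column names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical stand-in for a measured marginal; only `clique` matters here."""
    clique: tuple


def expand(cliques):
    """Hypothetical stand-in that models only the uniqueness check."""
    cliques = list(cliques)
    if len(set(cliques)) != len(cliques):
        raise ValueError("Cliques must be unique.")
    return cliques


attributes = ["age", "sex"]  # illustrative column names, not from the issue

# One-way marginals that arrive as initial measurements.
initial_measurements = [Measurement((a,)) for a in attributes]
# A freshly measured one-way marginal is then appended for every attribute.
measurements = initial_measurements + [Measurement((a,)) for a in attributes]

# Each one-way clique now appears twice in the list handed to expand().
cliques = [m.clique for m in measurements]

try:
    expand(cliques)
    error = None
except ValueError as exc:
    error = str(exc)
```

In this sketch `cliques` holds four entries but only two distinct values, so the stand-in check raises the same message as the one reported in the issue.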

Change

De-duplicate the clique list (order-preserving) before `expand`:

    unique_cliques = list(dict.fromkeys(m.clique for m in measurements))
    potentials = potentials.expand(unique_cliques)

`measurements` itself is unchanged, so the marginals fed to `mirror_descent`
and the privacy accounting are unaffected; only the clique set used to expand
the (zero) potentials is de-duplicated.
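
As a standalone illustration of the idiom, the sketch below applies `dict.fromkeys` to a list of plain dictionaries standing in for measurements. The `dedupe_cliques` helper and the sample data are hypothetical and not part of the patch. The sketch shows that the first occurrence of each clique keeps its position and that the input list is not modified.

```python
def dedupe_cliques(cliques):
    """Return the cliques with duplicates dropped, keeping first-seen order.

    dict keys are unique and preserve insertion order (guaranteed since
    Python 3.7), so dict.fromkeys acts as an ordered de-duplication.
    """
    return list(dict.fromkeys(cliques))


# Hypothetical stand-ins for measurements: the same clique measured twice.
measurements = [
    {"clique": ("sex",), "source": "initial"},
    {"clique": ("age",), "source": "initial"},
    {"clique": ("sex",), "source": "fresh"},
    {"clique": ("age",), "source": "fresh"},
]

unique_cliques = dedupe_cliques(m["clique"] for m in measurements)

# The measurement list still holds all four entries; only the derived
# clique list has been collapsed.
num_measurements = len(measurements)
```

Cliques must be hashable for this to work, which holds for tuples of attribute names as used here. A plain `set` would also remove duplicates but does not guarantee the original ordering.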

Verification

The repro in the issue (a 2-column categorical frame with
`IndependentConfig()`) now returns a synthetic DataFrame instead of raising.
MST/AIM are unaffected.

Fixes #2

@google-cla

google-cla Bot commented Jun 3, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

`independent.run_mechanism` builds `measurements` from `initial_measurements`
(the one-way marginals that `data_generation_v2.generate` always passes in)
and then appends a freshly measured one-way marginal for every attribute. When
`initial_potentials` is not None (e.g. the empty CliqueVector returned by
`constraints.get_initial_parameters` for the no-constraints case), the code
calls `potentials.expand([m.clique for m in measurements])` with a clique list
that now contains duplicate one-way cliques.

With current `mbi`, `CliqueVector.expand` requires unique cliques and raises:

    ValueError: Cliques must be unique.

so `dpsynth.generate(..., discrete_config=IndependentConfig())` fails. De-duplicate
the clique list (order-preserving) before expanding. The measurements themselves are
left unchanged, so estimation and privacy accounting are unaffected.

@gghatano gghatano force-pushed the fix-independent-dedup-cliques branch from fc87d86 to 1755e90 June 4, 2026 03:55

    potentials = initial_potentials
    if potentials is not None:
        potentials = potentials.expand([m.clique for m in measurements])
        # `measurements` can contain the same clique more than once (the one-way

Collaborator

Thanks for the bug fix, but can you remove this inline comment or reduce it to one line?

Collaborator

Can you also add test coverage for this case?

Development

Successfully merging this pull request may close these issues.

IndependentConfig synthesis raises "Cliques must be unique."

3 participants