Consider a randomized-response-like privacy mechanism #60

@csharrison

I want to propose a new privacy mechanism for IPA that:

  • Serves to help solve the optimization use-case
  • Does so under the existing differential privacy constraints of IPA as I understand them

This mechanism is inspired by the following research papers:

This research operates in the “label DP” setting, where model features are publicly known, but labels are not. This matches the setting of IPA well, because the report collector has full information about the raw impression event (and thus the corresponding feature vector), but the label (e.g. whether the impression led to a conversion) is protected with differential privacy. While these techniques are in the “local” differential privacy regime, they (somewhat surprisingly) perform close to the state of the art in private model training depending on the task.

Here is an outline of how we could implement one of the algorithms described here, RROnBins, in the IPA setting:

  • For every per-source breakdown key, pass along a list of possible output bins. For example, a bucketized range of values like {[0, 10], [11, 100], [101+]}
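  • After aggregation*, rather than applying a fixed noise distribution like Gaussian to the sum, perform k-ary randomized response over the k output bins specified for that breakdown key. That is, with probability p = k/(k - 1 + e^epsilon), pick one of the k bins uniformly at random (possibly the correct one); otherwise pick the correct bin, i.e. the one the aggregate falls in. The correct bin is therefore reported with probability e^epsilon/(k - 1 + e^epsilon) and each other bin with probability 1/(k - 1 + e^epsilon), so the mechanism satisfies epsilon-differential privacy.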
  • Note: the mechanism can also be extended to do randomized response over a restricted set of trigger-side breakdown keys if those become supported by IPA.
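As a rough illustration of the randomized-response step above, here is a minimal Python sketch. The function name `rr_on_bins`, the representation of bins as a sorted list of lower bounds, and the use of Python's `random` module are all assumptions made for this example; they are not part of IPA or of the papers, and in IPA the sampling would presumably have to happen inside the MPC rather than in the clear.

```python
import bisect
import math
import random


def rr_on_bins(aggregate, bin_lower_bounds, epsilon, rng=random):
    """Hypothetical k-ary randomized response over output bins.

    bin_lower_bounds is a sorted list of inclusive lower bounds, e.g.
    [0, 11, 101] for the bins {[0, 10], [11, 100], [101+]}.
    Returns the index of the reported bin.
    """
    k = len(bin_lower_bounds)
    if k < 2:
        raise ValueError("need at least two bins")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")

    # Index of the bin the true aggregate falls in; values below the
    # first lower bound are clamped into bin 0 (an assumption).
    true_bin = max(bisect.bisect_right(bin_lower_bounds, aggregate) - 1, 0)

    # With probability p, report a uniformly random bin (possibly the
    # true one); otherwise report the true bin.
    p = k / (k - 1 + math.exp(epsilon))
    if rng.random() < p:
        return rng.randrange(k)
    return true_bin
```

For example, `rr_on_bins(42, [0, 11, 101], epsilon=math.log(3))` reports bin 1 (the true bin) with probability 3/5 and each of the other two bins with probability 1/5. Note that `random` is not a cryptographically secure source of randomness, so this is only meant to show the sampling logic.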

*Note: for practical purposes, we would likely need to support a unique breakdown key per source to take advantage of this research, i.e. “aggregate” over only single sources. While IPA currently only mentions “aggregate” queries, as far as I can tell nothing in the existing design prevents aggregating over single events (i.e. a unique breakdown key per source), as long as the output is protected by differential privacy. The mechanism described above offers the same protection.

For an overview of how we’re thinking about these capabilities in ARA, see the flexible event-level explainer we recently published.
