I want to propose a new privacy mechanism to IPA that:
- Helps solve the optimization use case
- Does so under the existing differential privacy constraints of IPA as I understand them
This mechanism is inspired by the following research papers:
This research operates in the “label DP” setting, where model features are publicly known, but labels are not. This matches the setting of IPA well, because the report collector has full information about the raw impression event (and thus the corresponding feature vector), but the label (e.g. whether the impression led to a conversion) is protected with differential privacy. While these techniques are in the “local” differential privacy regime, they (somewhat surprisingly) perform close to the state of the art in private model training, depending on the task.
Here is an outline of how we could implement one of the algorithms described there, RROnBins, in the IPA setting:
- For every per-source breakdown key, pass along a list of possible output bins. For example, a bucketized range of values like
{[0, 10], [11, 100], [101+]}
- After aggregation*, rather than applying a fixed noise distribution like Gaussian to the sum, perform k-ary Randomized Response on the specified k output bins for that breakdown key. That is, with probability p = k / (k - 1 + e^epsilon), pick a bin uniformly at random; otherwise, report the bin the aggregate actually falls in. This mechanism satisfies epsilon differential privacy (see the sketch after this list).
- Note: the mechanism can also be extended to do randomized response over a restricted set of trigger-side breakdown keys if those become supported by IPA.
*Note: for practical purposes, we would likely need to support a unique breakdown key per source to take advantage of this research, i.e., “aggregate” over only single sources. While IPA currently only mentions “aggregate” queries, as far as I can tell nothing in the existing design prevents aggregating over single events (i.e., a unique breakdown key per source), as long as the output is protected by differential privacy. The mechanism described above provides exactly that protection.
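To make the randomized response step concrete, here is a minimal Python sketch. The `rr_on_bins` name, its arguments, and the lower-edge encoding of the bins are illustrative assumptions, not part of any existing IPA or ARA API:

```python
import math
import random

def rr_on_bins(aggregate_value, bin_lower_edges, epsilon):
    """k-ary randomized response over k value bins (illustrative sketch)."""
    k = len(bin_lower_edges)
    # Map the aggregate to its bin: lower edges [0, 11, 101] encode the
    # example buckets [0, 10], [11, 100], [101+] from above.
    true_bin = max(i for i, lo in enumerate(bin_lower_edges) if aggregate_value >= lo)

    # With probability p = k / (k - 1 + e^epsilon), output a uniformly random
    # bin; otherwise output the true bin. Then
    #   P[output = true bin] / P[output = any other bin] = e^epsilon,
    # so the released bin index satisfies epsilon differential privacy.
    p = k / (k - 1 + math.exp(epsilon))
    if random.random() < p:
        return random.randrange(k)
    return true_bin

# Example with the bins above and epsilon = 1:
noisy_bin = rr_on_bins(aggregate_value=42, bin_lower_edges=[0, 11, 101], epsilon=1.0)
```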
For an overview of how we’re thinking about these capabilities in ARA, see the flexible event-level explainer we recently published.