For an aggregation service, maintaining the state necessary to ensure that sites cannot submit the same report for aggregation multiple times is one of the most important requirements. It's also one of the most expensive to implement. A simplistic approach, like stashing all reports in a database, does not scale particularly well.
I noted in #363 that the efficiency of managing anti-replay state is improved when there are more tasks. However, within a task, you can improve anti-replay efficiency further by binding reports to a limited set of identifiers.
In report partition I sketched the DAP report extension. Unlike the value suggested in #363, it would be possible to combine reports with different "partition" values.
The point of the value is to help make the task of tracking anti-replay state more efficient for aggregators. However, a report extension is not sufficient on its own. You need two additional things:
- An API option so that sites can bind conversion reports to partition identifiers.
- Facilities at the point that the site makes queries so that the aggregators can take advantage of the partition attribute.
Both are necessary to make use of this productively. There are already ways to shard the anti-replay state for reports, based on the DAP report_id field. Any site-controlled value is only useful as a means of lightening the load on aggregators if the site helps, both by coordinating the value of the parameter and by using the value in the collection jobs (batches) it creates.
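As a point of comparison, sharding by `report_id` might look something like the sketch below. It assumes the report ID is an opaque 16-byte random value; the `ShardedReplayStore` class, the in-memory sets, and the choice of the first 8 bytes as the shard key are all hypothetical and not taken from DAP or any implementation.

```python
class ShardedReplayStore:
    """Hypothetical in-memory anti-replay store, sharded by report ID."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # One set of seen report IDs per shard; a real service would put
        # each shard on separate storage.
        self.shards: list[set[bytes]] = [set() for _ in range(num_shards)]

    def shard_for(self, report_id: bytes) -> int:
        # Assumption: report IDs are random, so a prefix spreads them evenly.
        return int.from_bytes(report_id[:8], "big") % self.num_shards

    def check_and_record(self, report_id: bytes) -> bool:
        """Return True if the report is new, False if it is a replay."""
        shard = self.shards[self.shard_for(report_id)]
        if report_id in shard:
            return False
        shard.add(report_id)
        return True
```

Sharding like this spreads the load but does not reduce the total state: every report ID is still remembered somewhere. That is the gap a site-coordinated value is meant to address.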
In DAP, both Leader and Helper maintain anti-replay state. So this also requires that any interactions between the site/Collector and the DAP Leader can be passed on by the Leader to the Helper.
Some ways that a "partition" value might be used when asking for aggregation:
- A site might commit to never ask about a given partition again. Then, anti-replay state for reports with that partition can be reduced to remembering the partition. Any report with that partition can be rejected far more easily. (Note: Aggregators would have to defend themselves against pointless use of this commitment for partitions that have no reports, which might increase the state they hold.)
- A site might restrict a request to a set of partitions. Any reports that don't match the identified partitions would be rejected cheaply, without consulting anti-replay state.
- More complex arrangements that combine report attributes, like sites committing to never asking for aggregation in a partition prior to a certain timestamp.
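The first two options can be illustrated with a minimal sketch of the state an aggregator might keep. Everything here is an assumption for illustration: the `AntiReplayState` class, its method names, and the rule of ignoring commitments for partitions that have no reports (one possible defence against the state inflation noted above) are hypothetical and not part of DAP.

```python
from dataclasses import dataclass, field


@dataclass
class AntiReplayState:
    """Hypothetical per-task anti-replay state, keyed by partition."""

    # Partitions the site has committed to never ask about again.
    closed_partitions: set[bytes] = field(default_factory=set)
    # Report IDs already aggregated, grouped by partition.
    seen: dict[bytes, set[bytes]] = field(default_factory=dict)

    def close_partition(self, partition: bytes) -> None:
        # Assumed defence: only accept a commitment for a partition that
        # has reports, so the commitment always shrinks the state.
        if partition in self.seen:
            del self.seen[partition]
            self.closed_partitions.add(partition)

    def accept(self, partition: bytes, report_id: bytes,
               query_partitions: set[bytes]) -> bool:
        """Return True if the report may be aggregated for this query."""
        # Cheap rejection: the query does not cover this partition, so
        # per-report state is never consulted.
        if partition not in query_partitions:
            return False
        # Cheap rejection: the whole partition has been retired.
        if partition in self.closed_partitions:
            return False
        ids = self.seen.setdefault(partition, set())
        if report_id in ids:
            return False  # replay
        ids.add(report_id)
        return True
```

Closing a partition replaces an entire set of report IDs with a single partition identifier, which is where the saving comes from. The timestamp-based arrangements in the last bullet would need an additional per-partition cutoff, which this sketch does not attempt.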
All of these require specific (and different) extensions to DAP, both so that sites/Collectors can make commitments to the Leader and so that the Leader can pass those commitments on to Helpers. Related to this is the idea of a separate anti-replay commitment protocol layer in DAP.