Is your feature request related to a problem? Please describe.
I would like to be able to implement a GPU version of Spark's approx_count_distinct function, which uses the HyperLogLog++ cardinality estimation algorithm.
cuDF does not appear to provide any features today that would allow me to do this.
Describe the solution you'd like
I would like cuDF to implement this capability and expose an API similar to the one for approx_percentile: one method to compute the underlying data structure and another to merge instances of it, whether that structure is based on HyperLogLog++ or some other algorithm.
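To illustrate the compute/merge split, here is a minimal CPU-only sketch in Python. It is not cuDF code and not a proposed signature: the names `compute_sketch`, `merge_sketches`, and `estimate` are hypothetical, the hash is a stand-in, and the algorithm is plain HyperLogLog rather than HyperLogLog++ (no bias correction and no sparse representation), so its estimates would not necessarily match Spark's.

```python
import hashlib
import math

P = 12        # precision (assumed): the sketch has 2**P registers
M = 1 << P


def _hash64(value):
    # Stand-in 64-bit hash; a real implementation would pick its own hash.
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def compute_sketch(values):
    """Build a register array from raw values (the 'compute' step)."""
    registers = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - P)                   # top P bits select a register
        rest = h & ((1 << (64 - P)) - 1)      # remaining 64 - P bits
        # 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - P) - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank
    return registers


def merge_sketches(a, b):
    """Combine two sketches (the 'merge' step): element-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(registers):
    """Turn a sketch into an approximate distinct count."""
    m = len(registers)
    alpha = 0.7213 / (1.0 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting
        return m * math.log(m / zeros)
    return raw
```

The property that matters for a distributed engine such as Spark is that merging the sketches of two partitions gives exactly the sketch of their union, so partial results can be computed independently and combined later:

```python
left = compute_sketch(range(0, 1000))
right = compute_sketch(range(500, 2000))
merged = merge_sketches(left, right)
print(merged == compute_sketch(range(2000)))
print(round(estimate(merged)))
```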
Describe alternatives you've considered
None
Additional context
None