Description
Currently, during clustering on the CPU, the ClusterGenerator compacts the latent space matrix whenever a cluster is yielded. This speeds up the bottleneck, computing cosine distances, because the matrix shrinks with every emitted cluster.
When clustering on GPU, this compaction is much slower since it requires a GPU -> CPU -> GPU roundtrip, whereas the cosine distance computation is faster. As such, the GPU clustering instead makes use of a kept_mask, allowing the clusterer to ignore already emitted clusters. The matrix is then only rarely compacted.
Recent benchmarking (as part of #423) shows that on CPU compacting takes about twice as long as a cosine distance calculation, suggesting that compaction should probably be rarer than on every cluster.
Instead, we may want to compact every Nth cluster, where we set different values of N for CPU and GPU. Examples could be N=10 for CPU and N=100 for GPU.
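To make the proposal concrete, here is a minimal NumPy sketch of compacting every Nth cluster while masking emitted rows in between. It is not the actual ClusterGenerator code: `cluster_with_periodic_compaction`, `toy_pick_cluster`, `compact_every` and the `threshold` value are hypothetical names made up for illustration, and the toy cluster picker assumes the rows are already L2-normalized.

```python
import numpy as np


def toy_pick_cluster(matrix, kept_mask, threshold=0.1):
    """Hypothetical stand-in for the real cluster search.

    Uses the first kept row as seed and returns the local indices of all
    kept rows within `threshold` cosine distance of it. Assumes the rows
    of `matrix` are L2-normalized, so cosine distance is 1 - dot product.
    """
    seed = np.flatnonzero(kept_mask)[0]
    distances = 1.0 - matrix @ matrix[seed]
    return np.flatnonzero(kept_mask & (distances <= threshold))


def cluster_with_periodic_compaction(matrix, pick_cluster, compact_every):
    """Yield clusters as arrays of original row indices.

    Emitted rows are only masked out via `kept_mask`; the matrix itself is
    compacted once every `compact_every` clusters.
    """
    indices = np.arange(len(matrix))            # local row -> original row
    kept_mask = np.ones(len(matrix), dtype=bool)
    clusters_since_compaction = 0

    while kept_mask.any():
        members = pick_cluster(matrix, kept_mask)
        kept_mask[members] = False              # cheap: no data is moved
        yield indices[members]

        clusters_since_compaction += 1
        if clusters_since_compaction >= compact_every:
            # Expensive step: copy the surviving rows into a smaller matrix.
            matrix = matrix[kept_mask]
            indices = indices[kept_mask]
            kept_mask = np.ones(len(matrix), dtype=bool)
            clusters_since_compaction = 0
```

With this shape, the CPU and GPU paths would differ only in the value passed as `compact_every` (for example 10 and 100, as suggested above), and setting it to 1 reproduces the current compact-on-every-cluster behaviour.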