Could duplicate-aware execution reduce embedding generation costs? #128818

likeslines-maker · 2026-05-31T13:47:18Z

While working with large RAG datasets, I noticed that many indexing pipelines repeatedly process logically identical content, or process every intermediate update to the same entity even when only the latest one matters.

In one benchmark, a significant portion of CPU time was spent recomputing results that had already been computed. Once I switched processing to a Last-Write-Wins model combined with caching and duplicate coalescing, the number of expensive operations dropped dramatically.
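To make the idea concrete, here is a minimal Python sketch of the three pieces working together. It is not the code from my benchmark or from the library linked below; the names `content_key`, `index_batch`, and `fake_embed` are hypothetical, and treating text as "logically identical" after whitespace collapsing and lowercasing is an assumption that may not suit case-sensitive embedding models.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple


def content_key(text: str) -> str:
    """Hash normalized text so logically identical content shares one key.

    Assumption: collapsing whitespace and lowercasing defines "identical".
    """
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def index_batch(
    updates: Iterable[Tuple[str, str]],
    embed: Callable[[str], List[float]],
    cache: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Return one embedding per entity while skipping redundant embed calls."""
    # Last-Write-Wins: a later update for the same entity replaces earlier ones.
    latest: Dict[str, str] = {}
    for entity_id, text in updates:
        latest[entity_id] = text

    result: Dict[str, List[float]] = {}
    for entity_id, text in latest.items():
        key = content_key(text)
        # Coalescing and caching: embed each distinct content key only once.
        if key not in cache:
            cache[key] = embed(text)
        result[entity_id] = cache[key]
    return result


# Hypothetical stand-in for a real embedding call; it records each invocation.
calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    calls.append(text)
    return [float(len(text))]


updates = [
    ("doc-1", "Hello world"),    # superseded by the later doc-1 update
    ("doc-2", "hello   world"),
    ("doc-1", "Hello again"),
    ("doc-3", "Hello again"),    # same content as the latest doc-1
]
cache: Dict[str, List[float]] = {}
vectors = index_batch(updates, fake_embed, cache)
print(f"{len(updates)} updates, {len(calls)} embed calls")
```

In this toy batch, four updates collapse to two embedding calls: the first `doc-1` update is discarded, and `doc-3` reuses the vector already computed for `doc-1`. Passing the same `cache` dictionary to later batches extends the saving across runs.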

This made me wonder whether similar optimizations could be useful in Semantic Kernel scenarios such as:

- embedding generation;
- document enrichment;
- AI transformations;
- large-scale RAG indexing;
- knowledge base maintenance.

To explore the idea, I built a library implementing adaptive duplicate-aware execution:

https://github.com/likeslines-maker/Principium.Parallel

The most interesting observation was that in workloads with very high duplication rates, eliminating redundant work had a much larger impact than increasing parallelism alone.
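A rough back-of-the-envelope model shows why this can happen. The sketch below is illustrative only: the item count, duplication rate, per-call latency, and worker count are made-up inputs rather than measurements, and the model assumes perfectly linear parallel scaling, which real workloads rarely achieve.

```python
def remaining_calls(total_items: int, duplicate_rate: float) -> int:
    """Number of expensive calls left after duplicates are eliminated."""
    return round(total_items * (1.0 - duplicate_rate))


def ideal_wall_time(call_count: int, seconds_per_call: float, workers: int) -> float:
    """Best-case wall time, assuming work splits evenly across workers."""
    return call_count * seconds_per_call / workers


# Hypothetical workload: 100,000 items, 90% duplicates, 50 ms per embedding call.
total_items = 100_000
duplicate_rate = 0.9
seconds_per_call = 0.05
workers = 8

unique_calls = remaining_calls(total_items, duplicate_rate)

parallel_only = ideal_wall_time(total_items, seconds_per_call, workers)
dedupe_only = ideal_wall_time(unique_calls, seconds_per_call, 1)
combined = ideal_wall_time(unique_calls, seconds_per_call, workers)

print(f"calls without dedupe: {total_items}, with dedupe: {unique_calls}")
print(f"8 workers, no dedupe:  {parallel_only:.1f} s")
print(f"1 worker, with dedupe: {dedupe_only:.1f} s")
print(f"8 workers and dedupe:  {combined:.1f} s")
```

Under these assumed numbers, deduplication on a single worker already beats eight workers without it. It also cuts the total number of calls, which matters when each call is billed, whereas parallelism only spreads the same number of calls over more workers.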

Have others encountered similar bottlenecks in production Semantic Kernel workloads?

Replies: 0 comments
