While working with large RAG datasets, I noticed that many indexing pipelines repeatedly process logically identical content or multiple updates of the same entity.
In one benchmark, a significant portion of CPU time went to recomputing results that had already been calculated. Once I switched processing to a Last-Write-Wins model (only the latest update per entity is processed), combined with caching and duplicate coalescing, the number of expensive operations dropped dramatically.
This made me wonder whether similar optimizations could be useful in Semantic Kernel scenarios such as:
embedding generation;
document enrichment;
AI transformations;
large-scale RAG indexing;
knowledge base maintenance.
To explore the idea, I built a library implementing adaptive duplicate-aware execution:
https://github.com/likeslines-maker/Principium.Parallel
The most interesting observation was that in workloads with very high duplication rates, eliminating redundant work had a much larger impact than increasing parallelism alone.
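To make the idea concrete, here is a minimal Python sketch of Last-Write-Wins coalescing combined with a content-hash cache. It is not the Principium.Parallel API and not a Semantic Kernel API: the names `coalesce_last_write_wins`, `process_batch`, and `fake_embed` are hypothetical, and `fake_embed` is only a stand-in for an expensive call such as embedding generation.

```python
import hashlib
from collections.abc import Callable, Iterable

# Assumption: an update is (entity_id, content), listed in arrival order.
Update = tuple[str, str]


def coalesce_last_write_wins(updates: Iterable[Update]) -> dict[str, str]:
    """Keep only the most recent content seen for each entity id."""
    latest: dict[str, str] = {}
    for entity_id, content in updates:
        latest[entity_id] = content  # a later write overwrites an earlier one
    return latest


def process_batch(
    updates: Iterable[Update],
    expensive: Callable[[str], list[float]],
    cache: dict[str, list[float]],
) -> dict[str, list[float]]:
    """Run `expensive` once per distinct surviving content."""
    results: dict[str, list[float]] = {}
    for entity_id, content in coalesce_last_write_wins(updates).items():
        # Identical content shares one cache entry, keyed by its hash.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = expensive(content)
        results[entity_id] = cache[key]
    return results


# Hypothetical stand-in for an expensive operation; records each real call.
calls: list[str] = []


def fake_embed(text: str) -> list[float]:
    calls.append(text)
    return [float(len(text))]


updates = [
    ("doc-1", "draft"),
    ("doc-2", "hello"),
    ("doc-1", "hello"),
    ("doc-3", "hello"),
    ("doc-1", "final text"),
]
cache: dict[str, list[float]] = {}
results = process_batch(updates, fake_embed, cache)
print(len(updates), "updates,", len(calls), "expensive calls")
```

In this toy batch, five updates collapse to three entities and two distinct contents, so the expensive function runs twice. Adding workers would only have spread the original five calls across threads; it would not have removed any of them.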
Have others encountered similar bottlenecks in production Semantic Kernel workloads?