
Commit 2e3a4ac

committed
chore: document best practices for improving feed throughput
1 parent 7c3a569

1 file changed

en/rag/embedding.html

Lines changed: 16 additions & 0 deletions
@@ -615,6 +615,22 @@ <h5 id="voyageai-rate-limit-isolation">Rate limit isolation</h5>
that affect search queries. By using <strong>separate API keys</strong> for feed and search embedders,
you ensure that feeding bursts don't negatively impact search.</p>

<h5 id="voyageai-document-processing-concurrency">Increase feed concurrency</h5>
<p>When using the VoyageAI embedder, container feed throughput is primarily limited by VoyageAI API latency
combined with the document processing thread pool size, not by CPU. Each document being fed blocks a thread
while waiting for the VoyageAI API response. To improve throughput, increase the
<a href="../reference/applications/services/docproc.html#threadpool">document processing thread pool size</a>,
provided the content cluster is not the bottleneck.</p>

<p>For example, consider a container cluster with 2 nodes, each with 8 vCPUs. With the default document processing
thread pool size of 1 thread per vCPU, you have 16 threads in total. If the average VoyageAI API latency is 200 ms,
the maximum throughput is approximately 16 / 0.2 = 80 documents/second.
See <a href="../performance/container-tuning.html#docproc">container tuning</a> for details.</p>
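The arithmetic above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration of the model in the text (one blocked thread per in-flight document); the node count, threads per vCPU, and latency are the example's figures, not measured values:

```python
def max_feed_throughput(nodes: int, vcpus_per_node: int,
                        threads_per_vcpu: int, api_latency_s: float) -> float:
    """Upper bound on documents/second when each in-flight document
    blocks one document processing thread for one embedder API call."""
    total_threads = nodes * vcpus_per_node * threads_per_vcpu
    return total_threads / api_latency_s

# The example from the text: 2 nodes x 8 vCPUs, 1 thread per vCPU, 200 ms latency.
print(max_feed_throughput(nodes=2, vcpus_per_node=8,
                          threads_per_vcpu=1, api_latency_s=0.2))  # 80.0
```

Doubling the thread pool size (or halving the API latency) doubles this bound, which is why the thread pool, not CPU, is usually the first knob to turn.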

<p>Note that the effective throughput can never exceed the rate limit of your VoyageAI API key.
Use the <a href="https://docs.vespa.ai/en/reference/operations/metrics/container.html">embedder metrics</a>
to determine embedder latency and throughput.</p>
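In other words, the effective feed rate is the thread-pool capacity capped by the API key's rate limit. A minimal sketch of that cap, where the 3000 requests/minute limit is a hypothetical value for illustration, not a documented VoyageAI quota:

```python
def effective_throughput(pool_docs_per_s: float, rate_limit_rpm: float) -> float:
    """Effective feed rate: thread-pool capacity capped by the API key's
    rate limit (requests per minute converted to requests per second)."""
    return min(pool_docs_per_s, rate_limit_rpm / 60.0)

# Hypothetical: 80 docs/sec pool capacity, 3000 requests/minute rate limit.
print(effective_throughput(80.0, 3000))  # 50.0 -> the rate limit is the bottleneck
```

When the rate limit is the smaller term, adding threads does not help; request a higher limit or use a separate key for feeding, as described above.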
<h2 id="embedder-performance">Embedder performance</h2>

<p>Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:</p>
