
Commit 2e3a4ac

committed
chore: document best practices for improving feed throughput
1 parent 7c3a569

1 file changed

en/rag/embedding.html

Lines changed: 16 additions & 0 deletions
@@ -615,6 +615,22 @@ <h5 id="voyageai-rate-limit-isolation">Rate limit isolation</h5>
that affect search queries. By using <strong>separate API keys</strong> for feed and search embedders,
you ensure that feeding bursts don't negatively impact search.</p>

<h5 id="voyageai-document-processing-concurrency">Increase feed concurrency</h5>
<p>When using the VoyageAI embedder, container feed throughput is primarily limited by VoyageAI API latency
combined with the document processing thread pool size, not by CPU. Each document being fed blocks a thread
while waiting for the VoyageAI API response. To improve throughput, increase the
<a href="../reference/applications/services/docproc.html#threadpool">document processing thread pool size</a>,
provided the content cluster is not the bottleneck.</p>

<p>For example, consider a container cluster with 2 nodes, each with 8 vCPUs. With the default document processing
thread pool size of 1 thread per vCPU, you have 16 threads in total. If the average VoyageAI API latency is 200 ms,
the maximum throughput is approximately 16 / 0.2 = 80 documents/second.
See <a href="../performance/container-tuning.html#docproc">container tuning</a> for details.</p>
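The arithmetic above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration of the model in the text (one blocked thread per in-flight document); the node count, threads per vCPU, and latency are the example's figures, not measured values:

```python
def max_feed_throughput(nodes: int, vcpus_per_node: int,
                        threads_per_vcpu: int, api_latency_s: float) -> float:
    """Upper bound on documents/second when each in-flight document
    blocks one document processing thread for one embedder API call."""
    total_threads = nodes * vcpus_per_node * threads_per_vcpu
    return total_threads / api_latency_s

# The example from the text: 2 nodes x 8 vCPUs, 1 thread per vCPU, 200 ms latency.
print(max_feed_throughput(nodes=2, vcpus_per_node=8,
                          threads_per_vcpu=1, api_latency_s=0.2))  # 80.0
```

Doubling the thread pool size (or halving the API latency) doubles this bound, which is why the thread pool, not CPU, is usually the first knob to turn.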

<p>Note that the effective throughput can never exceed the rate limit of your VoyageAI API key.
Use the <a href="https://docs.vespa.ai/en/reference/operations/metrics/container.html">embedder metrics</a>
to determine embedder latency and throughput.</p>
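In other words, the effective feed rate is the thread-pool capacity capped by the API key's rate limit. A minimal sketch of that cap, where the 3000 requests/minute limit is a hypothetical value for illustration, not a documented VoyageAI quota:

```python
def effective_throughput(pool_docs_per_s: float, rate_limit_rpm: float) -> float:
    """Effective feed rate: thread-pool capacity capped by the API key's
    rate limit (requests per minute converted to requests per second)."""
    return min(pool_docs_per_s, rate_limit_rpm / 60.0)

# Hypothetical: 80 docs/sec pool capacity, 3000 requests/minute rate limit.
print(effective_throughput(80.0, 3000))  # 50.0 -> the rate limit is the bottleneck
```

When the rate limit is the smaller term, adding threads does not help; request a higher limit or use a separate key for feeding, as described above.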
<h2 id="embedder-performance">Embedder performance</h2>

<p>Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:</p>
