<h5 id="voyageai-rate-limit-isolation">Rate limit isolation</h5>
that affect search queries. By using <strong>separate API keys</strong> for feed and search embedders,
you ensure that feeding bursts don't negatively impact search.</p>

<h5 id="voyageai-document-processing-concurrency">Increase feed concurrency</h5>
<p>When using the VoyageAI embedder, container feed throughput is primarily limited by VoyageAI API latency
combined with the document processing thread pool size, not by CPU. Each document being fed blocks a thread
while waiting for the VoyageAI API response. To improve throughput, you will likely need to increase the
<a href="../reference/applications/services/docproc.html#threadpool">document processing thread pool size</a>,
assuming the content cluster is not the bottleneck.</p>

<p>For example, consider a container cluster with 2 nodes, each with 8 vCPUs. With the default document processing
thread pool size of 1 thread per vCPU, you have 16 total threads. If the average VoyageAI API latency is 200ms,
the maximum throughput is approximately 16 / 0.2 = 80 documents/second.
See <a href="../performance/container-tuning.html#docproc">container tuning</a> for more details.</p>

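The estimate above can be sketched as a quick back-of-the-envelope calculation. This is an illustration only; the function name and the alternative pool size are hypothetical, not Vespa parameters:

```python
def max_feed_throughput(nodes: int, vcpus_per_node: int,
                        threads_per_vcpu: int, api_latency_s: float) -> float:
    """Upper bound on documents/second when each feed thread blocks
    for one synchronous embedding API call per document."""
    total_threads = nodes * vcpus_per_node * threads_per_vcpu
    return total_threads / api_latency_s

# The example from the text: 2 nodes x 8 vCPUs, 1 thread/vCPU, 200 ms latency.
print(max_feed_throughput(2, 8, 1, 0.2))  # 80.0 documents/second

# A hypothetical 4x larger thread pool raises the bound linearly.
print(max_feed_throughput(2, 8, 4, 0.2))  # 320.0 documents/second
```

Increasing the thread pool raises this bound linearly, until the content cluster or the API rate limit becomes the limiting factor instead.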
<p>Note that the effective throughput can never exceed the rate limit of your VoyageAI API key.
Use the <a href="https://docs.vespa.ai/en/reference/operations/metrics/container.html">embedder metrics</a>
to determine embedder latency and throughput.</p>

<h2 id="embedder-performance">Embedder performance</h2>

<p>Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:</p>