feat: Support overriding max concurrent Flux streams #121
bnogas wants to merge 3 commits into deepgram:main
@bnogas Deepgram doesn't officially support MIG partitions, and it looks like the underlying issue here is that your GPU isn't being detected, so Deepgram is falling back to our low CPU default of 5 streams. Are you getting better performance out of raising the If you check your Engine logs, on startup are you seeing a log like
@jkroll-deepgram It does use the GPU. I believe there is a difference in the API call used to get gpu_memory_size when MIG is enabled.
Yes, we have stress-tested up to 100 streams on a single engine with a 3/7 MIG partition of an H100. The other PR adds support for MIG partitions.
Proposed changes
Summary
While running Flux on a 3/7 MIG partition of our H200 GPUs, we observed that execution was limited to 5 streams per engine instance. To address this, I added an override to the default configuration to allow higher concurrency.
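To illustrate the intended behavior, here is a minimal Python sketch of the override precedence, not the actual Engine logic. The function name `resolve_max_streams` and its parameters are hypothetical; the only value taken from this thread is the fallback of 5 streams. The assumption is that an explicit override wins, a detected limit is used next, and the low default applies only when nothing was detected.

```python
from typing import Optional

# Fallback observed in this thread when no GPU is detected.
CPU_FALLBACK_STREAMS = 5


def resolve_max_streams(
    detected_limit: Optional[int],
    override: Optional[int] = None,
) -> int:
    """Pick the stream limit (hypothetical helper, for illustration only).

    detected_limit: limit derived from hardware detection, or None if
                    detection failed (as appears to happen under MIG).
    override:       operator-supplied value; takes precedence when set.
    """
    if override is not None:
        if override < 1:
            raise ValueError("override must be a positive integer")
        return override
    if detected_limit is None:
        # Detection failed, so fall back to the low default.
        return CPU_FALLBACK_STREAMS
    return detected_limit
```

With this ordering, a MIG deployment where detection fails can still be raised to a tested value, for example `resolve_max_streams(None, override=100)`, while deployments that do not set the override keep their current behavior.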
Notes
It’s currently unclear how Flux determines available memory. This limitation may be related to differences in the MIG-specific API or another underlying issue, and may require further investigation.
Types of changes
What types of changes does your code introduce to the Deepgram self-hosted resources?
Put an x in the boxes that apply
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
Further comments