Description
Background:
We have successfully deployed a pipeline composed of "Web Service Input", "Enter Data Manually", "Execute Python Script", and "Web Service Output" modules, which interacts with GPT-3.5 Turbo hosted on Azure OpenAI. The script inside the Python module takes prompts from a script bundle and inputs from both the "Enter Data Manually" and "Web Service Input" ports (screenshot attached below).
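For context, a minimal sketch of what such an "Execute Python Script" entry point typically looks like. The Azure ML designer module calls a function named `azureml_main` with up to two DataFrames, one per input port. Everything else here (the column positions, the placeholder system prompt, the output column name) is an assumption for illustration, not the actual script; the real prompt would come from the attached script bundle, and the Azure OpenAI call itself is only indicated in a comment.

```python
import pandas as pd


def azureml_main(dataframe1: pd.DataFrame = None, dataframe2: pd.DataFrame = None):
    # dataframe1: rows arriving on the "Web Service Input" port (assumed layout)
    # dataframe2: rows from the "Enter Data Manually" port (assumed layout)

    # In the real pipeline the system prompt is loaded from the script bundle;
    # a literal stands in for it here.
    system_prompt = "You are a helpful assistant."  # placeholder, not the bundled prompt

    user_text = str(dataframe1.iloc[0, 0])     # assumed: first column holds the user query
    context_text = str(dataframe2.iloc[0, 0])  # assumed: first column holds manual context

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{context_text}\n\n{user_text}"},
    ]

    # The call to the GPT-3.5 Turbo deployment on Azure OpenAI would go here
    # (e.g. via the `openai` SDK configured with the Azure endpoint and key).
    # That network round-trip, not this assembly code, dominates the latency.

    # The designer expects a DataFrame back on the output port.
    return pd.DataFrame({"messages": [str(messages)]})
```

The sketch only assembles the chat payload, so the surrounding pipeline overhead can be reasoned about separately from the model call.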
Issue:
The primary concern is the significant delay in response time from the deployed endpoint. Despite efforts to optimize the deployment, including scaling the AksCompute from Standard_A2_v2 up to Standard_A4_v2 and Standard_A8_v2, the response time remains consistently high, at a minimum of roughly 8-9 seconds (Postman screenshot attached).
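Since scaling the AKS node size did not move the number, it is worth confirming how much of the 8-9 seconds is the Azure OpenAI call versus the scoring pipeline itself. A small timing helper inside the script would separate the two; the helper below is a hypothetical sketch (not part of the deployed script), and `client.chat.completions.create` in the comment is only an example of what one might wrap.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) so per-stage latency can be logged."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Inside azureml_main one could wrap the model call, e.g.:
#   response, llm_seconds = timed(client.chat.completions.create, ...)
#   print(f"LLM call took {llm_seconds:.2f}s")
# and compare llm_seconds against the total request time seen in Postman.

# Illustrative use with a local function instead of a network call:
result, elapsed = timed(sum, range(1000))
```

If the wrapped model call accounts for nearly all of the 8-9 seconds, larger AksCompute SKUs will not help and the latency has to be addressed on the Azure OpenAI side (prompt size, `max_tokens`, deployment capacity) rather than in the AKS cluster.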
Steps Taken:

