Description
Background:
We have successfully deployed a pipeline composed of "Web Service Input", "Enter Data Manually", "Execute Python Script", and "Web Service Output" modules, which interacts with GPT-3.5 Turbo hosted on Azure OpenAI. The script inside the Python module takes prompts from a script bundle and inputs from both the "Enter Data Manually" and "Web Service Input" ports (screenshot attached below).
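For context, a minimal sketch of what such an "Execute Python Script" entry point typically looks like. The Azure ML designer module calls a function named `azureml_main` with up to two DataFrames, one per input port. Everything else here (the column positions, the placeholder system prompt, the output column name) is an assumption for illustration, not the actual script; the real prompt would come from the attached script bundle, and the Azure OpenAI call itself is only indicated in a comment.

```python
import pandas as pd


def azureml_main(dataframe1: pd.DataFrame = None, dataframe2: pd.DataFrame = None):
    # dataframe1: rows arriving on the "Web Service Input" port (assumed layout)
    # dataframe2: rows from the "Enter Data Manually" port (assumed layout)

    # In the real pipeline the system prompt is loaded from the script bundle;
    # a literal stands in for it here.
    system_prompt = "You are a helpful assistant."  # placeholder, not the bundled prompt

    user_text = str(dataframe1.iloc[0, 0])     # assumed: first column holds the user query
    context_text = str(dataframe2.iloc[0, 0])  # assumed: first column holds manual context

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{context_text}\n\n{user_text}"},
    ]

    # The call to the GPT-3.5 Turbo deployment on Azure OpenAI would go here
    # (e.g. via the `openai` SDK configured with the Azure endpoint and key).
    # That network round-trip, not this assembly code, dominates the latency.

    # The designer expects a DataFrame back on the output port.
    return pd.DataFrame({"messages": [str(messages)]})
```

The sketch only assembles the chat payload, so the surrounding pipeline overhead can be reasoned about separately from the model call.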
Issue:
The primary concern is the significant delay in response time from the deployed endpoint. Despite efforts to optimize the deployment, including scaling the AksCompute from Standard_A2_v2 up to Standard_A4_v2 and Standard_A8_v2, the response time remains consistently high, at a minimum of roughly 8-9 seconds (Postman screenshot attached).
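Since scaling the AKS node size did not move the number, it is worth confirming how much of the 8-9 seconds is the Azure OpenAI call versus the scoring pipeline itself. A small timing helper inside the script would separate the two; the helper below is a hypothetical sketch (not part of the deployed script), and `client.chat.completions.create` in the comment is only an example of what one might wrap.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) so per-stage latency can be logged."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Inside azureml_main one could wrap the model call, e.g.:
#   response, llm_seconds = timed(client.chat.completions.create, ...)
#   print(f"LLM call took {llm_seconds:.2f}s")
# and compare llm_seconds against the total request time seen in Postman.

# Illustrative use with a local function instead of a network call:
result, elapsed = timed(sum, range(1000))
```

If the wrapped model call accounts for nearly all of the 8-9 seconds, larger AksCompute SKUs will not help and the latency has to be addressed on the Azure OpenAI side (prompt size, `max_tokens`, deployment capacity) rather than in the AKS cluster.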
Steps Taken:

