
Performance Issues with Azure ML Endpoint #12

@ChiragM-Hexaware

Description


Background:
We have successfully deployed a pipeline that incorporates the "Web Service Input", "Enter Data Manually", "Execute Python Script", and "Web Service Output" modules and interacts with GPT-3.5 Turbo hosted on Azure OpenAI. The script inside the Python module takes prompts from a script bundle and inputs from both the "Enter Data Manually" and "Web Service Input" ports (screenshot attached below).
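For context, Designer's Execute Python Script component invokes an `azureml_main` entry point whose two DataFrame arguments correspond to the module's two input ports. The sketch below shows that wiring under stated assumptions: the column names (`user_input`, `system_prompt`) and the placeholder reply are hypothetical, and the actual Azure OpenAI call is elided.

```python
import pandas as pd

# Azure ML Designer calls this entry point; dataframe1 and dataframe2 map to
# the module's two input ports (here: Web Service Input and Enter Data Manually).
def azureml_main(dataframe1: pd.DataFrame = None, dataframe2: pd.DataFrame = None):
    # Hypothetical column names for illustration only.
    user_input = dataframe1["user_input"].iloc[0]      # from Web Service Input
    system_prompt = dataframe2["system_prompt"].iloc[0]  # from Enter Data Manually
    # ... the real script would call Azure OpenAI here ...
    reply = f"{system_prompt}: {user_input}"  # placeholder for the model response
    # The returned tuple of DataFrames feeds the Web Service Output port.
    return (pd.DataFrame({"response": [reply]}),)
```

The module must return its results as a tuple of DataFrames so that downstream modules (here, Web Service Output) can consume them.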

Issue:
The primary concern is the significant delay in response time from the deployed endpoint. Despite efforts to optimize the deployment, including scaling the AksCompute cluster from Standard_A2_v2 up to Standard_A4_v2 and Standard_A8_v2, the response time remains consistently high at roughly 8-9 seconds minimum (Postman screenshot attached).

Steps Taken:

  1. Increased the AksCompute specifications (Standard_A4_v2 and Standard_A8_v2).
  2. Tested the deployment in the same region as the Azure OpenAI subscription.
  3. Checked Application Insights.

Attachments: Azure ML Designer screenshot; response-time (Postman) screenshot.
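Beyond the steps above, one way to narrow down where the 8-9 seconds go is to time each stage inside the scoring script, so the model call can be separated from pipeline and scoring overhead in the Application Insights traces. A minimal sketch (the `timed` helper and label names are hypothetical, not part of any Azure SDK):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result.
    The printed line surfaces in the endpoint's logs, so each stage
    (e.g. the Azure OpenAI request vs. DataFrame handling) can be
    attributed separately when diagnosing latency."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result
```

For example, wrapping the chat-completion request as `timed("openai_call", client.chat.completions.create, ...)` (client setup assumed) would show whether the delay sits in model inference itself, in which case scaling AksCompute cannot help, or in the surrounding pipeline.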
