AzureAIChatApp

Build a multi-language chat service using text or audio (OPUS/OGG, WAV, MP3, etc.) with Azure services.

Multi-Language LLM-based WhatsApp Chat Service

This sample demonstrates how to build a multi-language, LLM-based chat service (similar to WhatsApp) using text or audio in various formats such as OPUS/OGG, WAV, and MP3. It leverages Azure services such as real-time Speech to Text and Azure OpenAI with the GPT-4o model through Azure AI Foundry. Azure AI Foundry provides everything you need to customize, host, run, and manage AI-driven applications built in Visual Studio, with APIs for all your needs.

Note: Currently the candidate languages added for audio are Tamil, Kannada, Gujarati, Hindi, Bengali, and English. If you wish to add other languages, the code can be changed accordingly; refer to the Editing Candidate Languages section.

Local Audio File Support

This Chat App accepts audio files from the local file system in any format. If you need to fetch audio from a specific URL and stream it to the Speech SDK, please check out my other solution AzureAIChatApp-AudioStream.

Using Semantic Kernel

In this solution, we use Semantic Kernel with an Azure OpenAI model from C#. Semantic Kernel provides integration with C# and other .NET languages, offering libraries and packages that are idiomatic for C# developers.
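
The exact wiring lives in Program.cs in this repo; as a rough orientation, a minimal Semantic Kernel setup against an Azure OpenAI chat deployment looks like the sketch below. The deployment name, endpoint, and key shown here are placeholders, not the repo's actual values:

using Microsoft.SemanticKernel;

// Build a kernel backed by an Azure OpenAI chat deployment.
// Replace the placeholders with your own deployment name, endpoint, and key.
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o",                             // your model deployment name
        endpoint: "https://your-resource.openai.azure.com/",  // your Azure OpenAI endpoint
        apiKey: "your-azure-openai-key")
    .Build();

// Send a prompt and print the model's reply.
var reply = await kernel.InvokePromptAsync("Summarize this chat message in English: Vanakkam!");
Console.WriteLine(reply);

The endpoint and key can instead come from launchSettings.json environment variables, as described in Running the Application below.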

Handling Audio Streams

For handling audio streams in various formats, Microsoft's documentation recommends using GStreamer. The Speech SDK can use GStreamer to decode compressed audio. Note that, for licensing reasons, GStreamer binaries are not compiled into or linked with the Speech SDK; you need to install its dependencies and plug-ins yourself. Before running the code, make sure you Configure GStreamer.
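
For orientation, here is a minimal sketch of how compressed input is typically wired up with the Speech SDK once GStreamer is installed. The file name, key, and region below are assumptions, and the repo's own stream handling may differ:

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// GStreamer must be installed and on PATH for compressed formats such as OPUS/OGG or MP3.
var compressedFormat =
    AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.OGG_OPUS);

using var pushStream = AudioInputStream.CreatePushStream(compressedFormat);
using var audioConfig = AudioConfig.FromStreamInput(pushStream);

// Feed the compressed bytes into the push stream; the SDK decodes them via GStreamer.
byte[] fileBytes = File.ReadAllBytes("message.ogg");
pushStream.Write(fileBytes);
pushStream.Close();

var speechConfig = SpeechConfig.FromSubscription("your-speech-key", "your-region");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
var result = await recognizer.RecognizeOnceAsync();
Console.WriteLine(result.Text);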

Pre-requisites to Run the Application

Setting Up Database and Project

Setting Up Northwind Database

We will use a sample relational database; for this, we will set up SQL Server and the Northwind database. If you are on Windows, you can use a free version that runs standalone, known as SQL Server Developer Edition. You can also use the Express edition or the free SQL Server LocalDB edition that can be installed with Visual Studio.

To install SQL Server locally on Windows, please visit the install link, and for installation guidance follow the guidance link.

Once SQL Server is installed, create the Northwind sample database for SQL Server using the following steps:

  1. When you download the sample or clone the repo, you will find a Scripts folder containing the Northwind4SQLServer.sql file: [/Scripts/Northwind4SQLServer.sql]. Save this file locally to any directory, or copy its contents.
  2. Start SQL Server Management Studio.
  3. In the Connect to Server dialog, for Server name, enter . (a dot) meaning the local computer name, and then click Connect.
  4. Navigate to File | Open | File....
  5. Browse to select the Northwind4SQLServer.sql file and then click Open. OR Select New Query and Paste the copied Northwind4SQLServer.sql contents.
  6. In the toolbar, click Execute, and note the Command(s) completed successfully message.
  7. In Object Explorer, expand the Northwind database, and then expand Tables.
  8. Right-click Products, click Select Top 1000 Rows, and note the returned results.
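
Optionally, you can sanity-check the database from C# as well. This is only a sketch, assuming the free LocalDB instance and integrated security; adjust the connection string to your own SQL Server setup:

using Microsoft.Data.SqlClient;

// Connects to the local Northwind database and counts the rows in Products.
var connectionString =
    "Data Source=(localdb)\\MSSQLLocalDB;Initial Catalog=Northwind;" +
    "Integrated Security=true;TrustServerCertificate=true";

using var connection = new SqlConnection(connectionString);
connection.Open();

using var command = new SqlCommand("SELECT COUNT(*) FROM dbo.Products;", connection);
var productCount = (int)command.ExecuteScalar();
Console.WriteLine($"Northwind is ready: {productCount} products found.");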

Configure GStreamer

Rather than using various C# NuGet packages to handle multiple audio formats, we can configure GStreamer, which works with the Speech SDK.

Follow this Microsoft link to configure GStreamer.

Note: If GStreamer is not configured, the solution will not work.

Creating and Deploying an Azure OpenAI Service Resource

Prerequisites

  • An active Azure subscription.
  • Necessary permissions to create Azure OpenAI resources and deploy models.

Creating a Resource

  1. Sign in to the Azure portal.
  2. Create a new resource:
    • Select "Create a resource" and search for "Azure OpenAI".
    • Click "Create" and fill in the required fields:
      • Subscription: Choose your Azure subscription.
      • Resource Group: Select or create a new resource group.
      • Region: Choose the location for your resource.
      • Name: Provide a descriptive name for your resource.
      • Pricing Tier: Select the appropriate pricing tier.
  3. Configure Network Security:
    • Choose between allowing all networks, specific networks, or disabling network access.
  4. Add Tags (optional):
    • Add tags for better resource management.
  5. Review and Create:
    • Review your settings and click "Create".

Deploying a GPT-4 Model

  1. Sign in to the Azure AI Foundry portal.
  2. Select your subscription and Azure OpenAI resource.
  3. Deploy the model:
    • Choose "Use resource" to deploy your GPT-4 model.
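
Once the deployment exists, a quick way to confirm it responds is a small C# call against the deployment name. This is only a sketch using the Azure.AI.OpenAI client library (the repo itself uses Semantic Kernel); the resource URL, key, and deployment name below are placeholders:

using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;

// Point the client at your Azure OpenAI resource and the deployment you just created.
var azureClient = new AzureOpenAIClient(
    new Uri("https://your-resource.openai.azure.com/"),
    new AzureKeyCredential("your-api-key"));

ChatClient chatClient = azureClient.GetChatClient("your-gpt-4o-deployment");

// A trivial round trip to verify the deployment answers.
ChatCompletion completion = chatClient.CompleteChat(
    new UserChatMessage("Reply with the single word: ready"));

Console.WriteLine(completion.Content[0].Text);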

Summary

This guide helps you create and deploy an Azure OpenAI Service resource, ensuring you can effectively manage and utilize GPT-4 models for your AI applications.

For more detailed instructions, visit the official documentation.

Editing Candidate Languages

Currently the candidate languages added for Audio are Tamil, Kannada, Gujarati, Hindi, Bengali and English.

var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(
        new string[] { "hi-IN", "bn-IN", "gu-IN", "kn-IN", "ta-IN" });

If you wish to add any other language, add its supported locale code. For example, if fr-FR (French) recognition is required, add it as shown below:

var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(
        new string[] { "hi-IN", "bn-IN", "gu-IN", "kn-IN", "ta-IN", "fr-FR" });

Note: You can include up to 10 languages for Continuous Language Identification.

Also, the Speech service returns one of the candidate languages provided even if those languages weren't in the audio. For example, if fr-FR (French) and en-US (English) are provided as candidates, but German is spoken, the service returns either fr-FR or en-US.

For more information, see Supported Languages.
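
To see which candidate language the service actually picked, the recognition result can be inspected. Below is a minimal single-shot sketch; the file name, key, and region are placeholders, and note that at-start language identification with RecognizeOnceAsync supports up to four candidates, while continuous LID allows the ten mentioned above:

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

var speechConfig = SpeechConfig.FromSubscription("your-speech-key", "your-region");

// Up to four candidates for at-start LID (single-shot); up to ten for continuous LID.
var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(
        new string[] { "hi-IN", "bn-IN", "ta-IN", "en-IN" });

using var audioConfig = AudioConfig.FromWavFileInput("sample.wav");
using var recognizer = new SpeechRecognizer(
    speechConfig, autoDetectSourceLanguageConfig, audioConfig);

var result = await recognizer.RecognizeOnceAsync();
var detected = AutoDetectSourceLanguageResult.FromResult(result);
Console.WriteLine($"Detected {detected.Language}: {result.Text}");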

Running the Application

To download and run the sample, follow these steps:

  1. Download and unzip the sample / Clone the Repo.
  2. In Visual Studio (2022):
    1. On the menu bar, choose File > Open > Project/Solution.

    2. Navigate to the folder that holds the unzipped sample code or cloned repo, and open the solution (.sln) file.

    3. Enter the Azure OpenAI Endpoint and Key in the launchSettings.json file under the "Properties" folder. Alternatively, in Visual Studio, right-click the AzureAIChatApp project, click Properties, and in the Debug section click Open debug launch profiles UI and enter the environment variables.

    4. Make the following changes in code.

      1. In the Program.cs file, Line No. 66 (assuming no new code changes are done), add the Azure Speech Service Subscription Key and Region.
      2. In the NorthwindContext.cs file under the Northwind.DataContext project, go to the OnConfiguring method and set an appropriate builder.DataSource value based on your SQL Server setup (a minimal sketch of this method appears at the end of this section).
      builder.DataSource = "(localdb)\\MSSQLLocalDB"
      3. In the Program.cs file, Line No. 62 (assuming no new code changes are done), set audioFilePath to the folder where the audio files are saved.
      audioFilePath = "your-local-audio-filepath";

      Note: If there are no audio files, you can create them using the Sound Recorder app on a Windows PC, which saves WAV files by default to the Documents folder. Then you can make the following code change.

      audioFilePath = "C:\\Users\\Documents\\Sound Recordings";
    5. Press the F5 key to run with debugging, or Ctrl+F5 to run the project without debugging. Alternatively, in Visual Studio, press the green Play button.

  3. From the command line:
    1. Navigate to the folder that holds the unzipped sample code.
    2. At the command line, type dotnet run.
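
For reference, the OnConfiguring change mentioned in the code-change steps above might look roughly like the sketch below. The exact class in the repo may differ, and the DataSource shown assumes the free LocalDB instance:

using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;

public partial class NorthwindContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        var builder = new SqlConnectionStringBuilder
        {
            // Point DataSource at your SQL Server instance:
            //   "(localdb)\\MSSQLLocalDB" for LocalDB, or "." for a local default instance.
            DataSource = "(localdb)\\MSSQLLocalDB",
            InitialCatalog = "Northwind",
            IntegratedSecurity = true,
            TrustServerCertificate = true
        };

        optionsBuilder.UseSqlServer(builder.ConnectionString);
    }
}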

Troubleshooting

  1. If you get the System.ApplicationException: 'Exception with an error code: 0x29 (SPXERR_GSTREAMER_NOT_FOUND_ERROR)' error when running the code, it means GStreamer is not configured on the machine. Check out the Configure GStreamer section.

  2. If you get System.UriFormatException Message=Invalid URI: The format of the URI could not be determined. when running the code, it means the Azure OpenAI Key or Endpoint was not configured. Please refer to the Running the Application section under point No. 3.

  3. If you get the ErrorDetails=Connection failed (no connection to the remote host). Internal error: 1. Error details: Failed with error: WS_OPEN_ERROR_UNDERLYING_IO_OPEN_FAILED error while debugging audio samples, it means Azure Speech Services was not configured. If it is configured, make sure the Subscription Key and Region have been added in code. Please refer to the Running the Application section under point No. 4, sub-point 1.
