
LLM Tools

A comprehensive toolkit for Large Language Models (LLMs) and embedding models, supporting chat, embeddings, and reranking with flexible configuration options and multi-engine support. Whether calls are synchronous or asynchronous, LLM Tools handles the details of AI model integration efficiently.

English | 中文

Features

  • LLM Chat: Interact with large language models (e.g., GPT) with support for streaming output and formatted responses
  • Embedding Models: Support for single and multi-sentence embedding generation for semantic analysis and retrieval
  • Reranking Models: Rank documents by query similarity with support for single and multi-sentence inputs
  • Highly Configurable: Flexibly adjust parameters through YAML configuration files
  • Multi-Engine Support: Support for Azure OpenAI, local models, and various embedding engines
  • Async Support: Provides async interfaces for enhanced performance
  • Memory Management: Built-in chat memory management with customizable history length
  • Response Caching: Optional LLM response caching for improved efficiency

Supported Models

  • LLM Engines: Compatible with OpenAI SDK format
  • Embedding Models: m3e-base, bge-m3, and other embedding models
  • Reranking Models: bge-reranker-large and other reranking models

Installation

Using pip

# Clone the repository
git clone https://github.com/LLMSystems/llm_tools.git
cd llm_tools

# Install the package
pip install -e .

# Or install with development dependencies
pip install -e ".[dev]"

Configuration

1. Model Configuration

Create a configs/models.yaml file based on example_configs/models.yaml and configure the model parameters:

params:
    default:
        temperature: 0.2
        max_tokens: 1000
        top_p: 1
        frequency_penalty: 1.4
        presence_penalty: 0

LLM_engines:
    gpt-4o:
        model: "gpt-4o"
        azure_api_base: "your_azure_api_base_url"
        azure_api_key: "your_azure_api_key"
        azure_api_version: "your_azure_api_version"
    Qwen2-7B-Instruct:
        model: "Qwen2-7B-Instruct"
        local_api_key: "Empty"
        local_base_url: "http://localhost:8887/v1"
        translate_to_cht: true  # Optional: Translate to Traditional Chinese

embedding_models:
    m3e-base:
        model: "m3e-base"
        local_api_key: "Empty"
        local_base_url: "http://localhost:8887/v1"

reranking_models:
    bge-reranker-large:
        model: "bge-reranker-large"
        local_api_key: "Empty"
        local_base_url: "http://localhost:8887/v1"

2. Configuration Parameters

  • default: Default generation parameters, including temperature, max_tokens, top_p, frequency_penalty, and presence_penalty
  • Azure OpenAI: Configure azure_api_base, azure_api_key, and azure_api_version (note: usage may incur costs)
  • Local Models: Configure local_api_key and local_base_url
  • translate_to_cht: When set to true, results are automatically translated to Traditional Chinese
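
Conceptually, the default block supplies baseline generation parameters that each engine entry can override. A minimal sketch of that merge logic (an illustration of the configuration layout only, not the library's actual loader; load_model_params is a hypothetical helper):

```python
# Sketch: merge the `default` parameter block with per-engine overrides.
# Illustrative only -- the real loader in llm_tools may differ.

DEFAULT_PARAMS = {
    "temperature": 0.2,
    "max_tokens": 1000,
    "top_p": 1,
    "frequency_penalty": 1.4,
    "presence_penalty": 0,
}

def load_model_params(engine_config: dict, defaults: dict = DEFAULT_PARAMS) -> dict:
    """Return generation parameters: defaults overridden by engine-specific values."""
    params = dict(defaults)        # start from the `default` block
    params.update(engine_config)   # engine entry wins on conflicts
    return params

# Example: an engine entry that only overrides temperature
merged = load_model_params({"temperature": 0.7})
print(merged["temperature"])  # 0.7 (overridden)
print(merged["max_tokens"])   # 1000 (inherited from defaults)
```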

Quick Start

Basic Chat Usage

from llm_chat import LLMChat

# Initialize LLM chat
llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")

# Simple chat
response, history = llmchat.chat(query="Hello, how are you?")
print(response)

# Interactive chat with history
history = []
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    response, history = llmchat.chat(query=user_input, history=history)
    print(f"AI: {response}")

Streaming Chat

from llm_chat import LLMChat

llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")

# Streaming response
for chunk in llmchat.chat(query="Tell me a story", stream=True):
    print(chunk, end="", flush=True)
print()

Chat Memory Management

from llm_chat import LLMChat
from memory import ChatMemory

# Initialize chat memory
system_prompt = "You are a professional assistant who answers questions in Traditional Chinese."
chat_memory = ChatMemory(system_prompt=system_prompt, max_len=1000)

llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")

# Streaming with memory
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    
    text = ''
    for chunk in llmchat.chat(query=user_input, history=chat_memory.get_history(), stream=True):
        text += chunk
        print(chunk, end="", flush=True)
    print()
    
    chat_memory.add_user_message(user_input)
    chat_memory.add_system_response(text)
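
The max_len cap above suggests that ChatMemory drops the oldest turns once the history grows past the limit, while keeping the system prompt. A minimal sketch of that idea (illustrative only; the real ChatMemory in memory.py may measure tokens rather than messages and expose a different interface):

```python
# Sketch of bounded chat memory: keep the system prompt, drop the oldest
# user/assistant messages once max_len is exceeded. Not memory.ChatMemory itself.

class BoundedChatMemory:
    def __init__(self, system_prompt: str, max_len: int = 10):
        self.system_prompt = system_prompt
        self.max_len = max_len
        self.turns = []  # list of {"role": ..., "content": ...}

    def add_user_message(self, content: str):
        self._append({"role": "user", "content": content})

    def add_system_response(self, content: str):
        self._append({"role": "assistant", "content": content})

    def _append(self, message: dict):
        self.turns.append(message)
        # Trim to the most recent max_len messages; the system prompt is never dropped.
        if len(self.turns) > self.max_len:
            self.turns = self.turns[-self.max_len:]

    def get_history(self):
        return [{"role": "system", "content": self.system_prompt}] + self.turns

memory = BoundedChatMemory("You are a helpful assistant.", max_len=4)
for i in range(6):
    memory.add_user_message(f"question {i}")
    memory.add_system_response(f"answer {i}")

history = memory.get_history()
print(len(history))  # 5: system prompt + the 4 most recent messages
```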

Embeddings and Reranking

import numpy as np
from embed_rerank_model import EmbeddingModel, RerankingModel

# Embedding generation
embed_model = EmbeddingModel(embedding_model="m3e-base", config_path="./configs/models.yaml")
query_embedding = np.array(embed_model.embed_query("The food is delicious."))
print(f"Embedding shape: {query_embedding.shape}")

# Document embedding
documents = ["The food is great.", "The service is excellent.", "The atmosphere is nice."]
doc_embeddings = embed_model.embed_documents(documents)
print(f"Document embeddings: {len(doc_embeddings)} vectors")

# Document reranking
rerank_model = RerankingModel(reranking_model="bge-reranker-large", config_path="./configs/models.yaml")
query = "Tell me about the food quality"
ranked_docs = rerank_model.rerank_documents(documents, query)
print(f"Reranked documents: {ranked_docs}")
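
Once you have embeddings, retrieval typically ranks documents by cosine similarity to the query vector. A self-contained sketch using plain NumPy (independent of the EmbeddingModel class above; the toy 3-dimensional vectors stand in for real model output):

```python
import numpy as np

def cosine_similarity(query_vec: np.ndarray, doc_matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of doc_matrix."""
    query_norm = query_vec / np.linalg.norm(query_vec)
    doc_norms = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return doc_norms @ query_norm

# Toy embeddings standing in for embed_query / embed_documents output.
query_embedding = np.array([1.0, 0.0, 0.0])
doc_embeddings = np.array([
    [0.9, 0.1, 0.0],   # close to the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.7, 0.7, 0.0],   # in between
])

scores = cosine_similarity(query_embedding, doc_embeddings)
ranking = np.argsort(scores)[::-1]  # indices of best matches first
print(ranking)  # [0 2 1]
```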

Async Usage

import asyncio
from async_llm_chat import AsyncLLMChat

async def async_chat_example():
    # Initialize async LLM chat
    async_llm = AsyncLLMChat(model="gpt-4o", config_path="./configs/models.yaml")
    
    # Concurrent requests
    async def query_a():
        response, _ = await async_llm.chat(query="What is artificial intelligence?")
        return response
    
    async def query_b():
        response, _ = await async_llm.chat(query="What is machine learning?")
        return response
    
    # Execute concurrently
    responses = await asyncio.gather(query_a(), query_b())
    for i, response in enumerate(responses):
        print(f"Response {i+1}: {response}")

# Run async example
asyncio.run(async_chat_example())

Additional Features

Response Caching

from async_llm_chat import AsyncLLMChat

# Enable caching
cache_config = {
    'enable': True,
    'cache_file': './cache/llm_cache.json'
}

async_llm = AsyncLLMChat(
    model="gpt-4o", 
    config_path="./configs/models.yaml",
    cache_config=cache_config
)
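
The cache presumably maps each request to a stored response in the JSON file, so repeated identical queries can skip the API call. A minimal sketch of that pattern (illustrative only; the real llm_response_cache.py may key and store entries differently):

```python
import hashlib
import json
import os
import tempfile

class JsonResponseCache:
    """Tiny query -> response cache persisted as JSON, sketching the idea
    behind response caching. Keys are SHA-256 hashes of model + query."""

    def __init__(self, cache_file: str):
        self.cache_file = cache_file
        if os.path.exists(cache_file):
            with open(cache_file, "r", encoding="utf-8") as f:
                self.entries = json.load(f)
        else:
            self.entries = {}

    def _key(self, model: str, query: str) -> str:
        return hashlib.sha256(f"{model}:{query}".encode("utf-8")).hexdigest()

    def get(self, model: str, query: str):
        return self.entries.get(self._key(model, query))  # None on cache miss

    def put(self, model: str, query: str, response: str):
        self.entries[self._key(model, query)] = response
        with open(self.cache_file, "w", encoding="utf-8") as f:
            json.dump(self.entries, f, ensure_ascii=False)

cache_path = os.path.join(tempfile.gettempdir(), "llm_cache_demo.json")
cache = JsonResponseCache(cache_path)
cache.put("gpt-4o", "Hello", "Hi there!")
print(cache.get("gpt-4o", "Hello"))  # Hi there!
```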

Project Structure

llm_tools/
├── llm_chat.py              # Synchronous LLM chat functionality
├── async_llm_chat.py        # Asynchronous LLM chat functionality
├── embed_rerank_model.py    # Embedding and reranking models
├── memory.py                # Chat memory management
├── llm_response_cache.py    # Response caching functionality
├── tutorial.py              # Tutorial examples
├── tutorial.ipynb           # Jupyter notebook tutorial
├── example_configs/         # Configuration examples
│   └── models.yaml          # Model configuration template
├── pyproject.toml           # Project configuration
├── README_zh-CN.md          # Chinese README
└── README.md                # This file (English)

Examples and Tutorials

For detailed usage examples, please refer to:

  • tutorial.py - Python script examples
  • tutorial.ipynb - Jupyter notebook with interactive examples

License

This project is licensed under the MIT License.