
LLM Tools

A comprehensive toolkit for Large Language Models (LLMs) and embedding models, supporting chat, embeddings, and reranking with flexible configuration options and multi-engine support. Whether calls are synchronous or asynchronous, LLM Tools handles the details of AI model integration efficiently.

English | 中文

Features

  • LLM Chat: Interact with large language models (e.g., GPT) with support for streaming output and formatted responses
  • Embedding Models: Support for single and multi-sentence embedding generation for semantic analysis and retrieval
  • Reranking Models: Rank documents by query similarity with support for single and multi-sentence inputs
  • Highly Configurable: Flexibly adjust parameters through YAML configuration files
  • Multi-Engine Support: Support for Azure OpenAI, local models, and various embedding engines
  • Async Support: Provides async interfaces for enhanced performance
  • Memory Management: Built-in chat memory management with customizable history length
  • Response Caching: Optional LLM response caching for improved efficiency

Supported Models

  • LLM Engines: Compatible with OpenAI SDK format
  • Embedding Models: m3e-base, bge-m3, and other embedding models
  • Reranking Models: bge-reranker-large and other reranking models

Installation

Using pip

# Clone the repository
git clone https://github.com/LLMSystems/llm_tools.git
cd llm_tools

# Install the package
pip install -e .

# Or install with development dependencies
pip install -e ".[dev]"

Configuration

1. Model Configuration

Create a configs/models.yaml file based on example_configs/models.yaml and configure the model parameters:

params:
    default:
        temperature: 0.2
        max_tokens: 1000
        top_p: 1
        frequency_penalty: 1.4
        presence_penalty: 0

LLM_engines:
    gpt-4o:
        model: "gpt-4o"
        azure_api_base: "your_azure_api_base_url"
        azure_api_key: "your_azure_api_key"
        azure_api_version: "your_azure_api_version"
    Qwen2-7B-Instruct:
        model: "Qwen2-7B-Instruct"
        local_api_key: "Empty"
        local_base_url: "http://localhost:8887/v1"
        translate_to_cht: true  # Optional: Translate to Traditional Chinese

embedding_models:
    m3e-base:
        model: "m3e-base"
        local_api_key: "Empty"
        local_base_url: "http://localhost:8887/v1"

reranking_models:
    bge-reranker-large:
        model: "bge-reranker-large"
        local_api_key: "Empty"
        local_base_url: "http://localhost:8887/v1"

2. Configuration Parameters

  • default: Default generation parameters, including temperature, max_tokens, top_p, frequency_penalty, and presence_penalty
  • Azure OpenAI: Configure azure_api_base, azure_api_key, and azure_api_version (note: usage may incur costs)
  • Local Models: Configure local_api_key and local_base_url
  • translate_to_cht: When set to true, results are automatically translated to Traditional Chinese
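
Conceptually, the default block supplies baseline generation parameters that each engine entry can override. A minimal sketch of that merge logic (an illustration of the configuration layout only, not the library's actual loader; load_model_params is a hypothetical helper):

```python
# Sketch: merge the `default` parameter block with per-engine overrides.
# Illustrative only -- the real loader in llm_tools may differ.

DEFAULT_PARAMS = {
    "temperature": 0.2,
    "max_tokens": 1000,
    "top_p": 1,
    "frequency_penalty": 1.4,
    "presence_penalty": 0,
}

def load_model_params(engine_config: dict, defaults: dict = DEFAULT_PARAMS) -> dict:
    """Return generation parameters: defaults overridden by engine-specific values."""
    params = dict(defaults)        # start from the `default` block
    params.update(engine_config)   # engine entry wins on conflicts
    return params

# Example: an engine entry that only overrides temperature
merged = load_model_params({"temperature": 0.7})
print(merged["temperature"])  # 0.7 (overridden)
print(merged["max_tokens"])   # 1000 (inherited from defaults)
```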

Quick Start

Basic Chat Usage

from llm_chat import LLMChat

# Initialize LLM chat
llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")

# Simple chat
response, history = llmchat.chat(query="Hello, how are you?")
print(response)

# Interactive chat with history
history = []
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    response, history = llmchat.chat(query=user_input, history=history)
    print(f"AI: {response}")

Streaming Chat

from llm_chat import LLMChat

llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")

# Streaming response
for chunk in llmchat.chat(query="Tell me a story", stream=True):
    print(chunk, end="", flush=True)
print()

Chat Memory Management

from llm_chat import LLMChat
from memory import ChatMemory

# Initialize chat memory
system_prompt = "You are a professional assistant who answers questions in Traditional Chinese."
chat_memory = ChatMemory(system_prompt=system_prompt, max_len=1000)

llmchat = LLMChat(model="gpt-4o", config_path="./configs/models.yaml")

# Streaming with memory
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    
    text = ''
    for chunk in llmchat.chat(query=user_input, history=chat_memory.get_history(), stream=True):
        text += chunk
        print(chunk, end="", flush=True)
    print()
    
    chat_memory.add_user_message(user_input)
    chat_memory.add_system_response(text)
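
The max_len cap above suggests that ChatMemory drops the oldest turns once the history grows past the limit, while keeping the system prompt. A minimal sketch of that idea (illustrative only; the real ChatMemory in memory.py may measure tokens rather than messages and expose a different interface):

```python
# Sketch of bounded chat memory: keep the system prompt, drop the oldest
# user/assistant messages once max_len is exceeded. Not memory.ChatMemory itself.

class BoundedChatMemory:
    def __init__(self, system_prompt: str, max_len: int = 10):
        self.system_prompt = system_prompt
        self.max_len = max_len
        self.turns = []  # list of {"role": ..., "content": ...}

    def add_user_message(self, content: str):
        self._append({"role": "user", "content": content})

    def add_system_response(self, content: str):
        self._append({"role": "assistant", "content": content})

    def _append(self, message: dict):
        self.turns.append(message)
        # Trim to the most recent max_len messages; the system prompt is never dropped.
        if len(self.turns) > self.max_len:
            self.turns = self.turns[-self.max_len:]

    def get_history(self):
        return [{"role": "system", "content": self.system_prompt}] + self.turns

memory = BoundedChatMemory("You are a helpful assistant.", max_len=4)
for i in range(6):
    memory.add_user_message(f"question {i}")
    memory.add_system_response(f"answer {i}")

history = memory.get_history()
print(len(history))  # 5: system prompt + the 4 most recent messages
```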

Embeddings and Reranking

import numpy as np
from embed_rerank_model import EmbeddingModel, RerankingModel

# Embedding generation
embed_model = EmbeddingModel(embedding_model="m3e-base", config_path="./configs/models.yaml")
query_embedding = np.array(embed_model.embed_query("The food is delicious."))
print(f"Embedding shape: {query_embedding.shape}")

# Document embedding
documents = ["The food is great.", "The service is excellent.", "The atmosphere is nice."]
doc_embeddings = embed_model.embed_documents(documents)
print(f"Document embeddings: {len(doc_embeddings)} vectors")

# Document reranking
rerank_model = RerankingModel(reranking_model="bge-reranker-large", config_path="./configs/models.yaml")
query = "Tell me about the food quality"
ranked_docs = rerank_model.rerank_documents(documents, query)
print(f"Reranked documents: {ranked_docs}")
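
Once you have embeddings, retrieval typically ranks documents by cosine similarity to the query vector. A self-contained sketch using plain NumPy (independent of the EmbeddingModel class above; the toy 3-dimensional vectors stand in for real model output):

```python
import numpy as np

def cosine_similarity(query_vec: np.ndarray, doc_matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of doc_matrix."""
    query_norm = query_vec / np.linalg.norm(query_vec)
    doc_norms = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return doc_norms @ query_norm

# Toy embeddings standing in for embed_query / embed_documents output.
query_embedding = np.array([1.0, 0.0, 0.0])
doc_embeddings = np.array([
    [0.9, 0.1, 0.0],   # close to the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.7, 0.7, 0.0],   # in between
])

scores = cosine_similarity(query_embedding, doc_embeddings)
ranking = np.argsort(scores)[::-1]  # indices of best matches first
print(ranking)  # [0 2 1]
```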

Async Usage

import asyncio
from async_llm_chat import AsyncLLMChat

async def async_chat_example():
    # Initialize async LLM chat
    async_llm = AsyncLLMChat(model="gpt-4o", config_path="./configs/models.yaml")
    
    # Concurrent requests
    async def query_a():
        response, _ = await async_llm.chat(query="What is artificial intelligence?")
        return response
    
    async def query_b():
        response, _ = await async_llm.chat(query="What is machine learning?")
        return response
    
    # Execute concurrently
    responses = await asyncio.gather(query_a(), query_b())
    for i, response in enumerate(responses):
        print(f"Response {i+1}: {response}")

# Run async example
asyncio.run(async_chat_example())

Additional Features

Response Caching

from async_llm_chat import AsyncLLMChat

# Enable caching
cache_config = {
    'enable': True,
    'cache_file': './cache/llm_cache.json'
}

async_llm = AsyncLLMChat(
    model="gpt-4o", 
    config_path="./configs/models.yaml",
    cache_config=cache_config
)
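
The cache presumably maps each request to a stored response in the JSON file, so repeated identical queries can skip the API call. A minimal sketch of that pattern (illustrative only; the real llm_response_cache.py may key and store entries differently):

```python
import hashlib
import json
import os
import tempfile

class JsonResponseCache:
    """Tiny query -> response cache persisted as JSON, sketching the idea
    behind response caching. Keys are SHA-256 hashes of model + query."""

    def __init__(self, cache_file: str):
        self.cache_file = cache_file
        if os.path.exists(cache_file):
            with open(cache_file, "r", encoding="utf-8") as f:
                self.entries = json.load(f)
        else:
            self.entries = {}

    def _key(self, model: str, query: str) -> str:
        return hashlib.sha256(f"{model}:{query}".encode("utf-8")).hexdigest()

    def get(self, model: str, query: str):
        return self.entries.get(self._key(model, query))  # None on cache miss

    def put(self, model: str, query: str, response: str):
        self.entries[self._key(model, query)] = response
        with open(self.cache_file, "w", encoding="utf-8") as f:
            json.dump(self.entries, f, ensure_ascii=False)

cache_path = os.path.join(tempfile.gettempdir(), "llm_cache_demo.json")
cache = JsonResponseCache(cache_path)
cache.put("gpt-4o", "Hello", "Hi there!")
print(cache.get("gpt-4o", "Hello"))  # Hi there!
```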

Project Structure

llm_tools/
├── llm_chat.py              # Synchronous LLM chat functionality
├── async_llm_chat.py        # Asynchronous LLM chat functionality
├── embed_rerank_model.py    # Embedding and reranking models
├── memory.py                # Chat memory management
├── llm_response_cache.py    # Response caching functionality
├── tutorial.py              # Tutorial examples
├── tutorial.ipynb           # Jupyter notebook tutorial
├── example_configs/         # Configuration examples
│   └── models.yaml          # Model configuration template
├── pyproject.toml           # Project configuration
├── README_zh-CN.md          # Chinese README
└── README.md                # This file (English)

Examples and Tutorials

For detailed usage examples, please refer to:

  • tutorial.py - Python script examples
  • tutorial.ipynb - Jupyter notebook with interactive examples

License

This project is licensed under the MIT License.