
# Ai.Tlbx.Inference

NuGet License: MIT

Trim-friendly .NET AI inference client for OpenAI, Anthropic, Google, and xAI, built around a single inference facade.

## Features

- **Multi-provider** — OpenAI, Anthropic, Google (AI Studio + Vertex), and xAI behind a single interface
- **Streaming** — `IAsyncEnumerable<string>` across all supported providers, with integration coverage
- **Document attachments** — provider-aware document attachment support with live coverage for supported models
- **Structured output** — `CompleteAsync(..., JsonTypeInfo<T>)` for trim- and AOT-safe typed results
- **Model registry** — first-class `AiModelCatalog` descriptors with endpoint/capability metadata
- **Tool calling** — unified tool loop with streaming support, implemented once in the facade
- **Embeddings** — OpenAI and Google embedding models with batch support
- **Image generation** — Google Gemini image generation via `gemini-2.5-flash-image`
- **Token metering** — `TokenUsage` on every response, including cache and thinking tokens
- **Resilience** — Polly v8 retry and timeout handling wired into provider HTTP execution
- **Thinking budget** — universal mapping across all providers that support reasoning
- **Diagnostics** — `CompletionDiagnostics` exposes endpoint family, stop reason, and truncation hints
- **AOT/trimming-first** — manual provider JSON construction/parsing where it reduces reflection risk and wire bloat

## Design Goals

- One public facade: `IAiInferenceClient`
- Internal provider implementations
- Shared `HttpClient` per client instance
- Explicit model capability metadata instead of implicit provider heuristics
- Sparse provider payloads: omit null/default fields unless the upstream API requires them

## Supported Models

### OpenAI

| Model | Enum | Context |
| --- | --- | --- |
| GPT-5.2 | `AiModel.Gpt52` | 400k |
| GPT-5.2 Pro | `AiModel.Gpt52Pro` | 400k |
| GPT-5.2 Chat | `AiModel.Gpt52Chat` | 128k |
| GPT-5.3 Chat | `AiModel.Gpt53Chat` | 128k |
| GPT-5.4 | `AiModel.Gpt54` | 1.05M |

### Anthropic

| Model | Enum | Context |
| --- | --- | --- |
| Claude Opus 4.6 | `AiModel.ClaudeOpus46` | 200k |
| Claude Sonnet 4.6 | `AiModel.ClaudeSonnet46` | 200k |
| Claude Haiku 4.5 | `AiModel.ClaudeHaiku45` | 200k |

### Google

| Model | Enum | Context |
| --- | --- | --- |
| Gemini 3 Flash Preview | `AiModel.Gemini3FlashPreview` | 1M |
| Gemini 3.1 Pro Preview | `AiModel.Gemini31ProPreview` | 1M |
| Gemini 3.1 Flash-Lite Preview | `AiModel.Gemini31FlashLitePreview` | 1M |

### xAI

| Model | Enum | Context |
| --- | --- | --- |
| Grok 4.1 Fast | `AiModel.Grok41Fast` | 2M |
| Grok 4 | `AiModel.Grok4` | 256k |

### Embeddings

| Model | Enum | Dimensions | Provider |
| --- | --- | --- | --- |
| text-embedding-3-large | `EmbeddingModel.TextEmbedding3Large` | 3072 | OpenAI |
| text-embedding-3-small | `EmbeddingModel.TextEmbedding3Small` | 1536 | OpenAI |
| gemini-embedding-001 | `EmbeddingModel.GeminiEmbedding001` | 3072 | Google |

## Installation

```bash
dotnet add package Ai.Tlbx.Inference
```

## Quick Start

### Model Registry

```csharp
var descriptor = AiModelCatalog.Get(AiModel.Gpt52Pro);

Console.WriteLine(descriptor.ApiName);
Console.WriteLine(descriptor.PreferredEndpoint);
Console.WriteLine(descriptor.Capabilities.SupportsResponsesApi);
```

### DI Registration

```csharp
services.AddAiInference(options =>
{
    options.AddOpenAi("sk-...");
    options.AddAnthropic("sk-ant-...");
    options.AddGoogle("AIza...");
    options.AddXai("xai-...");
});
```

### Simple Completion

```csharp
var response = await client.CompleteAsync(new CompletionRequest
{
    Model = AiModel.ClaudeOpus46,
    Messages = [new ChatMessage { Role = ChatRole.User, Content = "Hello!" }]
});

Console.WriteLine(response.Content);
Console.WriteLine($"Tokens: {response.Usage.TotalTokens}");
Console.WriteLine(response.Diagnostics?.EndpointFamily);
Console.WriteLine(response.Diagnostics?.Note);
```

### Streaming

```csharp
await foreach (var delta in client.StreamAsync(new CompletionRequest
{
    Model = AiModel.Gpt52,
    Messages = [new ChatMessage { Role = ChatRole.User, Content = "Write a haiku" }]
}))
{
    Console.Write(delta);
}
```

### Structured Output

```csharp
public sealed record WeatherInfo
{
    public required string City { get; init; }
    public required double Temperature { get; init; }
    public required string Condition { get; init; }
}

var response = await client.CompleteAsync(new CompletionRequest
{
    Model = AiModel.Gemini31FlashLitePreview,
    JsonSchema = """{"type":"object","properties":{"city":{"type":"string"},"temperature":{"type":"number"},"condition":{"type":"string"}},"required":["city","temperature","condition"]}""",
    Messages = [new ChatMessage { Role = ChatRole.User, Content = "Weather in Berlin?" }]
}, MyJsonContext.Default.WeatherInfo);

Console.WriteLine($"{response.Content.City}: {response.Content.Temperature}°C, {response.Content.Condition}");
```

### Model Validation

```csharp
var validation = AiModelValidator.ValidateForCompletion(
    AiModel.Gpt52Pro,
    streaming: false,
    tools: false,
    structuredOutput: false);

foreach (var warning in validation.Warnings)
{
    Console.WriteLine(warning);
}
```

### Document Attachments

```csharp
var attachment = new DocumentAttachment
{
    FileName = "brief.pdf",
    MimeType = "application/pdf",
    Content = File.ReadAllBytes("brief.pdf")
};

var response = await client.CompleteAsync(new CompletionRequest
{
    Model = AiModel.Gemini31FlashLitePreview,
    Messages =
    [
        new ChatMessage
        {
            Role = ChatRole.User,
            Content = "Read the attached document and summarize it in three bullet points.",
            Attachments = [attachment]
        }
    ]
});

Console.WriteLine(response.Content);
```

### Tool Calling

```csharp
var tools = new List<ToolDefinition>
{
    new()
    {
        Name = "get_weather",
        Description = "Get current weather for a city",
        ParametersSchema = JsonDocument.Parse(
            """{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}""").RootElement
    }
};

var result = await client.CompleteWithToolsAsync(
    new CompletionRequest
    {
        Model = AiModel.ClaudeSonnet46,
        Messages = [new ChatMessage { Role = ChatRole.User, Content = "What's the weather in Tokyo?" }]
    },
    tools,
    toolExecutor: async call =>
    {
        var weather = GetWeather(call.Arguments);
        return new ToolCallResult
        {
            ToolCallId = call.Id,
            Result = JsonSerializer.Serialize(weather)
        };
    });

Console.WriteLine(result.Content);
Console.WriteLine($"Tool iterations: {result.Iterations}, Total tokens: {result.Usage.TotalTokens}");
```

### Embeddings

```csharp
var embedding = await client.EmbedAsync(new EmbeddingRequest
{
    Model = EmbeddingModel.TextEmbedding3Large,
    Input = "The quick brown fox"
});

Console.WriteLine($"Dimensions: {embedding.Embedding.Length}");
```

### Image Generation

```csharp
var imageBytes = await client.GenerateImageAsync(new ImageGenerationRequest
{
    Prompt = "A product photo of a matte black teapot on a concrete counter"
});

await File.WriteAllBytesAsync("teapot.png", imageBytes);
```

Google image generation currently uses `gemini-2.5-flash-image` for both AI Studio and Vertex configurations. The library returns the first inline image bytes from Google's response. `Size` and `Quality` are reserved on the public request type but are not provider-mapped yet.

## Configuration

### Logging

```csharp
services.AddAiInference(options =>
{
    options.AddOpenAi("sk-...");
    options.WithLogging((level, message) =>
    {
        Console.WriteLine($"[{level}] {message}");
    });
});
```

### Custom Retry Policy

```csharp
var customPipeline = new ResiliencePipelineBuilder<HttpResponseMessage>()
    .AddRetry(new RetryStrategyOptions<HttpResponseMessage>
    {
        MaxRetryAttempts = 2,
        Delay = TimeSpan.FromSeconds(5)
    })
    .Build();

services.AddAiInference(options =>
{
    options.AddOpenAi("sk-...");
    options.WithRetryPolicy(customPipeline);
});
```

### Google Vertex AI

```csharp
services.AddAiInference(options =>
{
    options.AddGoogle(
        serviceAccountJson: File.ReadAllText("service-account.json"),
        projectId: "my-project",
        location: "us-central1");
});
```

### Thinking Budget

```csharp
var response = await client.CompleteAsync(new CompletionRequest
{
    Model = AiModel.ClaudeOpus46,
    ThinkingBudget = 10000,
    Messages = [new ChatMessage { Role = ChatRole.User, Content = "Solve this complex problem..." }]
});

Console.WriteLine($"Thinking tokens used: {response.Usage.ThinkingTokens}");
```

### Prompt Caching (Anthropic)

```csharp
var response = await client.CompleteAsync(new CompletionRequest
{
    Model = AiModel.ClaudeSonnet46,
    EnableCache = true,
    SystemMessage = longSystemPrompt,
    Messages = [new ChatMessage { Role = ChatRole.User, Content = "Question..." }]
});

Console.WriteLine($"Cache read: {response.Usage.CacheReadTokens}, Cache write: {response.Usage.CacheWriteTokens}");
```

## AOT / Trimming

The library is AOT- and trimming-compatible (`IsAotCompatible`, `IsTrimmable`).

For structured output and typed tool results, use the `JsonTypeInfo<T>` overloads and provide your schema explicitly:

```csharp
[JsonSerializable(typeof(WeatherInfo))]
internal partial class MyJsonContext : JsonSerializerContext { }

var response = await client.CompleteAsync(
    new CompletionRequest
    {
        Model = AiModel.Gemini31FlashLitePreview,
        JsonSchema = """{"type":"object","properties":{"city":{"type":"string"},"temp":{"type":"number"}},"required":["city","temp"]}""",
        Messages = [new ChatMessage { Role = ChatRole.User, Content = "Weather in Berlin?" }]
    },
    MyJsonContext.Default.WeatherInfo);
```

The non-generic methods (`CompleteAsync`, `StreamAsync`, `EmbedAsync`, etc.) are always AOT-safe.
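On the consuming side, Native AOT publishing is opted into in the application's project file. A minimal sketch using standard .NET SDK properties (nothing here is specific to this library):

```xml
<!-- In the consuming app's .csproj -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
</PropertyGroup>
```

With `PublishAot` enabled, `dotnet publish` trims and compiles the app ahead of time, which is where the library's `JsonTypeInfo<T>` overloads and manual JSON handling pay off.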

## Runtime Notes

- The library does not create ad hoc provider `HttpClient` instances. A single shared `HttpClient` is supplied to each `AiInferenceClient` instance and reused by all configured providers for that client.
- Provider request payloads are built manually with `JsonObject`/`JsonArray` so the library can omit unused fields and stay predictable under trimming/AOT.
- A small shared JSON policy is used for serializer-based paths, and source-generated DTOs are used for a few stable internal envelopes such as embedding and upload responses.
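The sparse-payload approach described above can be sketched with plain `System.Text.Json.Nodes` types. This is an illustration of the technique, not the library's internal code; the field names mirror typical provider APIs:

```csharp
using System.Text.Json.Nodes;

// Build a provider payload field by field, adding only what is actually set.
var payload = new JsonObject
{
    ["model"] = "example-model",
    ["messages"] = new JsonArray(
        new JsonObject { ["role"] = "user", ["content"] = "Hello!" })
};

double? temperature = null; // an unset option is simply never added
if (temperature is not null)
{
    payload["temperature"] = temperature.Value;
}

// The serialized payload contains no "temperature": null noise,
// and no reflection-based serializer is involved.
string json = payload.ToJsonString();
```

Because `JsonObject` is written out node by node, the wire format stays predictable under trimming: there is no type metadata for the linker to remove.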

## Paid Integration Tests

Live provider integration tests cost money and are intended to be run on demand only.

Recommended environment variables:

- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `GOOGLE_API_KEY`
- `XAI_API_KEY`
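A typical on-demand session exports the keys and runs the live suite explicitly. The `--filter` expression below is illustrative; use whatever trait or category the test project actually defines:

```shell
# Set provider keys for this shell session only (values are placeholders).
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AIza..."
export XAI_API_KEY="xai-..."

# Run only the paid live tests, never as part of regular CI.
dotnet test --filter "Category=PaidIntegration"
```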

The live suite includes:

- simple prompt smoke tests
- streaming tests that verify multi-chunk output
- document attachment tests for supported models

These tests are intentionally not run automatically.

## License

MIT
