
TryOn VTON Agent: LangChain-Based Multi-Provider Agent#80

Merged
kailashahirwar merged 1 commit into main from agents
Dec 19, 2025

Conversation


@kailashahirwar commented Dec 19, 2025

Virtual Try-On Agent - LangChain-Based Multi-Provider Agent

🎯 Overview

This PR introduces a LangChain-based Virtual Try-On Agent that intelligently selects and uses the appropriate virtual try-on adapter based on natural language prompts. The agent provides a unified interface to multiple VTOn providers (Kling AI, Amazon Nova Canvas, and Segmind) with automatic provider selection and comprehensive error handling.

✨ Features

Core Agent Capabilities

  • Intelligent Provider Selection: Automatically chooses the best VTOn adapter based on keywords in user prompts
  • Multi-LLM Support: Compatible with OpenAI GPT, Anthropic Claude, and Google Gemini
  • Natural Language Interface: Accepts conversational prompts like "Use Kling AI to create a virtual try-on"
  • Real-time Progress Tracking: Streaming support with intermediate step visibility
  • Token-Efficient Caching: Stores full image data in memory cache to avoid LLM token limits
  • Flexible Input: Supports both file paths and URLs for images

Supported Virtual Try-On Providers

  1. Kling AI - High-quality results with asynchronous processing
  2. Amazon Nova Canvas - AWS Bedrock integration with automatic garment detection
  3. Segmind - Fast and efficient for quick iterations

CLI Interface

  • Comprehensive command-line tool (vton_agent.py)
  • Support for all three LLM providers
  • Configurable temperature, model selection, and output directory
  • Verbose mode for debugging agent reasoning
  • Automatic image format handling (PNG/Base64)

📁 Files Added/Modified

New Files

  • tryon/agents/vton/agent.py - Main VTOnAgent class implementation
  • tryon/agents/vton/tools.py - LangChain tool wrappers for each VTOn adapter
  • tryon/agents/vton/__init__.py - Module exports
  • vton_agent.py - CLI interface for the agent
  • docs/docs/agents/vton-agent.md - Comprehensive documentation

Modified Files

  • tryon/agents/__init__.py - Export VTOnAgent module
  • README.md - Added VTOn Agent section with usage examples
  • requirements.txt - Added LangChain dependencies
  • docs/sidebars.ts - Added agent documentation to sidebar

🚀 Usage

Python API

from tryon.agents.vton import VTOnAgent

# Initialize agent with OpenAI
agent = VTOnAgent(llm_provider="openai")

# Generate virtual try-on
result = agent.generate(
    person_image="person.jpg",
    garment_image="shirt.jpg",
    prompt="Create a virtual try-on using Kling AI"
)

if result["status"] == "success":
    images = result["images"]
    provider = result["provider"]
    print(f"Generated {len(images)} images using {provider}")

CLI

# Basic usage
python vton_agent.py --person person.jpg --garment shirt.jpg \
    --prompt "Use Kling AI for high-quality try-on"

# Use Anthropic Claude
python vton_agent.py --person person.jpg --garment shirt.jpg \
    --prompt "Generate with Nova Canvas" --llm-provider anthropic

# Verbose mode with custom output directory
python vton_agent.py --person person.jpg --garment shirt.jpg \
    --prompt "Try Segmind for fast results" \
    --output-dir results/ --verbose

🏗️ Architecture

The agent follows LangChain's ReAct pattern:

User Prompt → LLM Reasoning → Tool Selection → Adapter Execution → Result Formatting
  1. User provides: Person image, garment image, and natural language prompt
  2. Agent analyzes: Prompt to identify desired provider (or defaults to Kling AI)
  3. Tool executes: Selected adapter with provided images
  4. Cache stores: Full image data (avoiding token limits)
  5. Agent returns: Structured result with status, provider, and images

🔧 Technical Details

LangChain Integration

  • Uses create_agent() API for agent creation
  • Custom @tool decorators for each VTOn provider
  • Pydantic schemas for type-safe tool inputs
  • Async streaming with astream() for progress tracking
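
The tool-wrapping pattern can be illustrated without LangChain installed; below, a plain decorator and dataclass stand in for LangChain's `@tool` and the Pydantic args schema (all names are illustrative, not the actual module API):

```python
import json
from dataclasses import dataclass

# Stand-in registry; LangChain builds this for you from @tool-decorated functions.
TOOL_REGISTRY = {}

@dataclass
class VTOnToolInput:
    """Stand-in for a Pydantic args schema: typed tool inputs."""
    person_image: str
    garment_image: str

def tool(name):
    """Minimal stand-in for LangChain's @tool decorator: registers by name."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@tool("segmind_virtual_tryon")
def segmind_virtual_tryon(args: VTOnToolInput) -> str:
    # A real tool would call the Segmind adapter here and cache the image data.
    return json.dumps({"status": "success", "provider": "segmind"})
```

The real tools return JSON strings in the same way, so the LLM sees a compact, structured result rather than raw adapter objects.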

Cache Management

  • Global in-memory cache for tool outputs
  • MD5-based cache keys to reference full image data
  • Prevents LLM token exhaustion from large base64 images
  • Cache retrieval via get_tool_output_from_cache()
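
The caching scheme reduces to a small stdlib pattern: tools store their full payload under an MD5 key and hand the LLM only the key. A minimal sketch (the helper names here are illustrative):

```python
import hashlib
import json
from typing import Optional

# Illustrative in-memory cache: tools park large base64 payloads here and
# pass the LLM only a short MD5 key, keeping prompts within token limits.
_TOOL_OUTPUT_CACHE = {}

def cache_tool_output(payload: dict) -> str:
    """Store a tool's full output and return a compact cache key."""
    key = hashlib.md5(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    _TOOL_OUTPUT_CACHE[key] = payload
    return key

def get_tool_output_from_cache(key: str) -> Optional[dict]:
    """Retrieve the full payload (e.g. base64 images) by its cache key."""
    return _TOOL_OUTPUT_CACHE.get(key)
```

As noted under Known Issues, a cache like this grows without bound until TTL and size limits are added.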

Error Handling

  • Comprehensive try/except blocks in all tools
  • Graceful fallback from streaming to standard execution
  • Detailed error messages with provider context
  • Validation for file paths and URLs
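
The streaming fallback follows a standard shape: attempt the streaming path, and on failure run a single blocking invocation instead of aborting the request. A hypothetical stand-in (function names are not the actual API):

```python
# Sketch of the streaming-to-standard fallback: if streaming raises (e.g. the
# LLM backend does not support it), fall back to one blocking call.
def run_with_fallback(stream_fn, invoke_fn, payload):
    try:
        return list(stream_fn(payload))  # consume intermediate steps
    except NotImplementedError:
        return [invoke_fn(payload)]

def fake_stream(payload):
    raise NotImplementedError("streaming unsupported")

def fake_invoke(payload):
    return {"status": "success", "provider": payload["provider"]}

result = run_with_fallback(fake_stream, fake_invoke, {"provider": "segmind"})
```

Either path yields the same structured result, so callers never need to know which mode actually ran.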

📦 Dependencies

New dependencies added to requirements.txt:

langchain>=1.0.0
langchain-openai>=0.2.0
langchain-anthropic>=0.2.0
langchain-google-genai>=2.0.0
pydantic>=2.0.0

✅ Testing

Manual Testing Checklist

  • Agent initializes with OpenAI provider
  • Agent initializes with Anthropic provider
  • Agent initializes with Google provider
  • Kling AI provider selection via prompt
  • Nova Canvas provider selection via prompt
  • Segmind provider selection via prompt
  • Default to Kling AI when no provider specified
  • File path input handling
  • URL input handling
  • CLI argument parsing
  • Verbose mode output
  • Error handling for missing API keys
  • Cache-based image retrieval

Test Commands Used

# Test with Kling AI
python vton_agent.py --person data/female-model.jpeg --garment data/garment.png \
    --prompt "Use Kling AI" --verbose

# Test with Nova Canvas
python vton_agent.py --person data/female-model.jpeg --garment data/garment.png \
    --prompt "Try Amazon Nova Canvas" --verbose

# Test with Segmind
python vton_agent.py --person data/female-model.jpeg --garment data/garment.png \
    --prompt "Generate with Segmind" --verbose

📚 Documentation

Complete documentation is available in docs/docs/agents/vton-agent.md.

🐛 Known Issues & Future Work

Known Issues

  1. create_agent() API compatibility needs verification with latest LangChain version
  2. Default model names (e.g., gpt-5.1) may need updates based on actual model availability
  3. Global cache has no TTL or size limits (unlimited growth)

Future Enhancements

  • Add unit and integration tests
  • Implement generate_and_decode() method fully
  • Add separate async/sync methods (generate() and agenerate())
  • Implement cache TTL and size limits
  • Add input validation (image format, size, accessibility)
  • Add rate limiting and retry logic with exponential backoff
  • Add logging and metrics/telemetry
  • Support for additional VTOn providers
  • Batch processing support for multiple images
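
The planned retry logic would likely take the usual exponential-backoff shape; a stdlib sketch of that enhancement (not part of this PR):

```python
import time

def retry_with_backoff(fn, retries=3, base_delay=0.01):
    """Call fn, retrying on exception with exponentially growing delays."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

# Simulated flaky provider call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient provider error")
    return "ok"
```

In production the base delay would be on the order of seconds and tuned per provider.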

🔐 Environment Variables Required

# LLM Providers (at least one required)
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"

# VTOn Providers (based on which one you'll use)
export KLING_API_KEY="your-kling-key"
export KLING_API_SECRET="your-kling-secret"
export AWS_ACCESS_KEY_ID="your-aws-key"
export AWS_SECRET_ACCESS_KEY="your-aws-secret"
export AWS_REGION="us-east-1"
export SEGMIND_API_KEY="your-segmind-key"

🎉 Benefits

  1. Unified Interface: Single API for multiple VTOn providers
  2. Developer-Friendly: Natural language prompts instead of remembering adapter APIs
  3. Extensible: Easy to add new providers by creating new tools
  4. Production-Ready: Error handling, streaming, and caching built-in
  5. Framework Agnostic: Works with any LangChain-compatible LLM

📝 Migration Guide

For users currently using adapters directly:

Before (Direct Adapter Usage)

from tryon.api import KlingAIVTONAdapter

adapter = KlingAIVTONAdapter()
result = adapter.generate(source_image="person.jpg", reference_image="shirt.jpg")

After (Agent Usage)

from tryon.agents.vton import VTOnAgent

agent = VTOnAgent()
result = agent.generate(
    person_image="person.jpg",
    garment_image="shirt.jpg",
    prompt="Use Kling AI"
)

Note: Direct adapter usage still works and is recommended for programmatic/batch processing where you know exactly which provider to use.

🤝 Contributing

To add a new VTOn provider to the agent:

  1. Create the adapter in tryon/api/
  2. Add a tool in tryon/agents/vton/tools.py:
    @tool("provider_name_virtual_tryon", args_schema=YourToolInput)
    def provider_virtual_tryon(person_image, garment_image, **kwargs):
        adapter = YourAdapter()
        result = adapter.generate(...)
        return json.dumps(result)
  3. Update get_vton_tools() to include your tool
  4. Update system prompt in agent.py to mention your provider
  5. Add documentation and examples

Type: Feature
Priority: High
Breaking Changes: None
Backward Compatible: Yes


@kailashahirwar kailashahirwar changed the title tryon agent for vton added TryOn VTON Agent: LangChain-Based Multi-Provider Agent Dec 19, 2025
@kailashahirwar kailashahirwar merged commit 847db92 into main Dec 19, 2025
1 check passed