vis2attr - Vision Language Model for Attribute Extraction

Turn item photos into structured attributes (brand, colors, materials, condition) using vision-language models (VLMs). The output is strict JSON with a confidence score for each field.
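To make "strict JSON with per-field confidence scores" concrete, here is a minimal sketch of how such a response could be validated. The `value`/`confidence` keys, the sample values, and the `parse_attributes` helper are assumptions for illustration, not the project's actual schema or API; the real fields are defined by configuration.

```python
import json

# Hypothetical response shape: every field carries a value and a confidence in [0, 1].
# The field names come from the description above; the key names are assumptions.
RAW_RESPONSE = """
{
  "brand": {"value": "Acme", "confidence": 0.92},
  "colors": {"value": ["navy", "white"], "confidence": 0.81},
  "materials": {"value": ["cotton"], "confidence": 0.64},
  "condition": {"value": "good", "confidence": 0.77}
}
"""

EXPECTED_FIELDS = ("brand", "colors", "materials", "condition")


def parse_attributes(raw, fields=EXPECTED_FIELDS):
    """Parse a model response and reject anything that is not well-formed."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    for name in fields:
        entry = data.get(name)
        if not isinstance(entry, dict) or "value" not in entry:
            raise ValueError(f"{name}: expected an object with a 'value' key")
        confidence = entry.get("confidence")
        is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        if not is_number or not 0.0 <= confidence <= 1.0:
            raise ValueError(f"{name}: confidence must be a number in [0, 1]")
    # Keep only the configured fields so unexpected keys never leak downstream.
    return {name: data[name] for name in fields}


attributes = parse_attributes(RAW_RESPONSE)
print(attributes["brand"]["value"], attributes["brand"]["confidence"])
```

Rejecting malformed responses at this step keeps later stages from having to guess at missing or out-of-range confidences.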

High-Level Architecture

flowchart LR
    A[📁 Images<br/>JPG/PNG/WebP] --> B[📝 Prompt<br/>Jinja2]
    B --> C[🤖 VLM<br/>Mistral/OpenAI]
    C --> D[📊 Parse<br/>JSON]
    D --> E[⚖️ Decide<br/>Thresholds]
    E --> F[💾 Store<br/>Parquet]
    F --> G[📈 Export<br/>CLI]
    
    CONFIG[⚙️ Config<br/>YAML] --> B
    CONFIG --> C

🚀 Quick Start

# Install
uv venv && source .venv/bin/activate
uv pip install -e .

# Set up API key
export MISTRAL_API_KEY=your_api_key_here

# Run analysis
vis2attr analyze --input ./images --output ./predictions.parquet

# Batch processing with custom config
vis2attr analyze --input ./items --batch --config ./my-config.yaml --verbose

# Generate reports (command structure is in place; full implementation is pending)
vis2attr report --predictions ./predictions.parquet --format summary

📊 Status

The core infrastructure is complete and the pipeline runs end to end. Remaining work focuses on production hardening; see the table below for per-component status.

Status	Component	Description
✅	Core Pipeline	Complete data models, configuration, Mistral provider, JSON parser
✅	File Processing	Image ingestor with EXIF stripping, comprehensive test suite
✅	CLI Interface	Full analyze command with batch processing, verbose logging, and result export
✅	Storage System	Parquet-based storage with lineage tracking and metadata storage
🚧	Decision Rules	Simple threshold-based decisions, needs sophisticated quality gates
🚧	Report Generation	CLI command structure ready, needs full implementation
❌	Metrics & Logging	Basic logging only, needs structured metrics collection
❌	Additional Providers	Only Mistral implemented (OpenAI, Google, Anthropic planned)
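The "simple threshold-based decisions" listed under Decision Rules can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the `Decision` class, the `decide` function, the `value`/`confidence` entry shape, and the threshold numbers are all hypothetical.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.75  # hypothetical fallback; real thresholds would come from config


@dataclass(frozen=True)
class Decision:
    accepted: dict      # field name -> value that met its threshold
    needs_review: list  # field names whose confidence fell below the threshold


def decide(attributes, thresholds, default=DEFAULT_THRESHOLD):
    """Accept a field when its confidence meets the per-field threshold."""
    accepted = {}
    needs_review = []
    for name, entry in attributes.items():
        if entry["confidence"] >= thresholds.get(name, default):
            accepted[name] = entry["value"]
        else:
            needs_review.append(name)
    return Decision(accepted=accepted, needs_review=needs_review)


decision = decide(
    {
        "brand": {"value": "Acme", "confidence": 0.92},
        "materials": {"value": ["cotton"], "confidence": 0.64},
    },
    thresholds={"brand": 0.9},  # materials falls back to DEFAULT_THRESHOLD
)
print(decision.accepted, decision.needs_review)
```

Per-field thresholds let a stricter bar apply to fields where mistakes are costly, while a default covers everything else.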

📚 Documentation

Architecture - System design, data flow, and component overview
CLI Reference - Complete command-line interface documentation
Configuration - Configuration options and schema definitions
API Reference - Data models, schemas, and programming interface
Contributing - Development guidelines and contribution process

🎯 Design Principles

Schema-first & config-driven: No hard-coded fields
Ports & adapters: Swappable implementations via factory patterns
Type safety: Comprehensive data models with validation
Testability: Full test coverage for all components
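As a rough illustration of the ports-and-adapters principle, the sketch below defines a provider "port" as a `Protocol` and selects adapters by name through a small registry. Every name here (`VLMProvider`, `register_provider`, `create_provider`, `EchoProvider`) is hypothetical and does not reflect the project's actual classes.

```python
from typing import Callable, Dict, List, Protocol


class VLMProvider(Protocol):
    """Port: anything that turns images plus a prompt into a raw model response."""

    def predict(self, images: List[bytes], prompt: str) -> str: ...


# Registry mapping a provider name (as it might appear in config) to a factory.
_PROVIDERS: Dict[str, Callable[[], VLMProvider]] = {}


def register_provider(name):
    """Class decorator that registers an adapter under the given name."""
    def decorator(factory):
        _PROVIDERS[name] = factory
        return factory
    return decorator


@register_provider("echo")
class EchoProvider:
    """Stand-in adapter for tests: returns a fixed JSON string and calls no API."""

    def predict(self, images, prompt):
        return '{"brand": {"value": null, "confidence": 0.0}}'


def create_provider(name):
    """Factory: look up an adapter by name and instantiate it."""
    try:
        factory = _PROVIDERS[name]
    except KeyError:
        raise ValueError(
            f"unknown provider {name!r}; available: {sorted(_PROVIDERS)}"
        ) from None
    return factory()


provider = create_provider("echo")
print(provider.predict([], "Describe the item"))
```

With this shape, adding a provider means registering one more adapter; the calling code only depends on the port and the name read from configuration.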

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
