An AI agent-powered Text-to-Image Generator built with the smolagents framework, served through a Gradio UI and hosted on Hugging Face Spaces.
This project demonstrates how an autonomous agent can plan, reason, use tools, and generate images from natural language prompts.
- Type a prompt such as "Generate an image of a horse"
- Press ENTER
- The AI agent then performs multi-step reasoning and returns the generated image.
🔗 Hugging Face Spaces (Gradio App):
https://huggingface.co/spaces/birubhai/ai-agent-image-generator
This project is an AI Agent Image Generator that follows the Thought → Action → Observation (TAO) cycle of agent-based systems.
Instead of directly calling a model, the agent:
- Understands the user's intent
- Plans the steps needed
- Chooses tools intelligently
- Generates an image from text
- Returns the final output as an agent-compatible image response
The agent architecture and behavior are fully configurable via YAML and JSON configurations.
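As a toy illustration of the TAO cycle above (standard-library Python only — the tool registry, the planning rule, and the fake image payload are stand-ins, not the actual smolagents API):

```python
# Toy Thought -> Action -> Observation loop. Illustrative only:
# a real smolagents agent plans with an LLM and calls real tools.

def text_to_image(prompt: str) -> dict:
    """Stand-in for a real text-to-image tool."""
    return {"type": "image", "prompt": prompt}

TOOLS = {"text_to_image": text_to_image}

def run_agent(user_request: str) -> dict:
    # Thought: decide which tool fits the request.
    tool_name = "text_to_image" if "image" in user_request.lower() else None
    if tool_name is None:
        raise ValueError("No suitable tool for this request")
    # Action: invoke the chosen tool.
    observation = TOOLS[tool_name](user_request)
    # Observation: check the result, then return it as the final answer.
    assert observation["type"] == "image"
    return observation

result = run_agent("Generate image of a horse")
```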
- smolagents framework
- Qwen / Qwen2.5-Coder (Hugging Face)
- Gradio (UI)
- Hugging Face Spaces (Deployment)
- DuckDuckGo Search
- Pandas
- PIL (Image Handling)
The core dependencies used in this project:
- smolagents
- requests
- duckduckgo-search
- pandas
- gradio
- Pillow
.
├── app.py
├── ui.py
├── prompts.yml
├── agents.json
├── tools/
│ ├── finalanswer.py
│ ├── websearch.py
│ └── visitwebpage.py
- Contains a detailed system prompt (`prompts.yml`)
- Defines the Thought–Action–Observation (TAO) loop
- Includes:
- Planning steps
- Tool usage instructions
- Agent behavior rules
- Decision-making guidelines
This file controls how the agent thinks and acts.
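As a rough, hypothetical sketch (the keys and wording below are assumptions for illustration, not copied from the actual file), `prompts.yml` might resemble:

```yaml
system_prompt: |
  You are an expert assistant that solves tasks using the
  Thought -> Action -> Observation cycle. Plan your steps,
  call the available tools, and return a final answer.
planning:
  initial_plan: |
    1. Understand the user's request.
    2. Choose the right tool (e.g. text-to-image).
    3. Verify the result and return it via final_answer.
```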
Central configuration for the agent (`agents.json`), including:
- Model configuration
- Tool registry
- Prompt templates
- Max reasoning steps
- Verbosity level
- Planning interval
- Agent execution parameters
This makes the agent fully configurable without changing code.
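For illustration only (the field names and values below are assumptions, not the actual file contents), such a configuration might look like:

```json
{
  "model": {
    "model_id": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "max_tokens": 2096,
    "temperature": 0.5
  },
  "tools": ["final_answer", "web_search", "visit_webpage"],
  "max_steps": 6,
  "verbosity_level": 1,
  "planning_interval": null
}
```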
The main agent execution file (`app.py`).
Key components:
- Custom dummy tool (no-op tool for agent compatibility)
- Tool to fetch current time with timezone
- Integration with Qwen / Qwen2.5-Coder model
- Uses:
- Text-to-image generation
- DuckDuckGo search (optional)
- Executes the agent loop and returns results
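A standard-library sketch of the timezone tool mentioned above (the project's actual tool presumably registers with smolagents via its tool decorator and may use `pytz`; this version uses `zoneinfo` instead):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def get_current_time_in_timezone(timezone: str) -> str:
    """Return the current local time for a given IANA timezone name."""
    try:
        local_time = datetime.now(ZoneInfo(timezone))
        return (f"The current local time in {timezone} is: "
                f"{local_time.strftime('%Y-%m-%d %H:%M:%S')}")
    except Exception as exc:  # invalid timezone names raise here
        return f"Error fetching time for timezone '{timezone}': {exc}"
```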
- Builds the Gradio interface (`ui.py`)
- Handles:
- Text input
- Image output
- Agent responses
- Connects the UI with the agent logic
- Responsible for returning the agent's final output (`finalanswer.py`)
- Logic:
- If output is already an agent image → return directly
- If output is a PIL image → wrap it as an agent image
- Forward the final response cleanly to the UI
This ensures compatibility between agent outputs and Gradio UI.
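The wrapping logic can be sketched as follows (a stdlib-only illustration: `AgentImage` here is a stand-in class, not the real smolagents type):

```python
class AgentImage:
    """Stand-in for smolagents' agent-compatible image type."""
    def __init__(self, raw_image):
        self.raw_image = raw_image

def final_answer(output):
    """Normalise the agent's output into an AgentImage for the UI."""
    if isinstance(output, AgentImage):
        return output          # already agent-compatible: pass through
    return AgentImage(output)  # e.g. a raw PIL image: wrap it

wrapped = final_answer("fake-pil-image")   # raw image gets wrapped
passthrough = final_answer(wrapped)        # AgentImage passes through
```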
- Tool for DuckDuckGo-based web search (`websearch.py`)
- Available to the agent (not explicitly used in this project)
- Tool for visiting and extracting webpage content (`visitwebpage.py`)
- Included for extensibility
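A stdlib-only sketch of what such a page-visiting tool can look like (the project's actual tool may use `requests` and different text extraction; `extract_text` and `visit_webpage` are illustrative names):

```python
from html.parser import HTMLParser
import urllib.request

class _TextExtractor(HTMLParser):
    """Collects the visible text nodes of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    """Strip tags and return the page's visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

def visit_webpage(url: str) -> str:
    """Fetch a page and return its visible text (performs a network call)."""
    with urllib.request.urlopen(url) as resp:
        return extract_text(resp.read().decode("utf-8", errors="replace"))
```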
✔ Text → Image Generation
✔ Agent-based planning & reasoning
✔ Tool-based execution
✔ Modular & configurable design
✔ Hosted on Hugging Face Spaces
The main focus of this project is text-to-image conversion using an AI agent, not just a direct model call.
- User enters a text prompt
- Agent reasons using the TAO cycle
- Planning steps are executed
- Image generation tool is invoked
- Output is wrapped and returned
- Gradio displays the generated image
- Hosted on Hugging Face Spaces
- UI powered by Gradio
- Model accessed from Hugging Face Hub
- Multi-image generation
- Image editing agents
- Memory-based agents
- Multi-agent collaboration
- Web-grounded image prompts
Biresh Kumar Singh
Agentic AI Enthusiast
This project is open-source.
⭐ If you found this project helpful, feel free to star the repository and explore agentic AI further!