This project addresses the gap of tedious, manual image collection for training on specific scenarios: an AI-powered application that automates the whole process and serves as a one-stop solution for creating, modifying, and augmenting image data. It is open source so the community can use its capabilities to meet the dataset needs of their ML applications. The project uses Gen-AI to generate data that matches real-world scenarios, automating the tedious tasks of data collection and scraping. It performs agentic scraping of data with explainability over the provided results (in the style of Perplexity), and integrates generative and augmentation capabilities to create versatile datasets that cater to your research needs.
A FastAPI-based service that generates synthetic image datasets using a LangGraph workflow and AI models. This project combines the power of language models and image generation to create customizable datasets for various use cases.
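As a rough illustration of the approach, here is a minimal LangGraph workflow sketch; the node names and state fields are illustrative, not the project's actual code. One node diversifies the user's request into prompts, and a second node would call an image model per prompt:

```python
# Minimal sketch (illustrative, not this project's actual code) of a LangGraph
# workflow in the spirit of this service.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class PromptState(TypedDict):
    user_request: str      # e.g. "Generate images of mountains"
    prompts: list[str]     # diversified prompts produced by the LLM step
    image_urls: list[str]  # URLs an image-generation step would return

def expand_prompts(state: PromptState) -> PromptState:
    # The real service would use an LLM (Groq/OpenAI) here; this is a stub.
    state["prompts"] = [f"{state['user_request']}, variation {i}" for i in range(3)]
    return state

def generate_images(state: PromptState) -> PromptState:
    # The real node would call an image-generation model; placeholders here.
    state["image_urls"] = [f"https://example.com/img_{i}.png" for i in range(len(state["prompts"]))]
    return state

graph = StateGraph(PromptState)
graph.add_node("expand_prompts", expand_prompts)
graph.add_node("generate_images", generate_images)
graph.set_entry_point("expand_prompts")
graph.add_edge("expand_prompts", "generate_images")
graph.add_edge("generate_images", END)
workflow = graph.compile()

result = workflow.invoke({"user_request": "mountains at dawn", "prompts": [], "image_urls": []})
print(result["prompts"])
```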
- Interactive chat-based interface for dataset generation
- Customizable dataset parameters (size, resolution)
- Support for multiple AI models (Groq, OpenAI)
- Session management for maintaining context
- Sample image preview before full dataset generation
- RESTful API endpoints for integration
- Python 3.11+
- Node.js (for frontend development)
- Groq API key
- OpenAI API key
- FastAPI for the backend API
- Python libraries: `langgraph`, `langsmith`, `langchain`, `langchain_groq`, `langchain_community`, `langchain_openai`, `fastapi`, `uvicorn`
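For reference, a hypothetical `requirements.txt` matching the libraries above (versions unpinned; adjust as needed):

```text
langgraph
langsmith
langchain
langchain_groq
langchain_community
langchain_openai
fastapi
uvicorn
python-dotenv
```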
- Clone the repository
- Install Python dependencies:
  ```bash
  pip install fastapi langchain langgraph langchain_groq langchain_openai python-dotenv
  ```
- Install frontend dependencies:
  ```bash
  npm install
  ```
Create a `.env` file in the root directory with your API keys:

```
GROQ_API_KEY=your_groq_api_key
OPENAI_API_KEY=your_openai_api_key
```
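A minimal sketch of how these keys can be read inside the application with `python-dotenv` (already in the dependency list); the variable names match the `.env` file above:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

groq_key = os.getenv("GROQ_API_KEY")
openai_key = os.getenv("OPENAI_API_KEY")
```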
**Chat with the UI**

- Chat with the conversation agent to seamlessly get your preferred responses.
- You can ask for sample images, then review and fine-tune the diversity of the generated prompts, which act as the basis of the dataset.
- The full image dataset zip file is provided once you are satisfied with the sample images and prompts and are ready for the full dataset.
**POST /chat** — Process user messages and generate responses/images.

```json
{
  "message": "Generate images of mountains",
  "session_id": "unique_session_id"
}
```

**POST /set_dataset_parameters** — Customize dataset generation parameters.

```json
{
  "num_images": 50,
  "resolution": "256x256",
  "session_id": "unique_session_id"
}
```

**POST /set_api_keys** — Set custom API keys for external services.

```json
{
  "groq_api_key": "your_groq_api_key",
  "openai_api_key": "your_openai_api_key",
  "session_id": "unique_session_id"
}
```

**Session endpoint** — Retrieve session information and status.
- Start the FastAPI server:

  ```bash
  python main.py
  ```

  The server will run on `http://localhost:8000`.

- Set up your API keys using the `/set_api_keys` endpoint.
- Start a conversation by sending a message to the `/chat` endpoint.
- Customize dataset parameters using `/set_dataset_parameters` if needed.
- Monitor session status and retrieve generated images through the session endpoint (see the example below).
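Putting these steps together, here is a hedged end-to-end example using the `requests` library. The endpoint paths come from this README, but the response shapes and the exact session-status route are not documented here, so treat those as assumptions:

```python
import requests

BASE = "http://localhost:8000"
SESSION = "unique_session_id"

# 1. Register API keys for this session.
requests.post(f"{BASE}/set_api_keys", json={
    "groq_api_key": "your_groq_api_key",
    "openai_api_key": "your_openai_api_key",
    "session_id": SESSION,
})

# 2. Optionally customize dataset parameters.
requests.post(f"{BASE}/set_dataset_parameters", json={
    "num_images": 50,
    "resolution": "256x256",
    "session_id": SESSION,
})

# 3. Chat to drive generation; ask for samples first, then the full dataset.
reply = requests.post(f"{BASE}/chat", json={
    "message": "Generate images of mountains",
    "session_id": SESSION,
})
print(reply.json())
```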
- `main.py`: FastAPI application and API endpoints
- `Dependancies.py`: Core functionality for the image generation workflow
- `FS_BackendTesting.ipynb`: Development and testing notebook
When the repository was first created, we used `git init` to initialize it locally. This command set up the `.git` directory, which contains all the metadata and version history of the project. For example:

```bash
git init
```

This ensured the repository was ready to track changes and collaborate.
During development, we staged files before committing them. For instance, when we made changes to core scripts like `main.py` or updated the README file, we ran:

```bash
git add main.py README.md
```

This placed the files into the staging area, marking them ready for a commit.
After staging the files, we recorded snapshots of the repository by committing them. Here's an example of a commit message we might have used:

```bash
git commit -m "Add main.py and update README with project details"
```

This created a checkpoint in the repository's history.
To share changes with collaborators, we pushed commits to the remote repository, hosted on GitHub. For example:

```bash
git push origin main
```

This command uploaded our local main branch to the remote repository, ensuring others could access the latest updates.
To keep our local repository in sync with the remote, we often pulled updates made by other contributors. For example:

```bash
git pull origin main
```

This fetched the latest changes from the main branch and merged them into our local branch.
We created and switched between branches to work on different features or fixes. For instance:

```bash
git branch feature/new-feature
git checkout feature/new-feature
```

This created a new branch called `feature/new-feature` and switched to it. After finishing the work, we merged it back into the main branch.
For new developers joining the project, cloning the repository was the first step. They ran:

```bash
git clone https://github.com/chainSAW-crypto/chainSAW-crypto.git
```

This copied all the files, branches, and commit history to their local machine.
Here’s an example of how we might have used these commands in sequence:
- Start a New Feature:

  ```bash
  git branch feature/add-authentication
  git checkout feature/add-authentication
  ```

- Make Changes and Stage Them:

  ```bash
  git add auth.py
  ```

- Commit the Changes:

  ```bash
  git commit -m "Implement user authentication module"
  ```

- Push the Feature Branch for Collaboration:

  ```bash
  git push origin feature/add-authentication
  ```

- Merge the Feature into Main:

  ```bash
  git checkout main
  git merge feature/add-authentication
  git push origin main
  ```
This tailored explanation demonstrates how these commands might have been actively used in the development of this repository.
To ensure the EC2 instance is up to date, the following commands have been executed:

```bash
sudo apt update && sudo apt upgrade -y
```

This updates the package list and installs the latest versions of packages.
Essential tools such as Python, pip, and Git have been installed to set up the environment:

```bash
sudo apt install python3 python3-pip git -y
```

This ensures the necessary Python version and Git are available for managing the repository.
The repository has been cloned from GitHub to deploy the backend:

```bash
git clone https://github.com/chainSAW-crypto/Synthetic-ImageData-Generator.git
cd Synthetic-ImageData-Generator
```

All files, branches, and commit history of the repository have been pulled into the EC2 instance.
A Python virtual environment has been created and activated to isolate dependencies:

```bash
python3 -m venv venv
source venv/bin/activate
```

This ensures that all installed packages are local to the project and do not interfere with the system Python environment.
All required libraries for the project have been installed using:

```bash
pip install -r requirements.txt
```

This installs the dependencies listed in the `requirements.txt` file, such as FastAPI and the ML-related libraries.
Environment variables have been configured by editing the `.env` file. On EC2, permissions have been set correctly:

```bash
nano .env
chmod 600 .env
```

The `.env` file contains sensitive configurations like API keys or database credentials.
The backend server has been started using the `nohup` command to keep it running even after logging out:

```bash
nohup python3 app.py > backend.log 2>&1 &
```

Logs are redirected to `backend.log`.
To ensure the backend runs autonomously and restarts if the server reboots, cron jobs have been added.

Editing the cron jobs:

```bash
crontab -e
```

Entries added:

- To start the backend on reboot:

  ```bash
  @reboot cd /home/ubuntu/Synthetic-ImageData-Generator && source venv/bin/activate && nohup python3 app.py > backend.log 2>&1 &
  ```

- To schedule a periodic task that keeps the backend running continuously (an illustrative entry is sketched below).
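As a hypothetical sketch of such a periodic entry, the following crontab line checks every five minutes whether `app.py` is running and restarts it if not (the `[p]` bracket trick keeps `pgrep` from matching the cron command itself):

```bash
# Hypothetical periodic entry: every 5 minutes, restart app.py if it has died.
*/5 * * * * pgrep -f "[p]ython3 app.py" > /dev/null || (cd /home/ubuntu/Synthetic-ImageData-Generator && . venv/bin/activate && nohup python3 app.py > backend.log 2>&1 &)
```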
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.