An intelligent classification-focused machine learning system powered by Google Agent Development Kit (ADK) and Gemini 2.0 Flash that can autonomously train classification models, make predictions, and provide ML expertise through natural language conversations.
- Training Requests: Automatically detects when you want to train a classification model and selects appropriate algorithms
- Prediction Requests: Finds existing models or trains new ones for classification inference tasks
- Model Management: Lists, compares, and manages your trained classification models
- General AI Assistant: Answers classification ML questions and provides educational guidance
- Smart Model Selection: Chooses the best classification algorithm based on your task
- Auto-Training: Creates new classification models when none exist for your prediction requests
- Data Handling: Processes various data formats (CSV, Excel, JSON) or generates synthetic data
- Visualization: Automatically creates plots, confusion matrices, and training visualizations
- Random Forest Classifier - Robust, handles mixed data types, excellent default choice
- Support Vector Machine (SVM) - Great for complex decision boundaries and high-dimensional data
- Logistic Regression - Fast, interpretable linear classification with probability outputs
- Training metrics and validation curves
- Feature importance plots
- Confusion matrices for classification performance
- Classification reports with precision, recall, F1-score
- Data distribution visualizations
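The precision, recall, and F1-score in a classification report all derive from confusion-matrix counts. The following is a minimal pure-Python sketch of that relationship, not the project's own reporting code; the function names and the sample labels are made up for illustration.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))

def class_report(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    counts = confusion_counts(y_true, y_pred)
    tp = counts[(positive, positive)]
    # False positives: predicted positive, but the true label is something else.
    fp = sum(n for (t, p), n in counts.items() if p == positive and t != positive)
    # False negatives: true label is positive, but something else was predicted.
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Illustrative labels only, not project data.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]
report = class_report(y_true, y_pred, positive="spam")
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found; F1 is their harmonic mean.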
- Python 3.9+
- Google AI API Key (get from Google AI Studio)
# Clone or download the project
cd google-adk-experiment
# Install dependencies
pip install -r requirements.txt
Create a .env file in the root directory:
# Required: Google AI API Key
GOOGLE_API_KEY=your_google_api_key_here
# Optional: Custom directories
MODELS_DIR=./models
TRAINING_DATA_DIR=./data
RESULTS_DIR=./results
# Optional: ADK configuration
ADK_APP_NAME=ml_agentic_system
ADK_USER_ID=default_user
streamlit run app.py
Opens at http://localhost:8501
python cli.py
The system understands natural language and automatically routes your classification requests:
"Train a classification model to predict customer churn"
"Create a spam classifier using SVM"
"Build a Random Forest classifier to predict personality types"
"Train a logistic regression model for binary classification"
"Classify this data: {'age': 35, 'income': 50000, 'score': 0.8}"
"Predict the category for this customer profile"
"Make predictions using data/inference_personality.csv"
"Classify the data in data/test_data.csv"
"What class would this sample belong to?"
"What classification models do I have available?"
"Show me my trained models"
"List all my classifiers"
"Which model has the best accuracy?"
"Explain the difference between precision and recall"
"When should I use Random Forest vs SVM for classification?"
"How do I interpret a confusion matrix?"
"What is the difference between binary and multi-class classification?"
You can also use the system programmatically:
from agents import root_agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
# Initialize the agent
session_service = InMemorySessionService()
runner = Runner(agent=root_agent, session_service=session_service)
# Train a classification model
response = runner.run("Train a Random Forest classifier")
# Make classification predictions
response = runner.run("Classify this test data")
google-adk-experiment/
├── app.py                        # Streamlit web interface
├── cli.py                        # Command line interface
├── config.py                     # Configuration management
├── requirements.txt              # Python dependencies
├── README.md                     # This file
│
├── agents/
│   ├── agent.py                  # Root ADK agent definition
│   ├── prompt.py                 # Agent instructions/prompt
│   ├── ml_tools.py               # Classification operation tools
│   └── __init__.py
│
├── ml_models/
│   ├── base_model.py             # Abstract base model class
│   ├── classification_models.py  # Classification algorithms
│   ├── model_manager.py          # Model persistence & loading
│   └── __init__.py
│
├── utils/
│   ├── data_utils.py             # Data processing utilities
│   ├── visualization_utils.py    # Plotting and visualization
│   └── __init__.py
│
├── models/                       # Saved trained classification models
├── data/                         # Training datasets
└── results/                      # Generated visualizations
The system leverages Google ADK's powerful agent framework:
- Agent: Core classification ML assistant with comprehensive instructions (defined in agents/agent.py)
- Tools: Classification ML operations (train, predict, list models)
- Runner: Orchestrates agent execution and tool calls
- Session Management: Maintains conversation state and context
- User Input → Natural language query about classification
- Intent Analysis → Gemini 2.0 Flash determines classification intent
- Tool Selection → ADK routes to appropriate classification ML tools
- Execution → Train classification models, make predictions, or provide info
- Response → Formatted results with classification metrics and visualizations
- State Management → Remembers classification models and context
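The flow from query to tool call can be pictured with a small dispatcher. In the real system the intent decision is made by Gemini 2.0 Flash through ADK; the sketch below swaps in simple keyword rules purely for illustration. The tool functions and the `INTENT_RULES` table are hypothetical stand-ins, not the functions in agents/ml_tools.py.

```python
import re
from typing import Dict

# Hypothetical stand-ins for the real tools; each just reports what it would do.
def train_tool(query: str) -> Dict[str, str]:
    return {"status": "success", "action": "train"}

def predict_tool(query: str) -> Dict[str, str]:
    return {"status": "success", "action": "predict"}

def list_tool(query: str) -> Dict[str, str]:
    return {"status": "success", "action": "list_models"}

def answer_tool(query: str) -> Dict[str, str]:
    return {"status": "success", "action": "answer"}

# Keyword rules replace the LLM for illustration only; the first match wins.
INTENT_RULES = [
    (("train", "build", "create"), train_tool),
    (("classify", "predict", "predictions"), predict_tool),
    (("list", "show", "available"), list_tool),
]

def route(query: str) -> Dict[str, str]:
    """Pick a tool by whole-word keyword match, falling back to a general answer."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for keywords, tool in INTENT_RULES:
        if words.intersection(keywords):
            return tool(query)
    return answer_tool(query)

result = route("Show me my trained models")
```

Whole-word matching matters even in this toy version: a substring check would send "Show me my trained models" to the training tool because "trained" contains "train". An LLM-based router avoids that kind of brittleness.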
- Pluggable Models: Easy to add new classification algorithms
- Flexible Data: Supports multiple input formats for classification
- Rich Visualizations: Automatic confusion matrices and classification plots
- Extensible Tools: Simple to add new classification capabilities
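One common way to get pluggable models is an abstract base class plus a name-to-class registry. The sketch below shows that pattern under stated assumptions: `SketchModel`, `MajorityClassModel`, `REGISTRY`, and `create_model` are hypothetical names, and the project's actual BaseMLModel and ModelManager may look different.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List, Optional, Type

class SketchModel(ABC):
    """Hypothetical stand-in for the project's abstract base model."""

    @abstractmethod
    def train(self, X: List[list], y: List[str]) -> None:
        ...

    @abstractmethod
    def predict(self, X: List[list]) -> List[str]:
        ...

class MajorityClassModel(SketchModel):
    """Toy classifier that always predicts the most common training label."""

    def __init__(self) -> None:
        self.label: Optional[str] = None

    def train(self, X: List[list], y: List[str]) -> None:
        self.label = Counter(y).most_common(1)[0][0]

    def predict(self, X: List[list]) -> List[str]:
        if self.label is None:
            raise RuntimeError("train() must be called before predict()")
        return [self.label for _ in X]

# Name -> class lookup; adding an algorithm means adding one entry here.
REGISTRY: Dict[str, Type[SketchModel]] = {"majority": MajorityClassModel}

def create_model(name: str) -> SketchModel:
    """Instantiate a registered model, with a clear error for unknown names."""
    try:
        return REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown model {name!r}; known: {sorted(REGISTRY)}") from None

model = create_model("majority")
model.train([[1], [2], [3]], ["a", "b", "b"])
predictions = model.predict([[4], [5]])
```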
- CSV files - Comma-separated values (for training and prediction)
- Excel files - .xlsx, .xls formats
- JSON files - Structured data
- JSON strings - For real-time predictions
- Synthetic data - Auto-generated for demos
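Supporting several input formats usually comes down to dispatching on the input's shape or file extension. The sketch below uses only the standard library and covers CSV files, JSON files, and raw JSON strings. `load_records` is a hypothetical helper, not the function in utils/data_utils.py, and Excel is left out because it needs a third-party reader.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List

Record = Dict[str, object]

def load_records(source: str) -> List[Record]:
    """Load rows from a .csv or .json path, or from a raw JSON string."""
    stripped = source.strip()
    if stripped.startswith(("{", "[")):
        # Looks like inline JSON rather than a file path.
        data = json.loads(stripped)
    else:
        path = Path(source)
        suffix = path.suffix.lower()
        if suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as handle:
                # Note: csv yields every value as a string.
                return list(csv.DictReader(handle))
        if suffix == ".json":
            data = json.loads(path.read_text(encoding="utf-8"))
        else:
            raise ValueError(f"unsupported format: {suffix or source!r}")
    # Normalize a single JSON object to a one-row list.
    return data if isinstance(data, list) else [data]

rows = load_records('{"age": 35, "income": 50000, "score": 0.8}')
```

Inline samples must be valid JSON for this helper, so keys and strings need double quotes.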
- Saved models - Persistent .joblib files
- Metadata - JSON model information
- Visualizations - PNG plots and charts
- Metrics - Comprehensive performance data
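A JSON metadata file stored next to each saved model makes it possible to list and compare models without loading them. The sketch below writes such a sidecar file. The field names and the `write_metadata` helper are assumptions for illustration; the project's actual metadata schema is not shown here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Union

def write_metadata(
    models_dir: Union[str, Path],
    model_name: str,
    algorithm: str,
    metrics: Dict[str, float],
) -> Path:
    """Write <model_name>.json beside where <model_name>.joblib would be stored."""
    directory = Path(models_dir)
    directory.mkdir(parents=True, exist_ok=True)
    metadata = {
        "model_name": model_name,        # hypothetical field names throughout
        "algorithm": algorithm,
        "metrics": metrics,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "model_file": f"{model_name}.joblib",
    }
    path = directory / f"{model_name}.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```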
The system automatically generates beautiful visualizations:
- Training and validation loss curves
- Cross-validation score distributions
- Learning curves over epochs
- Confusion matrices for classification
- ROC curves and precision-recall curves
- Classification reports with precision, recall, F1-score
- Feature importance rankings
- Data distribution histograms
- Correlation matrices
- Missing value patterns
- Class distribution analysis
- Side-by-side metric comparisons
- Performance benchmarking charts
- Algorithm comparison tables
- Store API keys in .env files (never in code)
- Use environment variables for sensitive configuration
- Validate all user inputs before processing
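These practices can be combined in a small settings loader that reads the environment variables named in the .env example and fails early when the API key is missing. This is a sketch, not the contents of config.py: `load_settings` is a hypothetical name, and treating the example values as defaults is an assumption. It also assumes the .env file has already been loaded into the process environment, for example by a dotenv loader.

```python
import os
from pathlib import Path
from typing import Mapping

def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read configuration from environment variables and validate the API key."""
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    # Reject both a missing key and the unedited placeholder from the example.
    if not api_key or api_key == "your_google_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or the environment")
    return {
        "api_key": api_key,
        # Defaults below mirror the .env example and are assumptions.
        "models_dir": Path(env.get("MODELS_DIR", "./models")),
        "data_dir": Path(env.get("TRAINING_DATA_DIR", "./data")),
        "results_dir": Path(env.get("RESULTS_DIR", "./results")),
        "app_name": env.get("ADK_APP_NAME", "ml_agentic_system"),
        "user_id": env.get("ADK_USER_ID", "default_user"),
    }
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.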
- Models are cached for fast repeated access
- Visualizations are generated asynchronously
- Large datasets are processed in chunks
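Chunked processing keeps memory use bounded by handling a fixed number of rows at a time. The sketch below shows the general idea with a generator over CSV rows. `iter_chunks` and the inline sample data are illustrative and not taken from the project's data utilities.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List

def iter_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most chunk_size rows without reading everything at once."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

# Illustrative in-memory CSV; a real file handle works the same way.
csv_text = "age,label\n35,a\n41,b\n29,a\n52,b\n47,a\n"
reader = csv.DictReader(io.StringIO(csv_text))
chunk_sizes = [len(chunk) for chunk in iter_chunks(reader, chunk_size=2)]
```

Because `csv.DictReader` is itself lazy, only one chunk of rows is held in memory at any point.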
- Modular architecture for easy extensions
- Comprehensive error handling and logging
- Type hints and documentation throughout
- Create a new model class inheriting from BaseMLModel:
from ml_models.base_model import BaseMLModel

class MyCustomModel(BaseMLModel):
    def __init__(self, model_name: str = "my_model"):
        super().__init__(model_name, "classification")

    def train(self, X, y, **kwargs):
        # Implement training logic
        pass

    def predict(self, X):
        # Implement prediction logic
        pass
- Register in ModelManager:
self.model_registry = {
    'classification': {
        'my_model': MyCustomModel,
        # ... existing models
    }
}
- Create a tool function in agents/ml_tools.py:
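# Imports needed by the signature, if ml_tools.py does not already have them
from typing import Any, Dict
from google.adk.tools import ToolContext

def my_new_tool(tool_context: ToolContext, param: str) -> Dict[str, Any]:
    """
    Description of what the tool does.
    """
    # Implement tool logic
    return {'status': 'success', 'result': 'data'}
- Add to agent tools in agents/agent.py: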
root_agent = Agent(
    # ... other parameters ...
    tools=[
        train_ml_model,
        predict_with_model,
        list_available_models,
        list_available_datasets,
        detect_target_column_from_query,
        my_new_tool,  # Add your new tool
    ]
)
Contributions are welcome! Areas for improvement:
- New Algorithms: Deep learning models, ensemble methods
- Data Sources: Database connectors, API integrations
- Visualizations: Interactive plots, 3D visualizations
- Deployment: Cloud deployment, containerization
- Testing: Unit tests, integration tests
This project is open source and available under the MIT License.
- Google ADK Team - For the amazing agent development framework
- Google AI - For Gemini 2.0 Flash model access
- Streamlit - For the beautiful web interface framework
- Scikit-learn - For comprehensive ML algorithms
- Matplotlib/Seaborn - For visualization capabilities
- Get your Google AI API key from Google AI Studio
- Set up the environment with the steps above
- Run the app with streamlit run app.py
- Start chatting with your AI ML assistant!
Example first query: "Train a classification model and show me the results"
Happy machine learning!