Vllama is a comprehensive toolkit that simplifies working with vision models, machine learning workflows, and local LLMs. Whether you're preprocessing datasets, training models with AutoML, generating images with state-of-the-art diffusion models, or chatting with local language models directly in VS Code, Vllama makes it easy - locally or on cloud GPUs.
Vllama Docs Site: https://dayinfinity.github.io/Vllama/
- 🔧 Autonomous Data Preprocessing: Intelligent data cleaning, encoding, scaling, and feature selection
- 🏆 AutoML Training: Train and compare multiple ML models automatically with hyperparameter tuning
- 🎨 Image Generation: Generate images using pre-trained diffusion models (Stable Diffusion, SD-Turbo)
- 🎬 Video Generation: Create videos from text prompts using text-to-video models
- 📷 Object Detection: Run YOLO object detection on images and videos
- 🖼️ Image/Video to 3D: Generate 3D PLY models from images or videos (via Kaggle)
- 👁️ 3D Viewer: View 3D model files (PLY, GLB, OBJ, STL, FBX)
- 🌐 Translation: Translate text using local NLLB models
- 🤖 Local LLM Server: Run language models locally as REST API servers
- 💬 CLI Chat: Interactive chat with local LLMs directly from terminal
- 🔊 Text-to-Speech: Convert text to speech using local TTS engine
- 🎤 Speech-to-Text: Convert speech to text using local STT engine
- ☁️ Cloud GPU Integration: Seamlessly offload computation to Kaggle GPUs
- 📊 Rich Visualizations: Automatic generation of insights, correlations, and performance metrics
- 💾 Smart Output Management: Organized folder structure with logs, models, and visualizations
- 💬 Chat with Local LLMs: Direct integration with VS Code's native "Chat with AI" interface
- 🔌 Local-First: Connect to LLMs running on your machine (e.g.,
localhost:2513) - ⚡ Zero Configuration: Works seamlessly with locally hosted language models
- 🎯 Native Experience: Fully integrated into VS Code's chat panel
- 🔮 Future Ready: Built to support agentic tools and advanced features
git clone https://github.com/DayInfinity/Vllama.git
cd Vllamapip install -r requirements.txtpip install -e .Now you can use vllama from anywhere in your terminal!
The Vllama VS Code extension allows you to chat with local LLMs directly from VS Code's Chat interface.
- VS Code (latest version recommended)
- A locally running LLM server (e.g., on
localhost:2513)
- Download the Vllama extension from the VS Code Marketplace (or install from
.vsixfile) - Open VS Code
- Go to Extensions (Ctrl+Shift+X / Cmd+Shift+X)
- Search for "Vllama" or install the downloaded
.vsixfile - Reload VS Code
- Ensure your local LLM server is running on the configured port (default:
localhost:2513) - Open VS Code's Chat panel (View → Chat with AI)
- Select your local LLM model from the model dropdown
- Start chatting with your local language model!
Note: The extension integrates seamlessly with VS Code's native chat interface, providing a familiar experience while maintaining complete privacy with your local LLM.
Clean and prepare your data for machine learning:
vllama data --path dataset.csv --target price --test_size 0.2 --output_dir ./outputsWhat it does:
- Automatically detects column types (numerical/categorical)
- Handles missing values intelligently (KNN imputation, median/mode filling)
- Removes duplicates and handles outliers
- Encodes categorical variables (label encoding, one-hot encoding, frequency encoding)
- Scales features using RobustScaler
- Performs feature selection (removes zero-variance and highly correlated features)
- Generates visualizations (missing values heatmap, correlation matrix, etc.)
- Splits data into train/test sets
- Saves processed data as
train_data.csvandtest_data.csv
Parameters:
--path: Path to your dataset (supports CSV, Excel, JSON, Parquet)--target: Target column name (auto-detected if not specified)--test_sizeor-t: Test set proportion (default: 0.2)--output_diror-o: Output directory (default: current directory)
Output Structure:
output_folder_YYYYMMDD_HHMMSS/
├── train_data.csv
├── test_data.csv
├── processed_full_data.csv
├── preprocessing_log.json
├── preprocessing_log.txt
├── summary_report.json
├── transformation_metadata.json
└── visualizations/
├── 01_missing_initial.png
├── 02_dtypes.png
├── 03_corr_processed.png
├── 04_target_processed.png
└── 05_mi.png
Automatically train and compare multiple ML models:
vllama train --path ./outputs/output_folder_YYYYMMDD_HHMMSS --target priceWhat it does:
- Auto-detects task type (classification or regression)
- Trains multiple models with hyperparameter tuning:
- Classification: Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, SVM, KNN, MLP, Naive Bayes
- Regression: Random Forest, XGBoost, LightGBM, CatBoost, SVR, KNN, MLP
- Uses RandomizedSearchCV for efficient hyperparameter optimization
- Evaluates models on test set with comprehensive metrics
- Generates visualizations (confusion matrices, ROC curves, prediction plots)
- Saves all models and creates a leaderboard
- Identifies and saves the best performing model
Parameters:
--pathor-p: Path to folder containingtrain_data.csvandtest_data.csv--targetor-t: Target column name
Output Structure:
results/
├── model_summary.csv # Leaderboard of all models
├── best_model.pkl # Best performing model
├── best_model.txt # Best model details
├── report.html # HTML report with all results
└── per_model/
├── RandomForest/
│ ├── RandomForest_best_model.pkl
│ ├── RandomForest_tuning_results.csv
│ ├── RandomForest_confusion_matrix.png
│ └── RandomForest_roc_curve.png
├── XGBoost/
└── ...
vllama show modelsLists all supported vision models with descriptions.
Pre-download model weights to cache:
vllama install stabilityai/sd-turboSingle Prompt Mode:
vllama run stabilityai/sd-turbo --prompt "A serene mountain landscape at sunset" --output_dir ./imagesInteractive Mode:
vllama run stabilityai/sd-turboThen enter prompts interactively. Type exit or quit to stop.
Parameters:
model: Model name (e.g.,stabilityai/sd-turbo)--promptor-p: Text prompt for image generation--output_diror-o: Directory to save generated images (default: current directory)--serviceor-s: Offload to cloud service (e.g.,kaggle)
Features:
- Automatic GPU/CPU detection
- Low VRAM optimization (for GPUs with ≤3GB VRAM)
- Memory-efficient attention (xformers)
- Attention slicing and VAE tiling for better performance
vllama run stabilityai/sd-turbo --service kaggle --prompt "A cyberpunk city at night"What it does:
- Creates a Kaggle kernel with GPU enabled
- Installs dependencies automatically
- Runs the model on Kaggle's GPU
- Downloads the generated image to your local machine
Autonomous data preprocessing and cleaning.
vllama data --path <dataset> --target <column> [--test_size <float>] [--output_dir <dir>]Examples:
# Basic usage with auto-detected target
vllama data --path sales_data.csv
# Specify target column and test size
vllama data --path housing.csv --target price --test_size 0.25
# Custom output directory
vllama data --path data.csv --target label -t 0.3 -o ./processed_dataAutoML model training with hyperparameter tuning.
vllama train --path <data_folder> --target <column>Examples:
# Train on preprocessed data
vllama train --path ./output_folder_20231124_143022 --target SalePrice
# Short form
vllama train -p ./data -t labelList all supported vision models.
vllama show modelsDownload and cache a model.
vllama install <model_name>Example:
vllama install stabilityai/sd-turboRun a vision model for image generation.
vllama run <model_name> [--prompt <text>] [--service <service>] [--output_dir <dir>]Examples:
# Single prompt
vllama run stabilityai/sd-turbo --prompt "A beautiful sunset"
# Interactive mode
vllama run stabilityai/sd-turbo
# Run on Kaggle GPU
vllama run stabilityai/sd-turbo --service kaggle --prompt "A dragon flying"
# Custom output directory
vllama run stabilityai/sd-turbo -p "A forest" -o ./my_imagesGenerate videos from text prompts.
vllama run_video <model_name> [--prompt <text>] [--service <service>] [--output_dir <dir>]Examples:
# Generate video locally
vllama run_video damo-vilab/text-to-video-ms-1.7b --prompt "A cat playing piano"
# Generate video on Kaggle GPU
vllama run_video damo-vilab/text-to-video-ms-1.7b --service kaggle --prompt "A sunset over ocean"
# Interactive mode
vllama run_video damo-vilab/text-to-video-ms-1.7bRun object detection on an image using YOLO.
vllama detect_image --path <image_path> [--url <image_url>] [--model <yolo_model>] [--output_dir <dir>]Examples:
vllama detect_image --path photo.jpg
vllama detect_image --url https://example.com/image.jpg -o ./outputs
vllama detect_image --path photo.jpg -m yolov8s.ptRun object detection on a video using YOLO.
vllama detect_video --path <video_path> [--model <yolo_model>] [--output_dir <dir>]Example:
vllama detect_video --path video.mp4 -o ./outputsGenerate 3D PLY model from an image (Kaggle GPU).
vllama image3d --path <image_path> [--url <image_url>] [--service kaggle] [--output_dir <dir>]Example:
vllama image3d --path photo.jpg --service kaggle -o ./outputsGenerate 3D PLY model from a video (Kaggle GPU).
vllama video3d --path <video_path> [--service kaggle] [--output_dir <dir>] [--frame_interval <n>]Example:
vllama video3d --path video.mp4 --service kaggle -o ./outputs -f 10View a 3D model file (PLY, GLB, OBJ, STL, FBX).
vllama view3d --path <model_path>Example:
vllama view3d --path model.plyList all installed/downloaded models.
vllama list modelsRemove a downloaded model from cache.
vllama uninstall <model_name>Example:
vllama uninstall stabilityai/sd-turboSend a prompt to an already running model session.
vllama post <prompt> [--output_dir <dir>]Example:
vllama post "A magical castle" --output_dir ./outputsStop the currently running model session.
vllama stopRun a local LLM as a REST API server.
vllama run_llm <model_name>What it does:
- Downloads and loads the specified HuggingFace LLM
- Starts a Flask server on
localhost:2513 - Provides a
/chatendpoint for conversation - Maintains conversation history
- Compatible with VS Code extension
Examples:
# Run Qwen model (default)
vllama run_llm Qwen/Qwen2.5-Coder-0.5B-Instruct
# Run Llama model
vllama run_llm meta-llama/Llama-2-7b-chat-hf
# Run any HuggingFace chat model
vllama run_llm microsoft/DialoGPT-mediumAPI Usage:
# Send message via curl
curl -X POST http://localhost:2513/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello, how are you?"}'Note: This is the server that the VS Code extension connects to by default.
Interactive chat with a local LLM via CLI.
vllama chat_llmWhat it does:
- Connects to a running LLM server (started with
run_llm) - Provides interactive chat interface in terminal
- Maintains conversation context
- Type
exitorquitto stop
Example:
# Terminal 1: Start LLM server
vllama run_llm Qwen/Qwen2.5-Coder-0.5B-Instruct
# Terminal 2: Start chat
vllama chat_llm
# You> Write a Python function to reverse a string
# Assistant> Here's a function to reverse a string...Convert text to speech using local TTS engine.
vllama tts --text <text>Examples:
# Speak text
vllama tts --text "Hello, this is a test of text to speech"
# Interactive mode (no --text flag)
vllama tts
# Enter text: Hello worldConvert speech to text using microphone input.
vllama sttWhat it does:
- Listens to microphone input
- Converts speech to text using Google Speech Recognition
- Prints transcribed text
Example:
vllama stt
# Listening... Speak now!
# [You speak: "Hello world"]
# Transcribed: Hello worldNote: Use --path to transcribe from an audio file, or run without it for microphone input. Use --language for better transcription.
Translate text using a local translation model (e.g. NLLB).
vllama translate --text <text> [--src <source_lang>] [--tgt <target_lang>] [--model <model_id>]Examples:
vllama translate --text "Hello world" --src en --tgt fr
vllama translate --text "Bonjour" --src fr --tgt enAuthenticate with a cloud GPU service.
vllama login --service <service> [--username <user>] [--key <api_key>]Examples:
# Login to Kaggle with credentials
vllama login --service kaggle --username myusername --key abc123xyz
# Use existing Kaggle credentials from ~/.kaggle/kaggle.json
vllama login --service kaggleInitialize a GPU session on a cloud service.
vllama init gpu --service <service>Example:
vllama init gpu --service kaggleRemove cloud service credentials.
vllama logout# 1. Preprocess data
vllama data --path raw_data.csv --target price
# 2. Train models (use the output folder from step 1)
vllama train --path ./output_folder_20231124_143022 --target price
# 3. Review results in the results/ folder# 1. Install model (optional, first-time only)
vllama install stabilityai/sd-turbo
# 2. Generate images interactively
vllama run stabilityai/sd-turbo
# Enter prompts:
# Prompt> A serene lake with mountains
# Prompt> A futuristic city
# Prompt> exit# 1. Login to Kaggle
vllama login --service kaggle --username myuser --key myapikey
# 2. Generate image on Kaggle GPU
vllama run stabilityai/sd-turbo --service kaggle --prompt "A magical forest"
# Image will be downloaded automatically# 1. Start local LLM server
vllama run_llm Qwen/Qwen2.5-Coder-0.5B-Instruct
# 2. In another terminal, start CLI chat
vllama chat_llm
# 3. Chat interactively
# You> Write a function to calculate fibonacci
# Assistant> Here's a function...# 1. Start Vllama LLM server
vllama run_llm Qwen/Qwen2.5-Coder-0.5B-Instruct
# 2. Open VS Code with Vllama extension installed
# 3. Open Chat with AI panel (View → Chat with AI)
# 4. Select your local model and start chatting!# 1. Generate video locally
vllama run_video damo-vilab/text-to-video-ms-1.7b --prompt "A cat playing piano"
# 2. Or use Kaggle GPU for faster processing
vllama run_video damo-vilab/text-to-video-ms-1.7b --service kaggle --prompt "A sunset"
Logs:
preprocessing_log.json: Detailed JSON log of all preprocessing stepspreprocessing_log.txt: Human-readable text logsummary_report.json: Summary statistics and metadata
Data Files:
train_data.csv: Training dataset (80% by default)test_data.csv: Testing dataset (20% by default)processed_full_data.csv: Complete processed datasettransformation_metadata.json: Encoders and scalers metadata for future use
Visualizations:
- Missing values heatmap
- Data types distribution
- Correlation matrix (top 20 features)
- Target distribution
- Mutual information scores
Model Files:
best_model.pkl: Best performing model (can be loaded with joblib)model_summary.csv: Comparison of all trained modelsreport.html: Interactive HTML report
Per-Model Outputs:
{model}_best_model.pkl: Saved model{model}_tuning_results.csv: Hyperparameter search results{model}_confusion_matrix.png: Confusion matrix (classification){model}_roc_curve.png: ROC curve (binary classification){model}_pred_vs_true.png: Scatter plot (regression)
Generated images are saved as:
vllama_output_{timestamp}.png # Local generation
vllama_kaggle_{timestamp}.png # Kaggle generation
Create a .env file for configuration:
# Kaggle API Credentials
KAGGLE_USERNAME=your_username
KAGGLE_KEY=your_api_key
# Model Cache Directory (optional)
HF_HOME=/path/to/cache
# Hugging Face Access Token (for gated models)
HF_TOKEN=your_huggingface_tokenVllama automatically optimizes for your GPU:
- High VRAM (>3GB): Uses float16, full resolution (512x512), more inference steps
- Low VRAM (≤3GB): Uses float32, reduced steps, memory-efficient attention
- CPU: Falls back to CPU inference (slower but works)
- 📚 Documentation: Full documentation overhaul—README, CHANGELOG, and SECURITY aligned with current features
- 🏷️ Version: Bumped to 1.10.0; all version references and badges updated
- 📖 Command reference: Added missing commands—
detect_image,detect_video,image3d,video3d,view3d,translate - ✏️ Minor fixes: Typo fix in CLI help text (
woth→with)
- 📦 Updated dependencies and PortAudio support on macOS
- 🆚 VS Code Extension: Added support for chatting with local LLMs directly from VS Code
- 📄 License Change: Migrated from GPL-3.0 to Apache-2.0 for greater flexibility
- 📚 Documentation: Comprehensive README updates with all features and workflows
- 🤝 Open Source: Prepared project for public open source release
- 🔒 Security: Enhanced security documentation and best practices
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature - Commit your changes:
git commit -m 'Add amazing feature' - Push to the branch:
git push origin feature/amazing-feature - Open a Pull Request
Please read our Code of Conduct before contributing.
This project is licensed under the Apache License 2.0.
Copyright 2025 Gopu Manvith
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Issue: "Kaggle API credentials not found"
# Solution: Set up Kaggle credentials
vllama login --service kaggle --username YOUR_USERNAME --key YOUR_API_KEYIssue: "CUDA out of memory"
# Solution: The tool automatically handles low VRAM, but you can also:
# 1. Close other GPU applications
# 2. Use CPU mode (automatic fallback)
# 3. Use Kaggle GPU instead
vllama run model --service kaggle --prompt "your prompt"Issue: "Target column not found"
# Solution: Specify the target column explicitly
vllama data --path data.csv --target your_column_nameIssue: "VS Code extension can't connect to local LLM"
# Solution: Ensure your LLM server is running
# 1. Check that the server is running on the correct port (default: localhost:2513)
# 2. Verify firewall settings allow local connections
# 3. Check VS Code extension settings for the correct endpoint- Documentation: GitHub Repository
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: manvithgopu1394@gmail.com
Built with:
- PyTorch - Deep learning framework
- Hugging Face Diffusers - State-of-the-art diffusion models
- Scikit-learn - Machine learning library
- XGBoost, LightGBM, CatBoost - Gradient boosting frameworks
- Kaggle API - Cloud GPU integration
- Flask - Web framework for API endpoints
- VS Code Extension API - VS Code extension development
- Support for more vision models (DALL-E, Midjourney-style models)
- Advanced agentic tools for VS Code extension
- Web UI for model training and inference
- Multi-GPU support for distributed training
- Integration with more cloud GPU providers
- Real-time model fine-tuning capabilities
- Local image-to-3D and video-to-3D (in addition to Kaggle)
- Enhanced chat capabilities with RAG (Retrieval-Augmented Generation)
- Build a comprehensive AI toolkit that works seamlessly across local and cloud environments
- Enable developers to easily integrate state-of-the-art AI models into their workflows
- Create a vibrant community of contributors and users
- Support the latest research in generative AI and machine learning
If you find Vllama useful, please consider giving it a star on GitHub! It helps others discover the project.
Made with ❤️ by Gopu Manvith