DataMind AI is an open-source project dedicated to building an intelligent, dynamic dashboard for IoT data analysis. It leverages Generative AI to automatically summarize datasets, detect anomalies, predict machine failures, and generate beautiful, interactive visualizations.
We aim to solve the problem of visualizing and understanding large IoT datasets without manual configuration. By combining high-performance data processing (Pandas/Scikit-Learn) with the contextual intelligence of LLMs (Google Gemini), DataMind AI transforms raw CSV/Excel sheets into actionable insights.
The project follows a modern decoupled architecture:
- Backend Framework: FastAPI (Python)
- Data Processing: Pandas & Scikit-Learn
- Hybrid Approach: We don't send massive datasets to the LLM. Instead, we extract metadata and statistical summaries locally.
- Logic: The LLM acts as the "Brain" to define aggregation rules and anomaly detection parameters, while the Python backend acts as the "Muscle" to execute these rules on the full dataset.
- AI Integration: Google Gemini (via google-generativeai)
- Database: MongoDB (for storing file metadata and results)
- Frontend Framework: React + Vite
- Styling: TailwindCSS
- Visualization: Recharts (Dynamic rendering based on JSON config)
- State Management: Zustand
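
The hybrid approach in the backend list above (summarize locally, let the LLM pick the rules, run the rules on the full dataset) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `summarize` and `apply_rule`, the rule dictionary layout, and the sample data are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical allow-list: only run aggregations we explicitly trust,
# because the rule text would come from an LLM response.
ALLOWED_AGGS = {"mean", "min", "max", "sum", "count", "median", "std"}


def summarize(df: pd.DataFrame) -> dict:
    """Build a compact description of the data; this is what would be sent to the LLM."""
    return {
        "row_count": int(len(df)),
        "columns": {
            name: {
                "dtype": str(df[name].dtype),
                "null_count": int(df[name].isna().sum()),
                "unique_count": int(df[name].nunique()),
            }
            for name in df.columns
        },
        # Statistical summary of the numeric columns only.
        "numeric_summary": df.describe().round(3).to_dict(),
    }


def apply_rule(df: pd.DataFrame, rule: dict) -> pd.DataFrame:
    """Execute one aggregation rule (assumed schema) on the full dataset."""
    if rule["agg"] not in ALLOWED_AGGS:
        raise ValueError(f"Unsupported aggregation: {rule['agg']}")
    for col in (rule["group_by"], rule["column"]):
        if col not in df.columns:
            raise KeyError(f"Unknown column: {col}")
    grouped = df.groupby(rule["group_by"])[rule["column"]].agg(rule["agg"])
    return grouped.reset_index()


# Made-up sample standing in for an uploaded CSV.
data = pd.DataFrame(
    {
        "DeviceID": ["a", "a", "b", "b"],
        "Temperature": [20.0, 22.0, 30.0, 34.0],
    }
)

summary = summarize(data)  # small payload for the LLM ("Brain")
rule = {"group_by": "DeviceID", "column": "Temperature", "agg": "mean"}  # assumed LLM output
result = apply_rule(data, rule)  # executed locally ("Muscle")
print(result)
```

Validating the rule against an allow-list and the real column names before running it keeps a malformed or unexpected LLM response from reaching the data layer.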
- Universal Support: Drag-and-drop support for CSV, Excel (.xlsx), and JSON files.
- Large File Handling: Optimized to handle gigabyte-sized datasets by processing data locally using Pandas/Polars and only sending metadata to the cloud.
- Contextual Understanding: Gemini AI analyzes your column headers and data types to understand what your data represents (e.g., "This looks like IoT sensor data").
- Auto-Rule Generation: Instead of generic charts, the AI generates specific Pandas aggregation rules (e.g., "Group by DeviceID and calculate average Temperature") to reveal hidden patterns.
- Statistical Rigor: Uses Isolation Forests and Z-Score analysis to mathematically identify outliers.
- Context-Aware Alerts: The AI interprets why a data point is an anomaly (e.g., "Pressure dropped by 40% in 5 minutes") and suggests potential root causes.
- Failure Prediction: Aims to identify trendlines that historically precede machine failures.
- Zero-Config: You don't build charts; the AI builds them for you.
- Interactive UI: Fully responsive Recharts visualizations that support zooming, panning, and tooltips.
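
As an illustration of the outlier detection mentioned in the feature list above, the sketch below flags anomalies with a Z-score check (Pandas) and with scikit-learn's IsolationForest. It is a minimal example under stated assumptions: the function names, the threshold of 3, the contamination value, and the sample readings are hypothetical and not taken from the project.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest


def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> list:
    """Return index labels whose absolute z-score exceeds the threshold."""
    std = series.std()
    if pd.isna(std) or std == 0:
        return []  # constant or single-value column: nothing to flag
    z = (series - series.mean()) / std
    return z[z.abs() > threshold].index.tolist()


def isolation_forest_outliers(
    df: pd.DataFrame, columns: list, contamination: float = 0.05
) -> list:
    """Return index labels that IsolationForest marks as outliers (-1)."""
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(df[columns])
    return df.index[labels == -1].tolist()


# Made-up sensor readings: 19 ordinary values and one obvious spike at index 19.
readings = pd.DataFrame(
    {
        "Temperature": [
            20.0, 20.5, 19.8, 20.2, 20.1, 19.9, 20.3, 20.0, 19.7, 20.4,
            20.1, 19.9, 20.2, 20.0, 19.8, 20.3, 20.1, 19.9, 20.0, 95.0,
        ]
    }
)

z_flags = zscore_outliers(readings["Temperature"])
forest_flags = isolation_forest_outliers(readings, ["Temperature"])
print("Z-score flags:", z_flags)
print("Isolation Forest flags:", forest_flags)
```

The Z-score check is cheap and easy to explain for a single column, while an Isolation Forest can take several columns at once; the flagged rows would then be the ones handed to the LLM for interpretation.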
Found a bug? Want to request a new feature?
- Check the TODO.md to see if it's already planned.
- Open a GitHub Issue with a clear title and description.
- We welcome detailed bug reports!
This is a community-driven project. Check TODO.md to see the progress and open tasks.
- Python 3.10+
- Node.js 18+
- MongoDB
- Google Gemini API Key
- Navigate to the core directory:
  cd core
- Install dependencies:
  poetry install
- Configure Environment Variables:
  - Copy the example environment file:
    cp .env.example .env
  - Open .env and add your keys:
    - MONGO_URI: Your MongoDB connection string.
    - GEMINI_API_KEY: Required. Get your free API key from Google AI Studio.
  - ⚠️ Note: Never commit your .env file or API keys to GitHub!
- Run the Server:
  poetry run uvicorn app.main:app --reload
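
For reference, here is one way a backend could read the two environment variables from the setup steps above and fail early when one is missing. This is only a sketch: the `Settings` class and `load_settings` helper are hypothetical names, treating MONGO_URI as required is an assumption, and the project's real configuration code may load the .env file differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mongo_uri: str
    gemini_api_key: str


def load_settings(env=None) -> Settings:
    """Read required settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    # Assumption: both keys are treated as required in this sketch.
    missing = [key for key in ("MONGO_URI", "GEMINI_API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(mongo_uri=env["MONGO_URI"], gemini_api_key=env["GEMINI_API_KEY"])


# Example with placeholder values; real values belong in the untracked .env file.
example = load_settings(
    {"MONGO_URI": "mongodb://localhost:27017", "GEMINI_API_KEY": "placeholder-key"}
)
print(example.mongo_uri)
```

Failing at startup with a clear message is usually easier to debug than a failed database or API call later in a request.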
cd ui
npm install
npm run dev
License: MIT