Full-stack manufacturing analytics built in phases.
Start with a dashboard. Add a REST API. Deploy a Telegram bot. Enable natural language queries.
Each phase is independent — stop at any phase and still have a fully working system.
A manufacturing OEE analytics system that grows phase by phase:
Phase 1–6 → Streamlit dashboard (browser, localhost)
Phase 7 → FastAPI backend (REST API, localhost:8000)
Phase 8 → Telegram bot (button menus, mobile)
Phase 9 → Natural language (free-text queries, fully local AI)
Phase 10 → Live SCADA (real-time, coming soon)
Every phase reads from the same 10 CSV files. You can use the sample data included in this repository, or replace it with your own plant data — same column names, real numbers.
OEE = Availability × Performance × Quality
Availability = Operating Time / Planned Time
Performance = (Actual Qty × Ideal Cycle Time) / Operating Time
Quality = Good Qty / Actual Qty
┌──────────────────────────────────────────────────────────┐
│ FINISHED GOODS WAREHOUSE │
├───────────────┬───────────────┬──────────────────────────┤
│ Assembly │ Assembly │ Assembly │
│ Alpha │ Beta │ Gamma │
├───────────────┼───────────────┼──────────────────────────┤
│ Painting │ Painting │ Painting │
├───────────────┼───────────────┼──────────────────────────┤
│ Machining │ Machining │ Machining │
├───────────────┴───────────────┴──────────────────────────┤
│ RAW MATERIALS WAREHOUSE │
└──────────────────────────────────────────────────────────┘
VSM Alpha VSM Beta VSM Gamma
35 machines · 3 VSMs · 3 areas · 3 shifts · 3 years of history (2022–2024)
OEE-Dashboard/
│
├── app.py ← Phases 1–6 Streamlit dashboard
├── oee_data_dictionary.md ← Phase 0 Full schema for all 10 tables
├── BOT_GUIDE.md ← Phases 8–9 Telegram bot user guide
├── requirements.txt ← All phases Python dependencies
├── .env.example ← Phase 8 Token template (copy → .env)
│
├── .streamlit/
│ └── config.toml ← Phase 1 Dark theme
│
├── data/ ← Phase 0 10 CSV files (sample or real)
│ ├── Ordenes_Produccion_OEE.csv
│ ├── Paros_Eventos_OEE.csv
│ ├── Hourly_Snapshots_OEE.csv
│ ├── Maquinas_OEE.csv
│ ├── Productos_OEE.csv
│ ├── Catalogo_Paros_OEE.csv
│ ├── SCADA_Signals_OEE.csv
│ ├── Operator_Notes_OEE.csv
│ ├── Work_Orders_OEE.csv
│ └── Calendario_OEE.csv
│
├── api/ ← Phase 7 FastAPI REST backend
│ ├── main.py
│ ├── data_loader.py
│ ├── schemas.py
│ └── routes/
│ ├── availability.py
│ ├── performance.py
│ └── quality.py
│
└── bot/ ← Phases 8–9 Telegram bot
├── bot.py
├── handlers.py
├── keyboards.py
├── queries.py
├── formatters.py
├── display.py
└── nl_handler.py
git clone https://github.com/your-username/OEE-Dashboard.git
cd OEE-Dashboard
pip install -r requirements.txtPython 3.10 or higher required.
Before running anything, you need the 10 CSV files in data/.
The data/ folder already contains 3 years of synthetic production data for a 35-machine plant. You can run every phase immediately without any configuration.
Replace the CSV files in data/ with exports from your MES / CMMS / SCADA.
Your files must match the column names and data types described in oee_data_dictionary.md.
Minimum viable dataset — if your plant only has a shift report:
| Column | Source |
|---|---|
Date, Shift, MachineID |
Shift log |
PlannedTime_min, OperatingTime_min |
Downtime tracking |
PlannedQty, ActualQty, GoodQty |
Production count |
The dashboard computes Availability, Performance, Quality, and OEE from these columns at load time.
Typical source systems:
| CSV file | System |
|---|---|
Ordenes_Produccion_OEE |
MES (Ignition, Wonderware, Opcenter) |
Paros_Eventos_OEE |
MES downtime module or CMMS |
Maquinas_OEE |
ERP equipment master (SAP PM, Maximo) |
SCADA_Signals_OEE |
SCADA historian (FactoryTalk, WinCC) |
Work_Orders_OEE |
CMMS corrective/preventive WOs |
What you get: A 6-tab analytics dashboard with 7 ML models, running in your browser.
streamlit run app.pyOpens at http://localhost:8501
| Tab | Phase | What it shows |
|---|---|---|
| 📊 Overview | 1 | Fleet OEE cards, VSM breakdown, live shift status |
| 📡 Operations | 2 | Gantt timeline, SCADA readings, operator notes, work orders |
| 📅 History | 3 | 3-year trend, downtime Pareto, shift/year comparison |
| 🤖 AI · Availability | 4 | Model A: failure prediction · Model B: MTBF/MTTR reliability |
| ⚡ AI · Performance | 5 | Model C: temporal evolution · Model D: weekly forecast |
| ⭐ AI · Quality | 6 | Model E: FPY · Model F: Speed vs Quality · Model G: defect risk |
| Model | Type | Predicts |
|---|---|---|
| A | RandomForest Classifier | Next failure event type |
| B | Recency-weighted MTBF (half-life 180d) | Time between failures + repair time |
| C | Historical decomposition | Performance evolution by VSM × Year × Area |
| D | RandomForest Regressor | Expected units per day (next 2 weeks) |
| E | Aggregation | First Pass Yield by VSM |
| F | Scatter quadrant analysis | Speed vs Quality tradeoff per machine |
| G | RandomForest Regressor | Defect rate by Shift × Product × Machine |
Models train automatically on startup from the CSVs — no separate training step needed.
What you get: All 7 ML models exposed as REST endpoints. Integrate with any external system — ERP, Power BI, custom apps.
This phase is optional. The dashboard (Phases 1–6) and the bot (Phases 8–9) work without it.
uvicorn api.main:app --reload --port 8000API trains all models on startup (~30 sec), then serves requests at http://localhost:8000.
| URL | Description |
|---|---|
http://localhost:8000/docs |
Swagger UI — try every endpoint live |
http://localhost:8000/redoc |
ReDoc reference |
http://localhost:8000/health |
Liveness check |
# Predict next failure type for a machine
curl -X POST http://localhost:8000/api/v1/availability/predict \
-H "Content-Type: application/json" \
-d '{"machine_id":"LINE_03","failures_30d":3,"days_since_pm":45}'
# MTBF fleet table filtered by VSM
curl http://localhost:8000/api/v1/availability/mtbf?vsm=Beta
# Defect risk for a shift × product combination
curl -X POST http://localhost:8000/api/v1/quality/risk \
-H "Content-Type: application/json" \
-d '{"machine_id":"LINE_20","shift":"Night","product_id":"PROD_005","day_of_week":0}'What you get: All 13 analytics queries accessible from your phone via button menus. Plus an automatic daily KPI report at 08:00.
1. Create a Telegram bot
Open Telegram → search @BotFather → /newbot → follow prompts → copy the token.
2. Get your Telegram user ID
Search @userinfobot on Telegram → it replies with your numeric ID.
3. Configure environment variables
cp .env.example .envEdit .env:
TELEGRAM_TOKEN=your_token_from_botfather
TELEGRAM_CHAT_ID=your_numeric_user_id
.envis in.gitignore— it will never be committed to GitHub.
4. Run the bot
python -m bot.botThe bot trains all 7 models on startup (~30 sec), then begins listening. Keep the terminal open.
| Category | Queries available |
|---|---|
| 📊 Overview | Plant Snapshot · Critical Machines |
| 🔴 Availability | Predict Next Failure · MTBF Fleet · MTBF Single Machine |
| ⚡ Performance | VSM × Year Evolution · Area Monthly Deviation · Weekly Forecast |
| ⭐ Quality | FPY by VSM · Speed vs Quality · Defect Risk Predictor |
| 📋 Automatic | Daily KPI Report at 08:00 |
Period options for every descriptive query:
Last 7 days · Last 30 days · Last 3 months · Last 6 months · Year to date · 2024 · 2023 · 2022
Commands:
| Command | Description |
|---|---|
/start |
Open main menu |
/menu |
Return to main menu from anywhere |
/status |
Quick plant snapshot (last 30 days) |
/alerts |
Toggle the daily 08:00 KPI report on/off |
See
BOT_GUIDE.mdfor the complete query reference with sample outputs.
What you get: Type questions in plain Spanish or English — no menus, no buttons. The bot understands the question, calls the right query, and replies with real data.
Powered by Ollama + llama3.2 — runs 100% locally. No API keys, no internet connection required, no cost, data never leaves your machine.
1. Install Ollama
Download from ollama.com/download and install. Ollama starts automatically in the background.
2. Pull the model (one-time, ~2 GB)
ollama pull llama3.23. Run the bot — same command as Phase 8
python -m bot.botNatural language is active immediately alongside the button menus.
"¿Cómo estuvo el OEE el mes pasado?"
"Cuáles son las peores máquinas en 2024?"
"Show me the speed vs quality tradeoff for Beta"
"Dame el FPY del último trimestre"
"What's the MTBF risk fleet-wide?"
"Dame el reporte de hoy"
User types question
↓
Ollama (llama3.2) identifies which query to call + what parameters
↓
Bot executes the real query against the CSV data
↓
Ollama generates a one-sentence natural language summary
↓
User receives: summary + full formatted data + menu buttons
The three processes are completely independent — each runs in its own terminal:
| Terminal | Command | Port | Required for |
|---|---|---|---|
| 1 | streamlit run app.py |
8501 | Dashboard (F1–F6) |
| 2 | uvicorn api.main:app --reload |
8000 | REST API (F7) |
| 3 | python -m bot.bot |
none | Telegram bot (F8–F9) |
You can run any combination. They do not interfere with each other.
Why a single app.py?
All ML models train via @st.cache_data on first load — no separate pipeline. For production scale (>1M rows), split into a src/ ETL layer with pre-trained model storage.
Why RandomForest over LSTM or XGBoost?
With 3 years × 35 machines × 3 shifts ≈ 100K rows, RandomForest gives interpretable, stable results without overfitting risk. Feature importance is human-readable. LSTM becomes relevant at sensor-level (1-second) granularity with continuous streaming data.
Why recency-weighted MTBF?
A machine with 10 failures three years ago but zero in the last 6 months should not carry the same risk as one that just failed last week. Exponential decay (half-life = 180 days) assigns higher weight to recent events and exposes ESS (Effective Sample Size) as a confidence indicator.
Why Ollama over a cloud LLM?
Manufacturing KPIs are confidential. Running the language model locally means production data never leaves the plant network. Ollama with llama3.2 also has zero ongoing cost, no rate limits, and no dependency on external availability.
Why no database?
CSV files are the universal export format of every MES, CMMS, and SCADA system. No ETL, no schema migration, no DBA required. When row count exceeds ~5M, swap pd.read_csv for a DuckDB or PostgreSQL connector — the rest of the stack stays identical.
- Phase 1: OEE decomposition dashboard (Availability, Performance, Quality)
- Phase 2: Live operations view (Gantt, SCADA, shift notes, work orders)
- Phase 3: Historical analytics (3-year trends, Pareto, shift/year comparison)
- Phase 4: Predictive Availability (RandomForest classifier + recency-weighted MTBF)
- Phase 5: Predictive Performance (temporal evolution, pre-PM degradation, weekly forecast)
- Phase 6: Quality Intelligence (FPY by VSM, Speed vs Quality tradeoff, risk predictor)
- Phase 7: FastAPI backend — all 7 models as REST endpoints
- Phase 8: Telegram bot — 13 button-driven queries + daily KPI report
- Phase 9: Natural language interface — free-text queries via Ollama (local, zero cost)
- Phase 10: Real-time SCADA integration (MQTT / OPC-UA → live dashboard)
LozanoLsa · Turning Operations into Predictive Systems