Merged
101 changes: 53 additions & 48 deletions PLAN.md
@@ -1,81 +1,86 @@
# PLAN: MES Core — Week 1 (DB Schema + Alembic Migrations)
# PLAN: MES Core — Week 2 (Modbus Machine State Reader)

**Branch:** `feat/mes-week1-db-schema`
**Issue:** Mikecranesync/MIRA#319
**Branch:** `feat/mes-week2-state-reader`
**Issue:** Mikecranesync/MIRA#320
**PRD:** `docs/PRD-MES-CORE.md`
**Date:** 2026-04-15
**Depends on:** Week 1 (feat/mes-week1-db-schema) merged

---

## Objective

Stand up the `mes_core` PostgreSQL schema — the foundational data layer for Work Orders, OEE, Machine States, and Downtime Tracking. All subsequent MES weeks depend on this being clean and stable.
Build the machine state reader: a background poller that reads the plc-modbus HTTP API every 5 seconds per configured line, detects state transitions (RUNNING/DOWN/IDLE/OFFLINE), writes them to `machine_states`, and exposes `GET /api/mes/lines` and `GET /api/mes/lines/{id}/state` REST endpoints.

## Affected Files

**New:**
- `services/mes/requirements.txt`
- `services/mes/backend/__init__.py`
- `services/mes/backend/config.py`
- `services/mes/backend/database.py`
- `services/mes/backend/main.py`
- `services/mes/backend/models/__init__.py`
- `services/mes/backend/models/db_models.py`
- `services/mes/backend/models/mes_models.py`
- `services/mes/backend/routes/__init__.py`
- `services/mes/backend/routes/health.py`
- `services/mes/alembic.ini`
- `services/mes/alembic/env.py`
- `services/mes/alembic/script.py.mako`
- `services/mes/alembic/versions/0001_initial_mes_schema.py`
- `services/mes/tests/__init__.py`
- `services/mes/tests/conftest.py`
- `services/mes/tests/test_schema.py`
- `infra/scada/mes.Dockerfile`
- `services/mes/backend/services/__init__.py`
- `services/mes/backend/services/plc_client.py` — async HTTP client wrapping plc-modbus
- `services/mes/backend/services/state_machine.py` — pure state detection from IO snapshot
- `services/mes/backend/services/state_poller.py` — asyncio background poll loop
- `services/mes/backend/routes/lines.py` — `GET /api/mes/lines`, `GET /api/mes/lines/{id}/state`
- `services/mes/tests/test_machine_states.py` — 10 unit tests, all mocked

**Modified:**
- `docker-compose.yml` — add `factorylm-mes-db` (Postgres) and `factorylm-mes` containers
- `services/mes/requirements.txt` — add httpx
- `services/mes/backend/config.py` — add plc_modbus_url setting
- `services/mes/backend/main.py` — wire poller into lifespan, add lines router
- `docker-compose.yml` — add PLC_MODBUS_URL env to mes container

## Approach

1. Introduce SQLAlchemy 2.x + Alembic into `services/mes/` — first time in repo
2. Follow existing FastAPI service pattern from `services/plc-modbus/`
3. DB schema: 7 tables (`lines`, `products`, `work_orders`, `schedules`, `downtime_reasons`, `machine_states`, `oee_snapshots`)
4. Seed `downtime_reasons` (14 codes) and `lines` (2 lines) in initial migration
5. Postgres 16 via Docker — `factorylm-mes-db` container on port 5433 (avoids conflict)
6. FastAPI skeleton with `/api/health` only — full routes come in Week 2+
1. `plc_client.py` — thin async wrapper around `GET /api/plc/io` (httpx). Raises `PLCOfflineError` on timeout/connection failure so caller can set OFFLINE state.
2. `state_machine.py` — pure function `detect_state(io_data)` → `(MachineStateEnum, reason_code | None)`. Derived from `VFDStatus` and `ErrorCode` registers. No DB or network calls — fully testable without mocks.
3. `state_poller.py` — asyncio task, one iteration per line every 5s. Maintains in-memory cache to avoid DB reads on every tick. Writes to `machine_states` only on transition.
4. `lines.py` routes — two endpoints: list all lines (from DB), get current state (from in-memory cache + last DB row).
5. `main.py` lifespan — start poller task on startup, cancel on shutdown.
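
A minimal sketch of the poll loop described in steps 1, 3 and 5, under stated assumptions. `run_poller`, `read_io`, `classify`, `on_state` and `stop_event` are hypothetical names used for illustration only; the real `state_poller.run()` / `state_poller.stop()` signatures may differ. It shows one read per tick shared by all lines, `PLCOfflineError` mapped to OFFLINE, and a sleep that can be interrupted on shutdown.

```python
import asyncio
from typing import Awaitable, Callable, List


class PLCOfflineError(Exception):
    """Stand-in for the error the PLC client raises on timeout/connection failure."""


async def run_poller(
    line_ids: List[str],
    read_io: Callable[[], Awaitable[dict]],   # hypothetical: wraps GET /api/plc/io
    classify: Callable[[dict], str],          # hypothetical: IO snapshot -> state name
    on_state: Callable[[str, str], None],     # hypothetical: receives (line_id, state)
    stop_event: asyncio.Event,
    poll_interval_sec: float = 5.0,
) -> None:
    """Poll once per interval until stop_event is set."""
    while not stop_event.is_set():
        try:
            state = classify(await read_io())
        except PLCOfflineError:
            # The client could not reach plc-modbus, so the caller records OFFLINE.
            state = "OFFLINE"

        # All lines currently share one plc-modbus service, so one snapshot
        # is applied to every configured line.
        for line_id in line_ids:
            on_state(line_id, state)

        # Sleep until the next tick, but wake up immediately on shutdown.
        try:
            await asyncio.wait_for(stop_event.wait(), timeout=poll_interval_sec)
        except asyncio.TimeoutError:
            pass
```

Waiting on the event instead of calling `asyncio.sleep()` lets shutdown finish without waiting out a full poll interval.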

State transition write: close open row (`ended_at = NOW()`), insert new row.
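
The write-on-transition rule can be sketched as follows. This is an illustration that uses an in-memory list in place of the `machine_states` table; `StateRow` and `TransitionRecorder` are hypothetical names, not the PR's classes. It assumes the cache tracks only the state value, so a change of `reason_code` within the same state does not open a new row.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class StateRow:
    """In-memory stand-in for one machine_states row (hypothetical)."""
    line_id: str
    state: str
    reason_code: Optional[str]
    started_at: datetime
    ended_at: Optional[datetime] = None


class TransitionRecorder:
    """Keeps a per-line cache and writes rows only when the state changes."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.rows: List[StateRow] = []

    def observe(
        self,
        line_id: str,
        state: str,
        reason_code: Optional[str] = None,
        now: Optional[datetime] = None,
    ) -> bool:
        """Return True if a transition was written, False if nothing changed."""
        if self.cache.get(line_id) == state:
            return False  # same state as the last tick: no write

        now = now or datetime.now(timezone.utc)
        # Close the open row for this line (ended_at = NOW()).
        for row in self.rows:
            if row.line_id == line_id and row.ended_at is None:
                row.ended_at = now
        # Insert the new open row and refresh the cache.
        self.rows.append(StateRow(line_id, state, reason_code, now))
        self.cache[line_id] = state
        return True
```

With a real database, the close and the insert would normally go in one transaction so a crash cannot leave a line with two open rows or none.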

## State Machine

```
IO: VFDStatus=1, ErrorCode=0 → RUNNING
IO: VFDStatus=2 OR ErrorCode>0 → DOWN (reason_code from ErrorCode map)
IO: VFDStatus=0, ErrorCode=0 → IDLE
HTTP failure / timeout → OFFLINE
```

## ErrorCode → reason_code map

```python
{1: "OVERLOAD", 2: "OVERHEAT", 3: "SENSOR_FAIL", 4: "JAM", 7: "E_STOP"}
```
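
A minimal sketch of `detect_state` that follows the state table and ErrorCode map above. Several details are assumptions rather than the PR's implementation: the IO snapshot is a dict keyed by `VFDStatus` and `ErrorCode`, unmapped error codes yield a `None` reason, any other `VFDStatus` value falls through to IDLE, and the enum is redefined here only to keep the example self-contained. OFFLINE is not produced here because the plan assigns it to the caller on HTTP failure.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class MachineStateEnum(str, Enum):
    RUNNING = "RUNNING"
    DOWN = "DOWN"
    IDLE = "IDLE"
    OFFLINE = "OFFLINE"


ERROR_CODE_REASONS: Dict[int, str] = {
    1: "OVERLOAD",
    2: "OVERHEAT",
    3: "SENSOR_FAIL",
    4: "JAM",
    7: "E_STOP",
}


def detect_state(io_data: dict) -> Tuple[MachineStateEnum, Optional[str]]:
    """Map one IO snapshot to (state, reason_code) with no DB or network access."""
    vfd_status = int(io_data.get("VFDStatus", 0))
    error_code = int(io_data.get("ErrorCode", 0))

    # A fault wins over everything else, even if the drive reports running.
    if vfd_status == 2 or error_code > 0:
        return MachineStateEnum.DOWN, ERROR_CODE_REASONS.get(error_code)
    if vfd_status == 1:
        return MachineStateEnum.RUNNING, None
    # Assumption: any remaining VFDStatus value is treated as IDLE.
    return MachineStateEnum.IDLE, None
```

Because the function is pure, each row of the state table can be checked with a plain dict and no mocks.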

## Risks

- No Alembic precedent in repo — introducing fresh, so migration is the only baseline
- Python 3.9 system — using `Optional[X]` not `X | None`
- Register map divergence (plc-modbus CLAUDE.md vs main CLAUDE.md) — MES uses main CLAUDE.md register map (authoritative)
- plc-modbus in mock mode returns VFDStatus=0 at rest — poller sees IDLE immediately (expected)
- Multiple lines share one plc-modbus service currently — same io_data, different `line_id` rows

## Rollback

```bash
git checkout main
docker compose down factorylm-mes-db factorylm-mes
git checkout feat/mes-week1-db-schema
```

## Verification Steps

```bash
# Start DB
docker compose up factorylm-mes-db -d

# Run migration
cd services/mes
DATABASE_URL="postgresql://mes:meslocal@localhost:5433/mes_core" alembic upgrade head

# Run schema tests
pytest services/mes/tests/test_schema.py -v

# Health check
docker compose up factorylm-mes -d
curl localhost:8300/api/health
# Unit tests (no docker needed)
cd services/mes && pytest tests/test_machine_states.py -v

# Integration: start stack, check state endpoint
docker compose up mes-db mes plc-modbus -d
curl localhost:8300/api/mes/lines
curl localhost:8300/api/mes/lines/<id>/state

# Inject a fault and verify DB transition
curl -X POST localhost:8001/api/plc/mock/fault -H "Content-Type: application/json" -d '{"fault_type":"jam"}'
sleep 8
curl localhost:8300/api/mes/lines/<id>/state # should show DOWN / JAM
```

## Note on Active Focus Window

The main `CLAUDE.md` declares a Revenue Priority focus on V1 Telegram bot. This MES work has been explicitly requested by Mike (2026-04-15 session) as a parallel track. Proceeding with explicit authorization.
Explicitly authorized by Mike (2026-04-15 session).
5 changes: 4 additions & 1 deletion docker-compose.yml
@@ -106,10 +106,13 @@ services:
      - "8300:8300"
    environment:
      FACTORYLM_DATABASE_URL: "postgresql://mes:meslocal@mes-db:5432/mes_core"
      FACTORYLM_PLC_USE_MOCK: "true"
      FACTORYLM_PLC_MODBUS_URL: "http://plc-modbus:8001"
      FACTORYLM_PLC_USE_MOCK: "false"
    depends_on:
      mes-db:
        condition: service_healthy
      plc-modbus:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8300/api/health')"]
      interval: 10s
7 changes: 6 additions & 1 deletion services/mes/backend/config.py
@@ -18,8 +18,13 @@ class Settings(BaseSettings):
    # Format: postgresql://user:password@host:port/dbname
    database_url: str = "postgresql://mes:meslocal@localhost:5434/mes_core"

    # PLC defaults (overridden per-line from DB)
    # plc-modbus service URL — MES calls this over HTTP (never raw Modbus TCP)
    plc_modbus_url: str = "http://plc-modbus:8001"

    # Polling interval in seconds (default 5, set lower in tests)
    plc_poll_interval_sec: int = 5

    # Set True to skip poller startup (useful in unit tests)
    plc_use_mock: bool = False


36 changes: 32 additions & 4 deletions services/mes/backend/main.py
@@ -1,9 +1,15 @@
"""FactoryLM MES API — FastAPI entry point.

Week 1: /health only.
Week 2+: lines, work_orders, downtime, oee routes added here.
Lifespan:
startup → seed state cache, launch background state poller
shutdown → signal poller to stop cleanly

Routes (cumulative by week):
Week 1: /api/health
Week 2: /api/mes/lines, /api/mes/lines/{id}/state
"""

import asyncio
import logging
from contextlib import asynccontextmanager

@@ -12,16 +18,37 @@

from backend.config import settings
from backend.routes.health import router as health_router
from backend.routes.lines import router as lines_router
from backend.services import state_poller

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.info("MES service starting — DB: %s", settings.database_url.split("@")[-1])
    db_host = settings.database_url.split("@")[-1]
    logger.info("MES service starting — DB: %s PLC: %s", db_host, settings.plc_modbus_url)

    poller_task = None
    if not settings.plc_use_mock:
        poller_task = asyncio.create_task(
            state_poller.run(poll_interval_sec=settings.plc_poll_interval_sec),
            name="state_poller",
        )
        logger.info("State poller started (interval=%ds)", settings.plc_poll_interval_sec)
    else:
        logger.info("PLC mock mode — state poller disabled")

    yield
    logger.info("MES service stopping")

    logger.info("MES service shutting down")
    if poller_task:
        state_poller.stop()
        try:
            await asyncio.wait_for(poller_task, timeout=8.0)
        except asyncio.TimeoutError:
            poller_task.cancel()


app = FastAPI(
@@ -39,6 +66,7 @@ async def lifespan(app: FastAPI):
)

app.include_router(health_router, prefix=settings.api_prefix)
app.include_router(lines_router, prefix=settings.api_prefix)


if __name__ == "__main__":
107 changes: 107 additions & 0 deletions services/mes/backend/routes/lines.py
@@ -0,0 +1,107 @@
"""Lines routes — GET /api/mes/lines and GET /api/mes/lines/{id}/state.

Week 2 endpoints only. Work order and OEE endpoints added in later weeks.
"""

import logging
from typing import Optional
from datetime import datetime

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy.orm import Session

from backend.database import get_db
from backend.models.db_models import Line, MachineState, MachineStateEnum
from backend.models.mes_models import LineResponse
from backend.services import state_poller

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/mes", tags=["mes"])


# ── Response models ───────────────────────────────────────────────────────────

class LineStateResponse(BaseModel):
    line_id: str
    line_name: str
    state: str
    reason_code: Optional[str] = None
    since: Optional[datetime] = None  # when this state started
    source: str  # "cache" | "db" | "unknown"


# ── Endpoints ─────────────────────────────────────────────────────────────────

@router.get("/lines", response_model=list[LineResponse])
def list_lines(db: Session = Depends(get_db)):
    """Return all configured production lines."""
    lines = db.query(Line).order_by(Line.name).all()
    return [
        LineResponse(
            id=str(line.id),
            name=line.name,
            isa95_path=line.isa95_path,
            plc_host=line.plc_host,
            plc_port=line.plc_port,
            description=line.description,
        )
        for line in lines
    ]


@router.get("/lines/{line_id}/state", response_model=LineStateResponse)
def get_line_state(line_id: str, db: Session = Depends(get_db)):
    """Return the current machine state for a line.

    Reads the state value from the in-memory cache when it is warm.
    Falls back to the most recent open DB row if the cache is cold
    (e.g. service just restarted but poller hasn't ticked yet).
    """
    line = db.query(Line).filter(Line.id == line_id).first()
    if not line:
        raise HTTPException(status_code=404, detail=f"Line {line_id} not found")

    # Try in-memory cache first
    cached = state_poller.get_cached_state(line_id)
    if cached is not None:
        # `since` and `reason_code` still come from the latest open DB row
        open_row = (
            db.query(MachineState)
            .filter(MachineState.line_id == line_id, MachineState.ended_at.is_(None))
            .order_by(MachineState.started_at.desc())
            .first()
        )
        return LineStateResponse(
            line_id=line_id,
            line_name=line.name,
            state=cached.value,
            reason_code=open_row.reason_code if open_row else None,
            since=open_row.started_at if open_row else None,
            source="cache",
        )

    # Cache cold — fall back to DB
    open_row = (
        db.query(MachineState)
        .filter(MachineState.line_id == line_id, MachineState.ended_at.is_(None))
        .order_by(MachineState.started_at.desc())
        .first()
    )
    if open_row:
        return LineStateResponse(
            line_id=line_id,
            line_name=line.name,
            state=open_row.state.value,
            reason_code=open_row.reason_code,
            since=open_row.started_at,
            source="db",
        )

    return LineStateResponse(
        line_id=line_id,
        line_name=line.name,
        state=MachineStateEnum.OFFLINE.value,
        source="unknown",
    )
Empty file.