Merged
18 commits
53a8e76
feat(api|front|db): create micro services system, implement docker co…
LEEDASILVA Apr 11, 2026
a430f2f
refactor(docker-compose): clean up
LEEDASILVA Apr 11, 2026
5780b5b
refactor(dockerfile): clean up
LEEDASILVA Apr 11, 2026
d4f5315
feat(api): add go mod for api + fix air.toml to have live update when…
LEEDASILVA Apr 12, 2026
bbe71ec
chore(front): fix rebsae conflicts
LEEDASILVA Apr 12, 2026
4081fd7
feat(api): add authentication + docs + healthcheck routes for the api
LEEDASILVA Apr 13, 2026
a354b8f
chore(docker): some cleanup
LEEDASILVA Apr 13, 2026
b45d8a7
refactor(front|docker): caddy should only be used for production + fi…
LEEDASILVA Apr 15, 2026
7f90346
chore(scripts): fix typo on filename
LEEDASILVA Apr 19, 2026
9e5c52a
refactor(services): update services versions
LEEDASILVA Apr 24, 2026
fe75238
chore(db): bump postgres version + fix typo for postgres initializati…
LEEDASILVA Apr 28, 2026
57796cd
chore(fmt&lint): use deno instead of biome
LEEDASILVA Apr 28, 2026
f745b45
chore(deps): pin to latest/major
HarryVasanth Apr 28, 2026
9013d9f
chore(services): remove migrate service and just let hasura migrate t…
LEEDASILVA Apr 29, 2026
96b8455
fix(db): replace UUIDs for every table except equipments & add permis…
LEEDASILVA Apr 29, 2026
193b1cb
fix(services): fix and improve from comments
LEEDASILVA Apr 30, 2026
9d35a34
chore(script): add mock data and script to apply it
LEEDASILVA Apr 30, 2026
601d837
fix(scripts): make it so that we can execute script from the root dir…
LEEDASILVA Apr 30, 2026
6 changes: 5 additions & 1 deletion .gitignore
@@ -46,4 +46,8 @@ Thumbs.db
test-results/
playwright-report/
playwright/.cache/
verify_ui.py
verify_ui.py

# Air golang
tmp/
vendor/
314 changes: 312 additions & 2 deletions SYSTEM_DESIGN_README.md
@@ -422,7 +422,6 @@ erDiagram
int purchase_cost_cents
int expected_lifespan_years
text status
text qr_code_id
text notes
}

@@ -570,6 +569,317 @@ erDiagram
users ||--o{ notifications_log : "receives"
```

---

## Information per table

## Tenancy anchor — companies

```mermaid
erDiagram
companies {
uuid id PK
text name
text slug
text industry
text plan
text timezone
jsonb branding
timestamptz created_at
}

users {
uuid id PK
uuid company_id FK
text name
text email
text phone
text role
uuid default_location_id FK
jsonb notification_prefs
timestamptz last_login_at
}
```

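Every tenant-scoped table carries `company_id`, either directly or, for child tables such as `equipment_photos`, `issue_comments`, and `parts_used`, through the parent row they reference. This is the outermost security boundary. Hasura enforces it on every query via JWT claims: a request only sees rows whose `company_id` matches the claim in its token, so it cannot read another tenant's rows.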

- `slug` deserves attention: it's the URL-safe identifier (`opera-fix`, `tasca-do-porto`) used in subdomains or path prefixes. Make it immutable after creation, because changing it breaks bookmarked URLs.

- `branding` as JSONB is intentional. It holds `{ primaryColor, logoUrl, emailFrom }`: things that vary per company but don't need their own table. JSONB is fine here because you never query inside it; you fetch the whole object.

- `timezone` at the company level is the fallback. Individual locations override it. Business logic never touches `Date()` directly; timestamps are stored in UTC and converted to the location's timezone at the display layer.

## Configurable hierarchy — location_types + locations

```mermaid
erDiagram
location_types {
uuid id PK
uuid company_id FK
text name
text icon
int expected_depth
}

severity_levels {
uuid id PK
uuid company_id FK
text name
text color
int sla_hours
bool sms_alert
bool bypass_quiet_hours
int sort_order
}

equipment_categories {
uuid id PK
uuid company_id FK
text name
text icon
int default_pm_interval_days
text industry_hint
}

issue_categories {
uuid id PK
uuid company_id FK
text name
text icon
uuid default_severity_id FK
}
```

This is the most important design decision in the schema. A fixed 4-level hierarchy breaks the moment you add hotels or hospitals. The self-referencing locations table with location_types solves this cleanly.

- `path` (e.g. `tasca-do-porto/kitchen`) is a materialised column: computed on insert/update by a trigger or application code and stored as text. It turns a subtree query into a single `WHERE path = 'tasca-do-porto' OR path LIKE 'tasca-do-porto/%'` rather than a recursive CTE on every request. The trailing `/` in the pattern matters: a bare `'tasca-do-porto%'` would also match a sibling such as `tasca-do-porto-2`. The trade-off: you must keep `path` consistent on renames. This is acceptable because location renames are rare and can be handled in a transaction that updates all descendants.

- `depth` is a denormalised integer — redundant with the path, but useful for fast "give me all depth-0 nodes" queries without string parsing.

- `manager_id` is a FK to users. One location has one primary manager. If you need multiple managers per location later, that becomes a join table (location_managers). Don't over-engineer it now.
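
A minimal Go sketch of how `path` and `depth` could be materialised in application code, assuming each node has a URL-safe segment (the `Slug` field here is hypothetical; the schema only lists `name`). The subtree check includes the `/` separator so that a sibling whose slug shares a prefix is not matched.

```go
package main

import (
	"fmt"
	"strings"
)

// Location mirrors only the columns needed for this sketch.
type Location struct {
	Slug  string // hypothetical URL-safe segment for this node
	Path  string // materialised: parent path + "/" + slug
	Depth int    // denormalised: 0 for root nodes
}

// NewChild derives path and depth from the parent; a nil parent means a root node.
func NewChild(parent *Location, slug string) Location {
	if parent == nil {
		return Location{Slug: slug, Path: slug, Depth: 0}
	}
	return Location{Slug: slug, Path: parent.Path + "/" + slug, Depth: parent.Depth + 1}
}

// InSubtree reports whether path is root itself or one of its descendants.
func InSubtree(path, root string) bool {
	return path == root || strings.HasPrefix(path, root+"/")
}

func main() {
	root := NewChild(nil, "tasca-do-porto")
	kitchen := NewChild(&root, "kitchen")
	fmt.Println(kitchen.Path, kitchen.Depth)
	fmt.Println(InSubtree(kitchen.Path, root.Path))
	fmt.Println(InSubtree("tasca-do-porto-2/bar", root.Path))
}
```

A rename would recompute `Path` for the node and every descendant inside one transaction, which is the consistency cost described above.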

### Company-owned config tables

`severity_levels`, `issue_categories`, `equipment_categories` are all per-company. This is what makes the platform multi-vertical without hardcoding anything.
Key fields worth discussing:

- `severity_levels.sla_hours` drives the SLA deadline calculation on every new issue: `sla_deadline = reported_at + sla_hours * INTERVAL '1 hour'`. The SLA monitor cron reads this via a join, not from a hardcoded constant.

- `severity_levels.bypass_quiet_hours` — Critical alerts always go out, regardless of user notification preferences. This flag lives on the severity level, not on the notification, so it's configurable per company.

- `severity_levels.sms_alert` — SMS costs money. This flag prevents a medium-severity issue from triggering Twilio. Default: only Critical = true.

- `issue_categories.default_severity_id` — when a staff member selects "Safety Hazard", the severity pre-fills to Critical. This is a UX shortcut that reduces reporting friction. Users can override it.

- `equipment_categories.default_pm_interval_days` — the default preventive maintenance interval when a PM task is created for this category. A technician adding a PM task for a refrigerator gets 30 days pre-filled; they adjust as needed.
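
The severity flags above could be applied roughly as follows. This is a sketch under assumptions: the struct, the function names, and the example values (4 and 48 hours) are illustrative, and the way `sms_alert` and `bypass_quiet_hours` are combined into one decision is one plausible reading of the two bullets, not a rule stated in the schema.

```go
package main

import (
	"fmt"
	"time"
)

// SeverityLevel mirrors the severity_levels columns used here.
type SeverityLevel struct {
	Name             string
	SLAHours         int
	SMSAlert         bool
	BypassQuietHours bool
}

// exampleReportedAt is an arbitrary timestamp used only for the demo.
var exampleReportedAt = time.Date(2026, time.April, 30, 9, 0, 0, 0, time.UTC)

// SLADeadline computes reported_at + sla_hours, done once at insert time.
func SLADeadline(reportedAt time.Time, sev SeverityLevel) time.Time {
	return reportedAt.Add(time.Duration(sev.SLAHours) * time.Hour)
}

// ShouldSendSMS: SMS is sent only when the severity enables it, and quiet
// hours suppress it unless the severity bypasses them (assumed combination).
func ShouldSendSMS(sev SeverityLevel, inQuietHours bool) bool {
	if !sev.SMSAlert {
		return false
	}
	return !inQuietHours || sev.BypassQuietHours
}

func main() {
	critical := SeverityLevel{Name: "Critical", SLAHours: 4, SMSAlert: true, BypassQuietHours: true}
	medium := SeverityLevel{Name: "Medium", SLAHours: 48}
	fmt.Println(SLADeadline(exampleReportedAt, critical).Format(time.RFC3339))
	fmt.Println(ShouldSendSMS(critical, true), ShouldSendSMS(medium, false))
}
```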

## The core asset — equipment

```mermaid
erDiagram

locations {
uuid id PK
uuid company_id FK
uuid parent_id FK
uuid location_type_id FK
uuid manager_id FK
text name
text path
int depth
text timezone
text address
}

equipment {
uuid id PK
uuid company_id FK
uuid location_id FK
uuid category_id FK
uuid parent_equipment_id FK
bool is_component
text name
text serial_number
text manufacturer
text model
date install_date
date warranty_expiry
int purchase_cost_cents
int expected_lifespan_years
text status
text notes
}

equipment_photos {
uuid id PK
uuid equipment_id FK
text storage_url
text caption
bool is_primary
timestamptz uploaded_at
}
```

The most critical table. Every analytic, every issue, every PM schedule traces back here.

- `company_id` is denormalised here (it's already reachable via location → company). This is intentional — it makes the Hasura row-level permission rule a single column check rather than a join. Worth the redundancy.

- `id` is the equipment identifier. It's unique and never reused, even after decommission. QR labels printed with this ID must never become ambiguous: if equipment is replaced, the old ID stays on the old record and the new unit gets a new ID.

- `purchase_cost_cents` and all other monetary values are INTEGER (cents). Never use FLOAT for money: binary floating-point cannot represent most decimal fractions exactly, so rounding errors creep in silently and compound over time. DECIMAL is exact inside Postgres, but integer cents stay exact everywhere else the value travels (Go, JSON, JavaScript).

- `parent_equipment_id` is nullable. Phase 1: null for everything. Phase 2: populate for high-value components (a compressor inside a specific refrigeration unit). The column costs nothing while empty, and having it from day one avoids a later schema migration on the table that everything else references.

- `is_component` is a boolean flag that separates "staff would scan this" from "technician references this during repair". When is_component = true, the equipment doesn't get its own QR label and doesn't appear in the employee-facing reporting flow.

- `status` is an enum: active | under_repair | decommissioned. decommissioned is important — it preserves history without polluting active equipment lists. Never hard-delete equipment.

- `warranty_expiry` — the Go cron sends an alert 30 days before this date. Without this field, warranty claims get missed and repairs that should be free get paid for. This field pays for itself.
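
To illustrate the integer-cents rule, here is a small Go sketch. `FormatCents` is a hypothetical display helper, not part of the codebase; the point is that arithmetic stays in integers and conversion to a decimal string happens only at the presentation edge.

```go
package main

import "fmt"

// FormatCents renders integer cents as a decimal string, for display only.
func FormatCents(cents int64) string {
	sign := ""
	if cents < 0 {
		sign = "-"
		cents = -cents
	}
	return fmt.Sprintf("%s%d.%02d", sign, cents/100, cents%100)
}

func main() {
	// Floating point: 0.1 + 0.2 is not exactly 0.3.
	a, b := 0.1, 0.2
	fmt.Println(a+b == 0.3) // prints false

	// Integer cents: 10 + 20 is exactly 30.
	var x, y int64 = 10, 20
	fmt.Println(x+y == 30, FormatCents(x+y))

	fmt.Println(FormatCents(129999))
}
```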

## Issue lifecycle — issues

```mermaid
erDiagram

issues {
uuid id PK
uuid company_id FK
uuid equipment_id FK
uuid location_id FK
uuid category_id FK
uuid severity_id FK
uuid reporter_id FK
uuid assigned_to FK
text status
text title
text description
timestamptz reported_at
timestamptz sla_deadline
timestamptz assigned_at
timestamptz resolved_at
timestamptz closed_at
text resolution_notes
}

issue_photos {
uuid id PK
uuid issue_id FK
text storage_url
text stage
uuid uploaded_by FK
timestamptz uploaded_at
}

issue_comments {
uuid id PK
uuid issue_id FK
uuid author_id FK
text body
timestamptz created_at
}
```

The most frequently written and read table. Index design is critical here.

- `location_id` is denormalised (reachable via `equipment` → `location`). Same reason as `company_id` on `equipment` — avoids a join on the hottest query path: "show all open issues for location X".

- `severity_id` is now a FK to severity_levels, not a hardcoded enum. This means a hospital can have a "30-minute" severity level that a restaurant doesn't. The severity is company-configurable.

- `sla_deadline` is computed at insert time: `reported_at` + `severity.sla_hours`. It's stored, not calculated on read, because the SLA monitor cron queries it with `WHERE sla_deadline < NOW() AND status NOT IN ('resolved', 'closed')`. Deriving it on read would need a join to `severity_levels` for every row, and that expression could not use an index on `issues`.

The timestamp chain — `reported_at` → `assigned_at` → `resolved_at` → `closed_at` — is what makes MTTR calculation possible. Every transition is recorded, not just the final state. This also enables "time to first assignment" as a separate metric, which reveals whether managers are slow to respond even if technicians are fast.

- `resolution_notes` lives on the issue, not on maintenance_actions. An issue can be resolved without a full maintenance action (e.g., the employee's report was incorrect — the equipment was fine). The notes here are the manager's closure summary.
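
The two metrics enabled by the timestamp chain can be sketched in Go as below. Assumptions: MTTR is measured here from `reported_at` to `resolved_at` (some teams measure from assignment instead), nullable columns are modelled as pointers, and `demoIssues` is made-up sample data.

```go
package main

import (
	"fmt"
	"time"
)

// Issue carries only the timestamp chain; nil means "not reached yet".
type Issue struct {
	ReportedAt time.Time
	AssignedAt *time.Time
	ResolvedAt *time.Time
}

// mean returns the average duration and false when there is no data.
func mean(ds []time.Duration) (time.Duration, bool) {
	if len(ds) == 0 {
		return 0, false
	}
	var total time.Duration
	for _, d := range ds {
		total += d
	}
	return total / time.Duration(len(ds)), true
}

// MeanTimeToAssign averages reported_at → assigned_at over assigned issues.
func MeanTimeToAssign(issues []Issue) (time.Duration, bool) {
	var ds []time.Duration
	for _, is := range issues {
		if is.AssignedAt != nil {
			ds = append(ds, is.AssignedAt.Sub(is.ReportedAt))
		}
	}
	return mean(ds)
}

// MeanTimeToResolve averages reported_at → resolved_at over resolved issues.
func MeanTimeToResolve(issues []Issue) (time.Duration, bool) {
	var ds []time.Duration
	for _, is := range issues {
		if is.ResolvedAt != nil {
			ds = append(ds, is.ResolvedAt.Sub(is.ReportedAt))
		}
	}
	return mean(ds)
}

// demoIssues returns made-up sample data for the demo.
func demoIssues() []Issue {
	t0 := time.Date(2026, time.April, 30, 9, 0, 0, 0, time.UTC)
	after := func(h int) *time.Time {
		t := t0.Add(time.Duration(h) * time.Hour)
		return &t
	}
	return []Issue{
		{ReportedAt: t0, AssignedAt: after(1), ResolvedAt: after(5)},
		{ReportedAt: t0, AssignedAt: after(3), ResolvedAt: after(7)},
		{ReportedAt: t0}, // still unassigned: excluded from both means
	}
}

func main() {
	assign, _ := MeanTimeToAssign(demoIssues())
	resolve, _ := MeanTimeToResolve(demoIssues())
	fmt.Println("time to first assignment:", assign)
	fmt.Println("MTTR:", resolve)
}
```

In production these figures would come from the pre-aggregated analytics table rather than being computed per request.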

## Repair documentation — maintenance_actions + parts_used

```mermaid
erDiagram

maintenance_actions {
uuid id PK
uuid issue_id FK
uuid technician_id FK
text action_description
text root_cause
text component_type
text component_name
int labor_minutes
timestamptz start_time
timestamptz end_time
}

parts_used {
uuid id PK
uuid maintenance_action_id FK
text part_name
text part_number
int quantity
int unit_cost_cents
text supplier
}
```

- `maintenance_actions` is the technician's work log. One issue can have multiple actions (a technician starts, orders parts, comes back to finish). This is why it's a separate table with its own PK, not fields on issues.

- `component_type` and `component_name` are the Phase 1 approach to sub-equipment tracking. Instead of a full sub-asset hierarchy, the technician notes "I replaced the compressor" as structured text. This is enough to aggregate `WHERE component_name = 'compressor'` across all repairs and answer "how many compressors have we replaced this year and at what cost?", without the UX overhead of a full component tree.

- `labor_minutes` is derived from `end_time - start_time` but stored explicitly. Technicians sometimes forget to clock out and manually correct the duration. Storing it separately from the timestamps accommodates that without breaking the audit trail.

- `parts_used` is the financial goldmine. `part_name` + `part_number` + `unit_cost_cents` + `supplier` per line item gives you: total cost per repair, total cost per equipment over its lifetime, most-replaced parts across all locations, supplier price comparison. All of this falls out of simple aggregations on this table.
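
The aggregations mentioned for `parts_used` are simple enough to sketch in Go. In practice they would more likely be SQL `GROUP BY` queries; the struct, function names, and sample rows below are illustrative assumptions.

```go
package main

import "fmt"

// PartUsed mirrors one parts_used line item.
type PartUsed struct {
	PartName      string
	PartNumber    string
	Quantity      int
	UnitCostCents int64
	Supplier      string
}

// TotalCostCents sums quantity * unit cost across line items.
func TotalCostCents(parts []PartUsed) int64 {
	var total int64
	for _, p := range parts {
		total += int64(p.Quantity) * p.UnitCostCents
	}
	return total
}

// QuantityByPart counts how many units of each part were used.
func QuantityByPart(parts []PartUsed) map[string]int {
	counts := make(map[string]int)
	for _, p := range parts {
		counts[p.PartName] += p.Quantity
	}
	return counts
}

// demoParts returns made-up line items for the demo.
func demoParts() []PartUsed {
	return []PartUsed{
		{PartName: "compressor", PartNumber: "C-100", Quantity: 1, UnitCostCents: 45000, Supplier: "Supplier A"},
		{PartName: "door gasket", PartNumber: "G-20", Quantity: 2, UnitCostCents: 1250, Supplier: "Supplier B"},
		{PartName: "compressor", PartNumber: "C-100", Quantity: 1, UnitCostCents: 47000, Supplier: "Supplier B"},
	}
}

func main() {
	parts := demoParts()
	fmt.Println("total cents:", TotalCostCents(parts))
	fmt.Println("compressors replaced:", QuantityByPart(parts)["compressor"])
}
```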

## Preventive maintenance — preventive_tasks + preventive_schedules

```mermaid
erDiagram

preventive_tasks {
uuid id PK
uuid company_id FK
uuid equipment_id FK
text title
text description
int frequency_days
text assigned_role
int estimated_minutes
bool is_active
}

preventive_schedules {
uuid id PK
uuid task_id FK
date due_date
timestamptz completed_at
uuid completed_by FK
text notes
text status
}
```

- `preventive_tasks` is the template. `preventive_schedules` is the generated instance.

- `frequency_days` drives schedule generation. The Go daily cron runs: "for every active task, if no pending schedule exists within the next 30 days, create one with `due_date` = `last_completion` + `frequency_days`." This is idempotent — running it twice doesn't double-create schedules.

- `assigned_role` is a string (employee | technician), not a FK to users. The task is assigned to a role, not a person. The specific person is determined at completion time. This makes the PM library reusable across companies.

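- `preventive_schedules.status` is pending | completed | overdue. The cron also runs a check: `UPDATE preventive_schedules SET status = 'overdue' WHERE due_date < CURRENT_DATE AND status = 'pending'`. Because `due_date` is a DATE, comparing against `CURRENT_DATE` flags a task only once its due day has passed; comparing against `NOW()` would flag it during the due day itself. This makes "overdue PM tasks" a trivial query that a covering index can serve.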
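
An in-memory Go sketch of the schedule generation and overdue check, to show where the idempotency comes from. It rests on several assumptions that go beyond the text above: a schedule is only created once its due date falls inside the 30-day horizon, any schedule that is not `completed` (pending or overdue) blocks a new one, and a task that was never completed uses its creation date as `LastCompleted`. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

type Task struct {
	ID            string
	FrequencyDays int
	IsActive      bool
	LastCompleted time.Time // assumption: creation date if never completed
}

type Schedule struct {
	TaskID  string
	DueDate time.Time
	Status  string // pending | completed | overdue
}

const horizonDays = 30

// GenerateSchedules returns existing plus one pending schedule for every
// active task that has no open schedule and is due within the horizon.
func GenerateSchedules(tasks []Task, existing []Schedule, today time.Time) []Schedule {
	open := make(map[string]bool)
	for _, s := range existing {
		if s.Status != "completed" {
			open[s.TaskID] = true
		}
	}
	horizon := today.AddDate(0, 0, horizonDays)
	out := append([]Schedule(nil), existing...)
	for _, t := range tasks {
		if !t.IsActive || open[t.ID] {
			continue
		}
		due := t.LastCompleted.AddDate(0, 0, t.FrequencyDays)
		if due.After(horizon) {
			continue // not due soon enough to schedule yet
		}
		out = append(out, Schedule{TaskID: t.ID, DueDate: due, Status: "pending"})
		open[t.ID] = true
	}
	return out
}

// MarkOverdue flips pending schedules whose due date is before today.
func MarkOverdue(schedules []Schedule, today time.Time) {
	for i := range schedules {
		if schedules[i].Status == "pending" && schedules[i].DueDate.Before(today) {
			schedules[i].Status = "overdue"
		}
	}
}

// demoToday and demoTasks are made-up sample data.
var demoToday = time.Date(2026, time.April, 30, 0, 0, 0, 0, time.UTC)

func demoTasks() []Task {
	return []Task{
		{ID: "fridge-clean", FrequencyDays: 30, IsActive: true, LastCompleted: demoToday.AddDate(0, 0, -10)},
		{ID: "hood-inspect", FrequencyDays: 90, IsActive: true, LastCompleted: demoToday},
		{ID: "retired-task", FrequencyDays: 7, IsActive: false, LastCompleted: demoToday.AddDate(0, 0, -30)},
	}
}

func main() {
	first := GenerateSchedules(demoTasks(), nil, demoToday)
	second := GenerateSchedules(demoTasks(), first, demoToday)
	fmt.Println(len(first), len(second)) // same length: the second run adds nothing
}
```

Counting overdue schedules as open matters: otherwise the run after `MarkOverdue` would create a duplicate for the same due date.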

## Pre-aggregated analytics — equipment_analytics

This table is the performance safety valve. MTBF, MTTR, health scores, and cost totals across millions of issue rows would be expensive to compute on every dashboard load.

The Go nightly job pre-calculates everything per (equipment_id, period_date) and writes it here. Dashboards read from this table exclusively. The issues and maintenance_actions tables are never aggregated in real time.

- `health_score` is a 0–100 float. The formula: normalise MTBF trend + repair cost ratio + age ratio. The exact weights are configurable per company (stored in companies.branding JSONB or a separate config table in Phase 2). The score on the dashboard is a read of a pre-computed column, not a live calculation.
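
One possible shape for the health score, as a heavily hedged Go sketch. The text above names the three inputs and says the weights are configurable, but it does not define the normalisation or how the terms combine; everything below (the 0 to 1 input ranges, inverting the cost and age ratios, dividing by the weight sum, the example weights) is an assumption for illustration only.

```go
package main

import (
	"fmt"
	"math"
)

// Weights are per-company and hypothetical; they need not sum to 1.
type Weights struct {
	MTBFTrend  float64
	RepairCost float64
	Age        float64
}

// Inputs are assumed to be pre-normalised by the nightly job.
type Inputs struct {
	MTBFTrend       float64 // assumed: 0 = failures accelerating, 1 = stable or improving
	RepairCostRatio float64 // assumed: lifetime repair cost / purchase cost
	AgeRatio        float64 // assumed: age / expected lifespan
}

func clamp01(x float64) float64 { return math.Min(1, math.Max(0, x)) }

// HealthScore returns a value in [0, 100]; higher means healthier.
func HealthScore(in Inputs, w Weights) float64 {
	total := w.MTBFTrend + w.RepairCost + w.Age
	if total <= 0 {
		return 0
	}
	s := w.MTBFTrend*clamp01(in.MTBFTrend) +
		w.RepairCost*(1-clamp01(in.RepairCostRatio)) +
		w.Age*(1-clamp01(in.AgeRatio))
	return 100 * s / total
}

func main() {
	w := Weights{MTBFTrend: 0.5, RepairCost: 0.3, Age: 0.2}
	fmt.Printf("%.1f\n", HealthScore(Inputs{MTBFTrend: 0.8, RepairCostRatio: 0.25, AgeRatio: 0.5}, w))
}
```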

### Observability — notifications_log

Every notification attempt is logged with sent_at, delivered_at, and status. This table exists for two reasons: debugging ("why didn't the manager get the critical SMS?") and compliance (GDPR audit trail of what was communicated to whom and when).

- `delivered_at` is populated by Postmark/Twilio webhooks. If sent_at is set but delivered_at is null after 30 minutes, the Go service can alert on notification failures.
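
The delivery check described above could look like this in Go. The struct, the `Stalled` name, and the sample log are assumptions; only the rule itself (sent, not delivered, older than 30 minutes) comes from the text.

```go
package main

import (
	"fmt"
	"time"
)

// Notification mirrors the notifications_log columns used by the check.
type Notification struct {
	ID          string
	SentAt      *time.Time
	DeliveredAt *time.Time
}

const defaultGrace = 30 * time.Minute

// Stalled returns notifications that were sent but have no delivery
// confirmation after the grace period.
func Stalled(log []Notification, now time.Time, grace time.Duration) []Notification {
	var out []Notification
	for _, n := range log {
		if n.SentAt != nil && n.DeliveredAt == nil && now.Sub(*n.SentAt) > grace {
			out = append(out, n)
		}
	}
	return out
}

// demoLog returns made-up log rows and the "current" time for the demo.
func demoLog() ([]Notification, time.Time) {
	now := time.Date(2026, time.April, 30, 12, 0, 0, 0, time.UTC)
	ago := func(minutes int) *time.Time {
		t := now.Add(-time.Duration(minutes) * time.Minute)
		return &t
	}
	return []Notification{
		{ID: "n1", SentAt: ago(45)},                       // sent, never confirmed
		{ID: "n2", SentAt: ago(45), DeliveredAt: ago(44)}, // delivered
		{ID: "n3", SentAt: ago(5)},                        // still within grace
		{ID: "n4"},                                        // never sent
	}, now
}

func main() {
	log, now := demoLog()
	for _, n := range Stalled(log, now, defaultGrace) {
		fmt.Println("stalled:", n.ID)
	}
}
```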

---

### Critical Indexes

```sql
-- (index definitions fall outside this diff hunk)
```

@@ -978,7 +1288,7 @@ gantt
- [ ] `location_types`, `severity_levels`, `issue_categories`, `equipment_categories` all populated from template
- [ ] Self-referencing locations hierarchy works at depth 2 (Restaurant → Area)
- [ ] `path` column correctly materialised on insert and update
- [ ] Equipment created with `qr_code_id` auto-generated
- [ ] Equipment created with `equipment_id` auto-generated

**Weeks 5–6 — Core Workflow**

23 changes: 23 additions & 0 deletions api/.air.toml
@@ -0,0 +1,23 @@
root = "."
tmp_dir = "tmp"

[build]
cmd = "go build -o tmp/operafix-api ./cmd/server"
bin = "tmp/operafix-api"
include_ext = ["go", "toml", "yaml"]
exclude_dir = ["vendor", "testdata"]
delay = 500
kill_delay = "200ms"
rerun = false

[log]
time = true

[color]
main = "yellow"
watcher = "cyan"
build = "green"
runner = "magenta"

[misc]
clean_on_exit = true