Menu image analysis platform with a React frontend and FastAPI backend.
This project is source-available under the PolyForm Noncommercial License 1.0.0.
You may use, copy, modify, and share the code for noncommercial purposes only. Commercial use (including using this project to generate revenue) is not allowed.
See the full license text in LICENSE.
The section below is sourced from SOLUTION_OVERVIEW.md via ./scripts/generate-readme.sh.
This document gives new contributors and agents a high-level map of the project, with emphasis on how backend menu analysis works.
tma analyzes uploaded menu images and returns structured dish items with normalized bounding boxes. It also supports:
- Recommendation generation based on selected dishes
- Usage/limit tracking via Supabase
- Benchmark/debug tooling for OCR/grouping strategies
```mermaid
flowchart LR
    FE["React Frontend (Vite)"] -->|"/menu/analyze, /menu/recommendations"| API["FastAPI Backend"]
    FE -->|"/debug/benchmark/*"| API
    API --> DIP["Azure Document Intelligence (OCR)"]
    API --> OAI["OpenAI Models (Grouping, Translation, Dish Info, Recommendations)"]
    API --> GCS["Google Custom Search (Dish Images)"]
    API --> SB["Supabase (Auth-adjacent data, cache, usage limits)"]
```
- `backend/src/main.py`: app bootstrap + lifespan initialization
- `backend/src/api/`: HTTP routes (`/menu`, `/user-info`, `/debug`)
- `backend/src/menu_engine/`: flow registry + analysis/recommendation orchestration
- `backend/src/services/menu.py`: core OCR/analyze pipelines
- `backend/src/services/ocr/`: grouping strategies (heuristic, LLM, layout experiment)
- `frontend/src/`: UI, upload flow, flow toggle, benchmark debug page
Primary endpoint: `POST /menu/analyze`
- Requires:
  - `Authorization: Bearer <jwt>` header
- Optional:
  - `flowId` query parameter
  - `X-Menu-Flow` header
  - `Accept-Language` header
Response shape:
- `results[]`: `{ id, info, boundingBox }`
- `meta`: `flowId`, `flowLabel`, `language`, `totalItems`, `contractVersion`
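A minimal sketch of this contract and how a client-side check might validate it; only the field names come from the contract above, while all values (and the nested shapes of `info` and `boundingBox`) are illustrative assumptions:

```python
# Illustrative /menu/analyze response; field names match the contract above,
# values and nested shapes are assumptions.
example_response = {
    "results": [
        {
            "id": "dish-1",
            "info": {"name": "Margherita Pizza"},
            "boundingBox": {"x": 0.12, "y": 0.30, "width": 0.45, "height": 0.05},
        }
    ],
    "meta": {
        "flowId": "dip.lines_only.v1",
        "flowLabel": "Lines Only",
        "language": "en",
        "totalItems": 1,
        "contractVersion": 1,
    },
}

def validate_contract(payload: dict) -> bool:
    """Check only the top-level shape described in the overview."""
    results = payload.get("results")
    if not isinstance(results, list):
        return False
    if not all({"id", "info", "boundingBox"} <= item.keys() for item in results):
        return False
    meta = payload.get("meta", {})
    return {"flowId", "flowLabel", "language", "totalItems", "contractVersion"} <= meta.keys()
```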
Flow routing is handled in `MenuFlowRegistry` and `MenuAnalysisService`.
- Registered flows:
  - `dip.auto_group.v1`
  - `dip.lines_only.v1`
  - `dip.layout_segments_llm.v1` (experimental)
- Flow aliases come from env, plus defaults such as:
  - `fast` -> `dip.lines_only.v1`
  - `layoutexp` -> `dip.layout_segments_llm.v1`
```mermaid
flowchart TD
    A["POST /menu/analyze"] --> B["Validate file bytes"]
    B --> C["Resolve flow hint (flowId query or X-Menu-Flow header)"]
    C --> D{"Flow resolved?"}
    D -- "No" --> E["Return 400 with availableFlows"]
    D -- "Yes" --> F{"Flow ID"}
    F -- "dip.auto_group.v1" --> G["Run Auto Group pipeline"]
    F -- "dip.lines_only.v1" --> H["Run Lines Only pipeline"]
    F -- "dip.layout_segments_llm.v1" --> I["Run Layout Segments pipeline"]
    G --> J["Build response contract meta"]
    H --> J
    I --> J
    J --> K["Return results + meta"]
```
All analyze flows depend on the same core primitives in services/menu.py:
- `process_image(...)`
  - Decodes the image with OpenCV
  - Resizes to a max width
  - JPEG re-encodes to stay under payload constraints
- `run_dip(...)`
  - Submits the image to Azure DIP
  - Polls the async operation URL
  - Normalizes polygon format to `x_coords`/`y_coords`
- `process_dip_results(...)`
  - Cleans dish names
  - Deduplicates repeated dish lines
  - Calls the LLM dish-info chain
  - Optionally enriches images (Google + optional Supabase cache)
- `normalize_text_bbox_dip(...)`
  - Converts DIP polygons into normalized `[0, 1]` bounding boxes
- `serialize_dish_data_filtered(...)`
  - Filters unknowns and emits a stable output structure
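The polygon-to-box conversion can be sketched as a minimal example; the function and output field names below are illustrative, and the real logic lives in `normalize_text_bbox_dip`:

```python
# Hedged sketch: collapse a DIP polygon (x_coords/y_coords) into an
# axis-aligned bounding box normalized to [0, 1] image coordinates.
def normalize_bbox(x_coords, y_coords, image_width, image_height):
    x0, x1 = min(x_coords), max(x_coords)
    y0, y1 = min(y_coords), max(y_coords)
    return {
        "x": x0 / image_width,
        "y": y0 / image_height,
        "width": (x1 - x0) / image_width,
        "height": (y1 - y0) / image_height,
    }

# A 4-point polygon on a 1000x500 image.
box = normalize_bbox([100, 300, 300, 100], [50, 50, 80, 80], 1000, 500)
```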
This is the default path for mixed menus (title + description layouts).
```mermaid
sequenceDiagram
    participant C as "Client"
    participant API as "Analyze Route"
    participant SVC as "upload_pipeline_with_dip_auto_group_lines"
    participant DIP as "Azure DIP"
    participant GRP as "build_paragraph"
    participant LLM as "OpenAI"
    C->>API: "POST /menu/analyze"
    API->>SVC: "Run default flow"
    SVC->>SVC: "process_image"
    SVC->>DIP: "run_dip(preserve_raw_lines=true)"
    DIP-->>SVC: "OCR lines"
    SVC->>GRP: "Group lines into paragraphs + individual lines"
    GRP->>LLM: "Optional LLM grouping (only if heuristic ambiguous and line count threshold exceeded)"
    LLM-->>GRP: "Grouped indices or fallback"
    GRP-->>SVC: "paragraphs + individual_lines"
    par parallel fan-out
        SVC->>LLM: "process_dip_results(individual_lines)"
        SVC->>LLM: "process_dip_paragraph_results(paragraphs)"
    end
    SVC->>SVC: "Merge paragraph context into nearest dish lines"
    SVC-->>API: "results[]"
    API-->>C: "results + meta(flowId=auto_group)"
```
Notes:
- Grouping defaults to heuristic; LLM grouping is conditional.
- If grouping fails/times out, pipeline falls back to line-level behavior.
- Paragraph context is merged spatially into dish entries.
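The fallback rule in the notes above can be sketched as a small wrapper; `group_lines` and the fallback shape are illustrative, not the actual pipeline code:

```python
# Hedged sketch of the grouping fallback: if LLM grouping raises or times out,
# degrade to line-level behavior (every line stands alone, no paragraphs).
def group_lines(lines, llm_group, timeout_s=10.0):
    try:
        return llm_group(lines, timeout=timeout_s)
    except Exception:
        # Fallback: behave like the lines-only pipeline.
        return {"paragraphs": [], "individual_lines": list(lines)}
```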
Used by the frontend "heuristic" mode today (`flowId=dip.lines_only.v1`).
```mermaid
flowchart TD
    A["process_image"] --> B["run_dip(preserve_raw_lines=false)"]
    B --> C["Filter unknown/numeric-only OCR lines"]
    C --> D["process_dip_results on each line"]
    D --> E["normalize_text_bbox_dip"]
    E --> F["serialize_dish_data_filtered"]
    F --> G["Return results + meta(flowId=lines_only)"]
```
Notes:
- No paragraph grouping stage.
- Lower complexity and typically lower latency.
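The "filter numeric-only lines" step above can be sketched as a simple predicate; the function name and the exact character set are assumptions, not the real filter:

```python
# Hedged sketch: keep only OCR lines that could plausibly be dish text,
# dropping empty lines and lines made solely of digits/punctuation (e.g. prices).
import re

def is_menu_candidate(line: str) -> bool:
    stripped = line.strip()
    if not stripped:
        return False
    # Reject lines containing nothing but digits, whitespace, and price punctuation.
    return not re.fullmatch(r"[\d\s.,:$€£-]+", stripped)
```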
This flow adds geometry-aware segmentation and a post-group "wash" pass.
```mermaid
flowchart TD
    A["process_image"] --> B["run_dip(preserve_raw_lines=true)"]
    B --> C["build_paragraph_layout_experiment"]
    C --> C1["Split lines into layout segments"]
    C1 --> C2["LLM groups segments to paragraph groups"]
    C2 --> C3["Materialize paragraph lines + individual lines"]
    C3 --> D["wash_layout_lines"]
    D --> D1["Label lines as dish_title/description/price/non_dish"]
    D1 --> E["Parallel: process_dip_results(dish lines) + process_dip_paragraph_results(description lines)"]
    E --> F["Merge paragraph context + price tokens into dish entries"]
    F --> G["serialize_dish_data_filtered"]
    G --> H["Return results + meta(flowId=layout_segments_llm)"]
```
Notes:
- Multiple fallback modes exist for timeout/parse/provider errors.
- `MENU_LAYOUT_ENABLE_HEURISTIC_FALLBACK` can enable additional non-LLM fallback grouping.
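The labeling in the "wash" pass can be sketched with toy heuristics; the real logic lives in `wash_layout_lines`, and the rules below (price regex, word-count threshold, capitalization check) are purely illustrative assumptions:

```python
# Hedged sketch: assign one of the four labels used by the wash pass.
import re

def label_line(text: str) -> str:
    stripped = text.strip()
    # Price: a number with optional currency symbol on either side (illustrative rule).
    if re.fullmatch(r"[$€£]?\s*\d[\d.,]*\s*[$€£]?", stripped):
        return "price"
    # Long runs of words read as descriptions (threshold is an assumption).
    if len(stripped.split()) > 8:
        return "description"
    # Short, capitalized lines read as dish titles.
    if stripped and stripped[0].isupper():
        return "dish_title"
    return "non_dish"
```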
In `frontend/src/features/menu/services/menuUploadService.ts`:
- UI mode `heuristic` -> `flowId=dip.lines_only.v1`
- UI mode `llm` -> `flowId=dip.layout_segments_llm.v1`
- `dip.auto_group.v1` remains the backend default when no flow hint is provided.
- Auth:
  - Backend expects a JWT Bearer token
  - Decoded using `SECRET_KEY` and audience `"authenticated"`
- `/menu/recommendations`:
  - Checks remaining accesses from Supabase
  - Returns a limit message if exhausted
  - Otherwise calls the LLM recommendation chain and records the access
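The quota gate on `/menu/recommendations` can be sketched as follows; the function name, return shapes, and message text are illustrative, not the actual route implementation:

```python
# Hedged sketch: gate the LLM recommendation chain on remaining quota,
# recording the access only when the chain actually runs.
def handle_recommendations(remaining_accesses: int, run_llm_chain, record_access):
    if remaining_accesses <= 0:
        return {"error": "Usage limit reached"}
    result = run_llm_chain()
    record_access()
    return {"recommendations": result}
```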
- Benchmark runner: `backend/benchmark/run.py`
- Debug endpoints (guarded by `DEBUG_TOOLS_ENABLED`):
  - `/debug/benchmark/runs`
  - `/debug/benchmark/runs/{run_id}/summary`
  - `/debug/benchmark/runs/{run_id}/cases/{case_id}`
  - `/debug/benchmark/runs/{run_id}/cases/{case_id}/image`
- Frontend debug page route: `/debug/benchmark` with `VITE_DEBUG_TOOLS=true`
- Read `AGENTS.md` for commands and environment setup.
- Run backend tests first: `cd backend && uv run python -m pytest -q`.
- For analyze changes, validate at least:
  - flow routing behavior (`flowId`, alias handling)
  - output contract (`results` + `meta`)
  - fallback behavior under grouping failures/timeouts
- If touching OCR/grouping logic, also run the benchmark scripts to compare strategy behavior.
- Implement a new flow class in `backend/src/menu_engine/analysis.py` with `descriptor` + `run(...)`.
- Register it in `get_menu_flow_registry()` in `backend/src/api/deps.py`.
- Add the flow id to env/config defaults as needed (`MENU_ENABLED_FLOW_IDS`, aliases).
- Add tests under `backend/src/tests/feature` + `backend/src/tests/unit`.
- If the frontend should expose it, map a UI mode to the new `flowId`.
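A skeleton for the first step might look like this; the `FlowDescriptor` fields, class name, and `run(...)` signature are assumptions, so check the actual base class in `backend/src/menu_engine/analysis.py` before copying:

```python
# Hedged sketch of a new flow following the descriptor + run(...) shape.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowDescriptor:
    flow_id: str
    label: str
    experimental: bool = False

class MyExperimentalFlow:
    descriptor = FlowDescriptor(
        flow_id="dip.my_experiment.v1",
        label="My Experiment",
        experimental=True,
    )

    async def run(self, image_bytes: bytes, language: str) -> list[dict]:
        # Typical shape: 1) process_image, 2) run_dip, 3) group/label lines,
        # 4) serialize_dish_data_filtered. Left unimplemented in this sketch.
        raise NotImplementedError
```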