Menu image analysis platform with a React frontend and FastAPI backend.
This project is source-available under the PolyForm Noncommercial License 1.0.0.
You may use, copy, modify, and share the code for noncommercial purposes only. Commercial use (including using this project to generate revenue) is not allowed.
See the full license text in LICENSE.
The section below is sourced from SOLUTION_OVERVIEW.md via ./scripts/generate-readme.sh.
This document gives new contributors and agents a high-level map of the project, with emphasis on how backend menu analysis works.
tma analyzes uploaded menu images and returns structured dish items with normalized bounding boxes. It also supports:
- Recommendation generation based on selected dishes
- Usage/limit tracking via Supabase
- Benchmark/debug tooling for OCR/grouping strategies
```mermaid
flowchart LR
    FE["React Frontend (Vite)"] -->|"/menu/analyze, /menu/recommendations"| API["FastAPI Backend"]
    FE -->|"/debug/benchmark/*"| API
    API --> DIP["Azure Document Intelligence (OCR)"]
    API --> OAI["OpenAI Models (Grouping, Translation, Dish Info, Recommendations)"]
    API --> GCS["Google Custom Search (Dish Images)"]
    API --> SB["Supabase (Auth-adjacent data, cache, usage limits)"]
```
- `backend/src/main.py`: app bootstrap + lifespan initialization
- `backend/src/api/`: HTTP routes (`/menu`, `/user-info`, `/debug`)
- `backend/src/menu_engine/`: flow registry + analysis/recommendation orchestration
- `backend/src/services/menu.py`: core OCR/analyze pipelines
- `backend/src/services/ocr/`: grouping strategies (heuristic, LLM, layout experiment)
- `frontend/src/`: UI, upload flow, flow toggle, benchmark debug page
Primary endpoint: `POST /menu/analyze`
- Requires:
  - `Authorization: Bearer <jwt>` header
- Optional:
  - `flowId` query parameter
  - `X-Menu-Flow` header
  - `Accept-Language` header
Response shape:
- `results[]`: `{ id, info, boundingBox }`
- `meta`: `flowId`, `flowLabel`, `language`, `totalItems`, `contractVersion`
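A minimal sketch of this contract and how a client-side check might validate it; only the field names come from the contract above, while all values (and the nested shapes of `info` and `boundingBox`) are illustrative assumptions:

```python
# Illustrative /menu/analyze response; field names match the contract above,
# values and nested shapes are assumptions.
example_response = {
    "results": [
        {
            "id": "dish-1",
            "info": {"name": "Margherita Pizza"},
            "boundingBox": {"x": 0.12, "y": 0.30, "width": 0.45, "height": 0.05},
        }
    ],
    "meta": {
        "flowId": "dip.lines_only.v1",
        "flowLabel": "Lines Only",
        "language": "en",
        "totalItems": 1,
        "contractVersion": 1,
    },
}

def validate_contract(payload: dict) -> bool:
    """Check only the top-level shape described in the overview."""
    results = payload.get("results")
    if not isinstance(results, list):
        return False
    if not all({"id", "info", "boundingBox"} <= item.keys() for item in results):
        return False
    meta = payload.get("meta", {})
    return {"flowId", "flowLabel", "language", "totalItems", "contractVersion"} <= meta.keys()
```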
Flow routing is handled in `MenuFlowRegistry` and `MenuAnalysisService`.
- Registered flows:
  - `dip.auto_group.v1`
  - `dip.lines_only.v1`
  - `dip.layout_segments_llm.v1` (experimental)
- Flow aliases come from env, plus defaults such as:
  - `fast` -> `dip.lines_only.v1`
  - `layoutexp` -> `dip.layout_segments_llm.v1`
```mermaid
flowchart TD
    A["POST /menu/analyze"] --> B["Validate file bytes"]
    B --> C["Resolve flow hint (flowId query or X-Menu-Flow header)"]
    C --> D{"Flow resolved?"}
    D -- "No" --> E["Return 400 with availableFlows"]
    D -- "Yes" --> F{"Flow ID"}
    F -- "dip.auto_group.v1" --> G["Run Auto Group pipeline"]
    F -- "dip.lines_only.v1" --> H["Run Lines Only pipeline"]
    F -- "dip.layout_segments_llm.v1" --> I["Run Layout Segments pipeline"]
    G --> J["Build response contract meta"]
    H --> J
    I --> J
    J --> K["Return results + meta"]
```
All analyze flows depend on the same core primitives in services/menu.py:
- `process_image(...)`
  - Decodes the image with OpenCV
  - Resizes to a max width
  - JPEG re-encodes to stay under payload constraints
- `run_dip(...)`
  - Submits the image to Azure DIP
  - Polls the async operation URL
  - Normalizes polygon format to `x_coords`/`y_coords`
- `process_dip_results(...)`
  - Cleans dish names
  - Deduplicates repeated dish lines
  - Calls the LLM dish-info chain
  - Optionally enriches images (Google + optional Supabase cache)
- `normalize_text_bbox_dip(...)`
  - Converts DIP polygons into normalized `[0, 1]` bounding boxes
- `serialize_dish_data_filtered(...)`
  - Filters unknowns and emits a stable output structure
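The polygon-to-box conversion can be sketched as a minimal example; the function and output field names below are illustrative, and the real logic lives in `normalize_text_bbox_dip`:

```python
# Hedged sketch: collapse a DIP polygon (x_coords/y_coords) into an
# axis-aligned bounding box normalized to [0, 1] image coordinates.
def normalize_bbox(x_coords, y_coords, image_width, image_height):
    x0, x1 = min(x_coords), max(x_coords)
    y0, y1 = min(y_coords), max(y_coords)
    return {
        "x": x0 / image_width,
        "y": y0 / image_height,
        "width": (x1 - x0) / image_width,
        "height": (y1 - y0) / image_height,
    }

# A 4-point polygon on a 1000x500 image.
box = normalize_bbox([100, 300, 300, 100], [50, 50, 80, 80], 1000, 500)
```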
This is the default path for mixed menus (title + description layouts).
```mermaid
sequenceDiagram
    participant C as "Client"
    participant API as "Analyze Route"
    participant SVC as "upload_pipeline_with_dip_auto_group_lines"
    participant DIP as "Azure DIP"
    participant GRP as "build_paragraph"
    participant LLM as "OpenAI"
    C->>API: "POST /menu/analyze"
    API->>SVC: "Run default flow"
    SVC->>SVC: "process_image"
    SVC->>DIP: "run_dip(preserve_raw_lines=true)"
    DIP-->>SVC: "OCR lines"
    SVC->>GRP: "Group lines into paragraphs + individual lines"
    GRP->>LLM: "Optional LLM grouping (only if heuristic ambiguous and line count threshold exceeded)"
    LLM-->>GRP: "Grouped indices or fallback"
    GRP-->>SVC: "paragraphs + individual_lines"
    par parallel fan-out
        SVC->>LLM: "process_dip_results(individual_lines)"
        SVC->>LLM: "process_dip_paragraph_results(paragraphs)"
    end
    SVC->>SVC: "Merge paragraph context into nearest dish lines"
    SVC-->>API: "results[]"
    API-->>C: "results + meta(flowId=auto_group)"
```
Notes:
- Grouping defaults to heuristic; LLM grouping is conditional.
- If grouping fails/times out, pipeline falls back to line-level behavior.
- Paragraph context is merged spatially into dish entries.
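The fallback rule in the notes above can be sketched as a small wrapper; `group_lines` and the fallback shape are illustrative, not the actual pipeline code:

```python
# Hedged sketch of the grouping fallback: if LLM grouping raises or times out,
# degrade to line-level behavior (every line stands alone, no paragraphs).
def group_lines(lines, llm_group, timeout_s=10.0):
    try:
        return llm_group(lines, timeout=timeout_s)
    except Exception:
        # Fallback: behave like the lines-only pipeline.
        return {"paragraphs": [], "individual_lines": list(lines)}
```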
Used by the frontend "heuristic" mode today (`flowId=dip.lines_only.v1`).
```mermaid
flowchart TD
    A["process_image"] --> B["run_dip(preserve_raw_lines=false)"]
    B --> C["Filter unknown/numeric-only OCR lines"]
    C --> D["process_dip_results on each line"]
    D --> E["normalize_text_bbox_dip"]
    E --> F["serialize_dish_data_filtered"]
    F --> G["Return results + meta(flowId=lines_only)"]
```
Notes:
- No paragraph grouping stage.
- Lower complexity and typically lower latency.
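The "filter numeric-only lines" step above can be sketched as a simple predicate; the function name and the exact character set are assumptions, not the real filter:

```python
# Hedged sketch: keep only OCR lines that could plausibly be dish text,
# dropping empty lines and lines made solely of digits/punctuation (e.g. prices).
import re

def is_menu_candidate(line: str) -> bool:
    stripped = line.strip()
    if not stripped:
        return False
    # Reject lines containing nothing but digits, whitespace, and price punctuation.
    return not re.fullmatch(r"[\d\s.,:$€£-]+", stripped)
```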
This flow adds geometry-aware segmentation and a post-group "wash" pass.
```mermaid
flowchart TD
    A["process_image"] --> B["run_dip(preserve_raw_lines=true)"]
    B --> C["build_paragraph_layout_experiment"]
    C --> C1["Split lines into layout segments"]
    C1 --> C2["LLM groups segments to paragraph groups"]
    C2 --> C3["Materialize paragraph lines + individual lines"]
    C3 --> D["wash_layout_lines"]
    D --> D1["Label lines as dish_title/description/price/non_dish"]
    D1 --> E["Parallel: process_dip_results(dish lines) + process_dip_paragraph_results(description lines)"]
    E --> F["Merge paragraph context + price tokens into dish entries"]
    F --> G["serialize_dish_data_filtered"]
    G --> H["Return results + meta(flowId=layout_segments_llm)"]
```
Notes:
- Multiple fallback modes exist for timeout/parse/provider errors.
- `MENU_LAYOUT_ENABLE_HEURISTIC_FALLBACK` can enable additional non-LLM fallback grouping.
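The labeling in the "wash" pass can be sketched with toy heuristics; the real logic lives in `wash_layout_lines`, and the rules below (price regex, word-count threshold, capitalization check) are purely illustrative assumptions:

```python
# Hedged sketch: assign one of the four labels used by the wash pass.
import re

def label_line(text: str) -> str:
    stripped = text.strip()
    # Price: a number with optional currency symbol on either side (illustrative rule).
    if re.fullmatch(r"[$€£]?\s*\d[\d.,]*\s*[$€£]?", stripped):
        return "price"
    # Long runs of words read as descriptions (threshold is an assumption).
    if len(stripped.split()) > 8:
        return "description"
    # Short, capitalized lines read as dish titles.
    if stripped and stripped[0].isupper():
        return "dish_title"
    return "non_dish"
```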
In `frontend/src/features/menu/services/menuUploadService.ts`:
- UI mode `heuristic` -> `flowId=dip.lines_only.v1`
- UI mode `llm` -> `flowId=dip.layout_segments_llm.v1`
- `dip.auto_group.v1` remains the backend default when no flow hint is provided.
- Auth:
  - Backend expects a JWT Bearer token
  - Decoded using `SECRET_KEY` and audience `"authenticated"`
- `/menu/recommendations`:
  - Checks remaining accesses from Supabase
  - Returns a limit message if exhausted
  - Otherwise calls the LLM recommendation chain and records the access
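The quota gate on `/menu/recommendations` can be sketched as follows; the function name, return shapes, and message text are illustrative, not the actual route implementation:

```python
# Hedged sketch: gate the LLM recommendation chain on remaining quota,
# recording the access only when the chain actually runs.
def handle_recommendations(remaining_accesses: int, run_llm_chain, record_access):
    if remaining_accesses <= 0:
        return {"error": "Usage limit reached"}
    result = run_llm_chain()
    record_access()
    return {"recommendations": result}
```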
- Benchmark runner: `backend/benchmark/run.py`
- Debug endpoints (guarded by `DEBUG_TOOLS_ENABLED`):
  - `/debug/benchmark/runs`
  - `/debug/benchmark/runs/{run_id}/summary`
  - `/debug/benchmark/runs/{run_id}/cases/{case_id}`
  - `/debug/benchmark/runs/{run_id}/cases/{case_id}/image`
- Frontend debug page route: `/debug/benchmark` with `VITE_DEBUG_TOOLS=true`
- Read `AGENTS.md` for commands and environment setup.
- Run backend tests first: `cd backend && uv run python -m pytest -q`.
- For analyze changes, validate at least:
  - flow routing behavior (`flowId`, alias handling)
  - output contract (`results` + `meta`)
  - fallback behavior under grouping failures/timeouts
- If touching OCR/grouping logic, also run the benchmark scripts to compare strategy behavior.
- Implement a new flow class in `backend/src/menu_engine/analysis.py` with `descriptor` + `run(...)`.
- Register it in `get_menu_flow_registry()` in `backend/src/api/deps.py`.
- Add the flow id to env/config defaults as needed (`MENU_ENABLED_FLOW_IDS`, aliases).
- Add tests under `backend/src/tests/feature` + `backend/src/tests/unit`.
- If the frontend should expose it, map a UI mode to the new `flowId`.
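A skeleton for the first step might look like this; the `FlowDescriptor` fields, class name, and `run(...)` signature are assumptions, so check the actual base class in `backend/src/menu_engine/analysis.py` before copying:

```python
# Hedged sketch of a new flow following the descriptor + run(...) shape.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowDescriptor:
    flow_id: str
    label: str
    experimental: bool = False

class MyExperimentalFlow:
    descriptor = FlowDescriptor(
        flow_id="dip.my_experiment.v1",
        label="My Experiment",
        experimental=True,
    )

    async def run(self, image_bytes: bytes, language: str) -> list[dict]:
        # Typical shape: 1) process_image, 2) run_dip, 3) group/label lines,
        # 4) serialize_dish_data_filtered. Left unimplemented in this sketch.
        raise NotImplementedError
```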