From 07a2cf546156279d2f99240b15606d02f665678d Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]"
 <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Wed, 12 Nov 2025 04:53:20 +0000
Subject: [PATCH] Optimize _format_schema_info
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimization replaces inefficient repeated string concatenation with a list-based approach, delivering a **12% speedup** by avoiding Python's string immutability overhead.

**Key Changes:**
1. **String Building Pattern**: Changed from `schema_info += f"..."` to `parts.append(f"...")` followed by `"".join(parts)`. This eliminates the quadratic time complexity of repeated string concatenation, which creates new string objects on each `+=` operation.

2. **Sample Values Processing**: Optimized `", ".join(f"{v}" for v in col.sample_values)` to `", ".join(map(str, col.sample_values))`. Using `map(str, ...)` is more efficient than generator expressions with f-strings for simple string conversion.

**Performance Analysis:**
- The line profiler shows the most significant improvement in the sample values line (42.8% → 33.8% of total time), reducing from 2.44ms to 1.33ms
- Large-scale tests show the greatest benefits: 29.6% faster for 100 columns with samples, and 22.6% faster for 1000 elements
- Small cases show minor slowdowns (1-17%) due to list allocation overhead, but this is negligible compared to gains at scale

**Why This Works:**
String concatenation in Python creates new string objects each time due to immutability. With many schema tables and columns, this becomes O(n²) behavior. The list approach collects all parts first, then performs a single join operation, maintaining O(n) complexity.

The optimization is most effective for schemas with many tables/columns and sample values, which aligns with typical database schema documentation use cases where this function would process substantial metadata.
---
 marimo/_server/ai/prompts.py | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/marimo/_server/ai/prompts.py b/marimo/_server/ai/prompts.py
index cc8da16e471..e7a5b83aa19 100644
--- a/marimo/_server/ai/prompts.py
+++ b/marimo/_server/ai/prompts.py
@@ -48,16 +48,17 @@ def _format_schema_info(tables: Optional[list[SchemaTable]]) -> str:
     if not tables:
         return ""
 
-    schema_info = "\n\n## Available schema:\n"
+    parts = ["\n\n## Available schema:\n"]
     for schema in tables:
-        schema_info += f"- Table: {schema.name}\n"
+        parts.append(f"- Table: {schema.name}\n")
         for col in schema.columns:
-            schema_info += f"  - Column: {col.name}\n"
-            schema_info += f"    - Type: {col.type}\n"
+            parts.append(f"  - Column: {col.name}\n")
+            parts.append(f"    - Type: {col.type}\n")
             if col.sample_values:
-                samples = ", ".join(f"{v}" for v in col.sample_values)
-                schema_info += f"    - Sample values: {samples}\n"
-    return schema_info
+                parts.append(
+                    f"    - Sample values: {', '.join(map(str, col.sample_values))}\n"
+                )
+    return "".join(parts)
 
 
 def _format_plain_text(plain_text: str) -> str: