fix(mssql): expose real schemas in information_schema.TABLES and fix preview query (#12242) #12429
Beandon13 wants to merge 2 commits
…preview query (mindsdb#12242)

MindsDB's information_schema layer always overwrites TABLE_SCHEMA with the datasource name, which previously caused the real SQL schema names (dbo, app, usr, etc.) to be completely invisible when querying INFORMATION_SCHEMA.TABLES against a MSSQL datasource. The UI table-preview tooltip generated an invalid two-part query (<datasource>.<table>) instead of the required three-part form (<datasource>.<schema>.<table>).

Changes:

* mssql_handler.get_tables() now accepts an `all` flag (matching the postgres / databricks pattern used by tree.py for the Explorer UI):
  - all=True (Explorer mode): returns raw table_schema + table_name columns so the UI can group tables under their schema nodes.
  - all=False (default, used by INFORMATION_SCHEMA.TABLES): qualifies table_name as "<schema>.<table>" so the full three-part name is preserved even after the system-level TABLE_SCHEMA override. Non-user schemas (sys, guest, fixed database roles) are filtered out in both modes.
  - self.schema configured: always filters to that single schema with plain unqualified names (original behavior preserved).
* mssql_handler.get_columns() now accepts an optional schema_name parameter (consistent with the postgres handler) and automatically extracts the schema from a qualified "<schema>.<table>" table_name produced by get_tables(), so column lookups remain precise when no explicit schema is configured.
* Tests: three new test methods cover get_tables(all=False), get_tables(all=True), get_tables with a configured schema, get_columns with a qualified table name, and get_columns with an explicit schema_name argument.

Fixes mindsdb#12242

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
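To make the failure mode concrete, here is a minimal sketch of the overwrite described in the commit message. It is not MindsDB's actual code: `overwrite_table_schema` and `preview_query` are hypothetical helpers, and the rows are plain dicts standing in for `INFORMATION_SCHEMA.TABLES` results.

```python
def overwrite_table_schema(rows, datasource):
    """Mimic the system-level override: TABLE_SCHEMA becomes the datasource name."""
    return [
        {"TABLE_SCHEMA": datasource, "TABLE_NAME": row["TABLE_NAME"]}
        for row in rows
    ]


def preview_query(row):
    """Build a preview query from an INFORMATION_SCHEMA.TABLES-style row."""
    return f"SELECT * FROM {row['TABLE_SCHEMA']}.{row['TABLE_NAME']}"


# Handler output before the fix: bare table name, real schema in its own column.
before = [{"TABLE_SCHEMA": "dbo", "TABLE_NAME": "MyTable"}]
# Handler output after the fix (all=False): schema folded into the table name.
after = [{"TABLE_SCHEMA": "dbo", "TABLE_NAME": "dbo.MyTable"}]

two_part = preview_query(overwrite_table_schema(before, "f_prod")[0])
three_part = preview_query(overwrite_table_schema(after, "f_prod")[0])
```

With the bare name, the real schema is lost once `TABLE_SCHEMA` is overwritten; with the qualified name, it survives inside `TABLE_NAME` and the preview query comes out in three-part form.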
EntelligenceAI PR Summary
Refactors MSSQL handler table and column retrieval to support multiple query modes and improved schema resolution.
Confidence Score: 2/5 - Changes Needed
Not safe to merge; the refactored …
Key Findings:
- `get_tables()` now accepts an `all: bool` parameter, splitting behavior into schema-filtered, Explorer UI (`all=True`), and INFORMATION_SCHEMA (`all=False`) modes
- System schemas (`sys`, `INFORMATION_SCHEMA`, `guest`, fixed-role schemas) are explicitly excluded in all query paths
- Default mode qualifies table names as `<schema>.<table>` for full three-part name resolution in MindsDB SQL
- `get_columns()` gains an optional `schema_name` parameter with priority-based schema resolution: explicit arg > embedded schema in table name > `self.schema`
- Four new unit tests added covering all new `get_tables` and `get_columns` modes and behaviors
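The resolution order stated in the `get_columns()` bullet (explicit arg > embedded schema > `self.schema`) can be sketched as a small standalone function. This is an illustration under assumptions, not the handler's code: `resolve_schema` and its `handler_schema` argument are hypothetical names standing in for the method and `self.schema`.

```python
def resolve_schema(table_name, schema_name=None, handler_schema=None):
    """Return (schema, table) using: explicit arg > embedded schema > handler default."""
    if schema_name:
        # An explicit argument wins; the table name is passed through unchanged.
        return schema_name, table_name
    if "." in table_name:
        # Split only on the first dot, as get_tables() emits "<schema>.<table>".
        embedded, bare = table_name.split(".", 1)
        return embedded, bare
    # Fall back to the handler-level schema (may be None).
    return handler_schema, table_name
```

Note that the diff excerpt in the review appears to evaluate `schema_name or self.schema` first and only splits the table name when that yields `None`, so a configured `self.schema` would take precedence over an embedded schema there. Whether that difference matters depends on whether qualified names can reach `get_columns()` while `self.schema` is set.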
    # Resolve schema: explicit arg > embedded in table_name > handler-level self.schema.
    effective_schema = schema_name or self.schema
    if effective_schema is None and "." in table_name:
        # table_name was qualified by get_tables() as "<schema>.<table>"
        parts = table_name.split(".", 1)
        effective_schema, table_name = parts[0], parts[1]

    query = f"""
        SELECT
            COLUMN_NAME,
Correctness: The new schema_name parameter is directly interpolated into the SQL query (f" AND table_schema = '{effective_schema}'") without any sanitization, creating a SQL injection vector that didn't exist before — a caller passing schema_name = "'; DROP TABLE foo; --" would execute arbitrary SQL.
🤖 AI Agent Prompt for Cursor/Windsurf
In file `mindsdb/integrations/handlers/mssql_handler/mssql_handler.py`, the `get_columns` method (around line 533-542) builds SQL by directly interpolating `effective_schema` (which comes from the new `schema_name` parameter or from splitting `table_name`) into the query string. Add input validation/sanitization for `effective_schema` before using it in the f-string, e.g. validate it matches `^[a-zA-Z0-9_]+$` and raise ValueError otherwise, to prevent SQL injection through the new `schema_name` parameter.
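A minimal sketch of the allow-list validation the prompt suggests, assuming the pattern `^[a-zA-Z0-9_]+$` is acceptable for the schema names in use. `validate_identifier` is a hypothetical helper, not part of the handler.

```python
import re

# Pattern taken from the review suggestion: letters, digits, and underscores only.
_IDENTIFIER_RE = re.compile(r"^[a-zA-Z0-9_]+$")


def validate_identifier(name):
    """Return name unchanged if it is a plain identifier, else raise ValueError."""
    # fullmatch also rejects a trailing newline, which `$` alone would tolerate.
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"Invalid SQL identifier: {name!r}")
    return name
```

The trade-off is that this rejects legitimate SQL Server names containing spaces, hyphens, or non-ASCII characters, so it only fits deployments where such names do not occur.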
Addresses SQL injection risk flagged in review: effective_schema and table_name were interpolated directly into the WHERE clause without sanitization.

Apply SQL string-literal escaping (doubling single-quotes) to safe_table_name and safe_schema before interpolation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
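The escaping described in this commit can be sketched as follows. The names `safe_table_name` and `safe_schema` come from the commit message; `escape_sql_literal` and `build_columns_filter` are hypothetical helpers, and the WHERE clause shape is an assumption based on the fragment quoted in the review.

```python
def escape_sql_literal(value):
    """Escape a value for use inside a single-quoted SQL string literal."""
    return value.replace("'", "''")


def build_columns_filter(table_name, schema=None):
    """Build the WHERE-clause fragment with escaped literals (illustrative only)."""
    safe_table_name = escape_sql_literal(table_name)
    clause = f"table_name = '{safe_table_name}'"
    if schema is not None:
        safe_schema = escape_sql_literal(schema)
        clause += f" AND table_schema = '{safe_schema}'"
    return clause
```

Doubling single quotes keeps the injected text inside the string literal, so the example payload from the review is compared as a schema name rather than executed. Where the database driver supports bound parameters, passing the values that way is generally the more robust option.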
Good catch on the injection risk. Fixed in the latest push: both …
Description
Fixes #12242
When a MSSQL Server datasource is connected to MindsDB, two related bugs affected how table schemas were exposed:

Bug 1: `INFORMATION_SCHEMA.TABLES` hides real SQL schemas

MindsDB's `system_tables` layer unconditionally overwrites `TABLE_SCHEMA` with the datasource name (e.g. `f_prod`). For single-schema integrations (MySQL, SQLite) this is fine, but MSSQL databases routinely contain multiple user schemas (`dbo`, `app`, `usr`, …). After the overwrite, all real schema information was lost, so `SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES` returned `f_prod` in `TABLE_SCHEMA` and a bare table name instead of the fully-qualified `dbo.MyTable`.

Bug 2: UI table preview generates an invalid two-part query

Because the schema was not embedded in `TABLE_NAME`, the table-preview tooltip generated `SELECT * FROM f_prod.MyTable` instead of the correct three-part form `SELECT * FROM f_prod.dbo.MyTable`, which fails at execution time.

Type of change
Fix
`get_tables()` in the MSSQL handler now accepts an `all: bool = False` parameter, matching the pattern already used by the Postgres and Databricks handlers:

| Caller | `all` | Behaviour |
| --- | --- | --- |
| `INFORMATION_SCHEMA.TABLES` (system_tables layer) | `False` (default) | `table_name` returned as `<schema>.<table>` so the full three-part reference is preserved after the datasource-level `TABLE_SCHEMA` override |
| Explorer UI (`tree.py`) | `True` | Raw `table_schema` + plain `table_name` returned so the UI can group tables under schema nodes |
| `self.schema` configured | any | Filtered to that schema; plain `table_name` returned (original behaviour) |

`get_columns()` was extended with an optional `schema_name` parameter (consistent with the Postgres handler) and now automatically extracts the schema from a qualified `<schema>.<table>` name produced by `get_tables()`, so column lookups remain precise in all modes.
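The three modes can be illustrated with a small in-memory sketch. The real handler does this work in SQL; `shape_tables`, `SYSTEM_SCHEMAS`, and the `(schema, table)` tuple input are all hypothetical, and the excluded-schema set is an illustrative subset rather than the handler's actual list.

```python
# Illustrative subset only; the handler's real exclusion list may differ.
SYSTEM_SCHEMAS = {"sys", "guest", "INFORMATION_SCHEMA"}


def shape_tables(rows, all=False, configured_schema=None):
    """Shape (schema, table) pairs the way each get_tables() mode is described.

    `all` shadows the builtin on purpose, to mirror the parameter name in the PR.
    """
    if configured_schema is not None:
        # self.schema configured: single schema, plain unqualified names.
        return [{"table_name": t} for s, t in rows if s == configured_schema]
    user_rows = [(s, t) for s, t in rows if s not in SYSTEM_SCHEMAS]
    if all:
        # Explorer mode: keep schema and table as separate columns.
        return [{"table_schema": s, "table_name": t} for s, t in user_rows]
    # Default mode: fold the schema into the table name.
    return [{"table_name": f"{s}.{t}"} for s, t in user_rows]
```

For example, given rows for `dbo.Customers`, `app.Orders`, and `sys.objects`, the default mode yields the two qualified user-table names, while `configured_schema="app"` yields only the plain name `Orders`.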
Non-user system schemas (`sys`, `guest`, fixed database role schemas) are filtered out in both `all=True` and `all=False` modes.

Verification Process
- … `tests/unit/handlers/test_mssql.py`
- … `dbo`, `app`).
- `SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '<datasource>'`: `TABLE_NAME` now includes the schema prefix (`dbo.Customers`, `app.Orders`, …).
- `SELECT * FROM <datasource>.dbo.Customers LIMIT 10`: executes successfully.
- … `SELECT * FROM <datasource>.<schema>.<table> LIMIT 100`.

Checklist

- … `test_get_tables_*` tests, 2 new `test_get_columns_*` tests).