Skip to content

fix(mssql): expose real schemas in information_schema.TABLES and fix preview query (#12242)#12429

Open
Beandon13 wants to merge 2 commits into
mindsdb:mainfrom
Beandon13:fix/mindsdb-12242-mssql-schema
Open

fix(mssql): expose real schemas in information_schema.TABLES and fix preview query (#12242)#12429
Beandon13 wants to merge 2 commits into
mindsdb:mainfrom
Beandon13:fix/mindsdb-12242-mssql-schema

Conversation

@Beandon13
Copy link
Copy Markdown

Description

Fixes #12242

When a MSSQL Server datasource is connected to MindsDB, two related bugs
affected how table schemas were exposed:

Bug 1 — INFORMATION_SCHEMA.TABLES hides real SQL schemas

MindsDB's system_tables layer unconditionally overwrites TABLE_SCHEMA
with the datasource name (e.g. f_prod). For single-schema integrations
(MySQL, SQLite) this is fine, but MSSQL databases routinely contain multiple
user schemas (dbo, app, usr, …). After the overwrite, all real schema
information was lost, so SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES returned f_prod in TABLE_SCHEMA and a bare
table name instead of the fully-qualified dbo.MyTable.

Bug 2 — UI table preview generates an invalid two-part query

Because the schema was not embedded in TABLE_NAME, the table-preview tooltip
generated SELECT * FROM f_prod.MyTable instead of the correct three-part
form SELECT * FROM f_prod.dbo.MyTable, which fails at execution time.

Type of change

  • 🐛 Bug fix (non-breaking change which fixes an issue)

Fix

get_tables() in the MSSQL handler now accepts an all: bool = False
parameter, matching the pattern already used by the Postgres and Databricks
handlers:

Caller all Behaviour
INFORMATION_SCHEMA.TABLES (system_tables layer) False (default) table_name returned as <schema>.<table> so the full three-part reference is preserved after the datasource-level TABLE_SCHEMA override
Explorer / tree API (tree.py) True Raw table_schema + plain table_name returned so the UI can group tables under schema nodes
self.schema configured either Schema filter applied; plain unqualified table_name returned (original behaviour)

get_columns() was extended with an optional schema_name parameter
(consistent with the Postgres handler) and now automatically extracts the
schema from a qualified <schema>.<table> name produced by get_tables(),
so column lookups remain precise in all modes.

Non-user system schemas (sys, guest, fixed database role schemas) are
filtered out in both all=True and all=False modes.

Verification Process

  • Test Location: tests/unit/handlers/test_mssql.py
  • Verification Steps:
    1. Connect a MSSQL datasource with multiple schemas (e.g. dbo, app).
    2. Run SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '<datasource>'TABLE_NAME now includes the schema prefix (dbo.Customers, app.Orders, …).
    3. Run SELECT * FROM <datasource>.dbo.Customers LIMIT 10 — executes successfully.
    4. Hover a table in the Explorer UI — the preview query reads SELECT * FROM <datasource>.<schema>.<table> LIMIT 100.

Checklist

  • My code follows the style guidelines (PEP 8) of MindsDB.
  • I have appropriately commented on my code, especially in complex areas.
  • Relevant unit tests are updated or added (3 new test_get_tables_* tests, 2 new test_get_columns_* tests).

…preview query (mindsdb#12242)

MindsDB's information_schema layer always overwrites TABLE_SCHEMA with the
datasource name, which previously caused the real SQL schema names (dbo, app,
usr, etc.) to be completely invisible when querying INFORMATION_SCHEMA.TABLES
against a MSSQL datasource.  The UI table-preview tooltip generated an invalid
two-part query (<datasource>.<table>) instead of the required three-part form
(<datasource>.<schema>.<table>).

Changes:
* mssql_handler.get_tables() now accepts an `all` flag (matching the postgres /
  databricks pattern used by tree.py for the Explorer UI):
  - all=True (Explorer mode): returns raw table_schema + table_name columns so
    the UI can group tables under their schema nodes.
  - all=False (default, used by INFORMATION_SCHEMA.TABLES): qualifies table_name
    as "<schema>.<table>" so the full three-part name is preserved even after
    the system-level TABLE_SCHEMA override. Non-user schemas (sys, guest, fixed
    database roles) are filtered out in both modes.
  - self.schema configured: always filters to that single schema with plain
    unqualified names (original behavior preserved).

* mssql_handler.get_columns() now accepts an optional schema_name parameter
  (consistent with the postgres handler) and automatically extracts the schema
  from a qualified "<schema>.<table>" table_name produced by get_tables(), so
  column lookups remain precise when no explicit schema is configured.

* Tests: three new test methods cover get_tables(all=False), get_tables(all=True),
  get_tables with a configured schema, get_columns with a qualified table name,
  and get_columns with an explicit schema_name argument.

Fixes mindsdb#12242

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@entelligence-ai-pr-reviews
Copy link
Copy Markdown
Contributor

EntelligenceAI PR Summary

Refactors MSSQL handler table and column retrieval to support multiple query modes and improved schema resolution.

  • get_tables() now accepts all: bool parameter splitting behavior into schema-filtered, Explorer UI (all=True), and INFORMATION_SCHEMA (all=False) modes
  • System schemas (sys, INFORMATION_SCHEMA, guest, fixed-role schemas) are explicitly excluded in all query paths
  • Default mode qualifies table names as <schema>.<table> for full three-part name resolution in MindsDB SQL
  • get_columns() gains optional schema_name parameter with priority-based schema resolution: explicit arg > embedded schema in table name > self.schema
  • Four new unit tests added covering all new get_tables and get_columns modes and behaviors

Confidence Score: 2/5 - Changes Needed

Not safe to merge — the refactored get_tables() method in mssql_handler.py introduces a SQL injection vulnerability by directly interpolating the unsanitized schema_name parameter into the query string via f" AND table_schema = '{effective_schema}'", a security risk that did not exist before this PR. While the PR achieves meaningful improvements — multi-mode schema resolution, explicit exclusion of system schemas, and better three-part name qualification — the injection vector on effective_schema must be addressed with parameterized queries or strict allowlist validation before this can be safely deployed. Any caller or downstream system that passes user-influenced schema names would be exposed to arbitrary SQL execution against the MSSQL backend.

Key Findings:

  • Direct f-string interpolation of effective_schema (derived from the schema_name parameter) into the SQL WHERE clause in mssql_handler.py creates a SQL injection vector — a value like '; DROP TABLE foo; -- would execute arbitrary SQL on the target MSSQL server.
  • The vulnerability is newly introduced by this PR; prior to this refactor, the schema filtering path did not accept external schema_name input in this way, making this a net regression in security posture despite the functional improvements.
  • The fix is straightforward — use parameterized query arguments (e.g., passing effective_schema as a bound parameter rather than string interpolation), but the change must be made before merge to prevent exploitation.
Files requiring special attention
  • mindsdb/integrations/handlers/mssql_handler/mssql_handler.py

Copy link
Copy Markdown
Contributor

@entelligence-ai-pr-reviews entelligence-ai-pr-reviews Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactors MSSQL handler table and column retrieval to support multiple query modes and improved schema resolution.

  • get_tables() now accepts all: bool parameter splitting behavior into schema-filtered, Explorer UI (all=True), and INFORMATION_SCHEMA (all=False) modes
  • System schemas (sys, INFORMATION_SCHEMA, guest, fixed-role schemas) are explicitly excluded in all query paths
  • Default mode qualifies table names as <schema>.<table> for full three-part name resolution in MindsDB SQL
  • get_columns() gains optional schema_name parameter with priority-based schema resolution: explicit arg > embedded schema in table name > self.schema
  • Four new unit tests added covering all new get_tables and get_columns modes and behaviors

Comment on lines +533 to 542
# Resolve schema: explicit arg > embedded in table_name > handler-level self.schema.
effective_schema = schema_name or self.schema
if effective_schema is None and "." in table_name:
# table_name was qualified by get_tables() as "<schema>.<table>"
parts = table_name.split(".", 1)
effective_schema, table_name = parts[0], parts[1]

query = f"""
SELECT
COLUMN_NAME,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: The new schema_name parameter is directly interpolated into the SQL query (f" AND table_schema = '{effective_schema}'") without any sanitization, creating a SQL injection vector that didn't exist before — a caller passing schema_name = "'; DROP TABLE foo; --" would execute arbitrary SQL.

🤖 AI Agent Prompt for Cursor/Windsurf

📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue

In file `mindsdb/integrations/handlers/mssql_handler/mssql_handler.py`, the `get_columns` method (around line 533-542) builds SQL by directly interpolating `effective_schema` (which comes from the new `schema_name` parameter or from splitting `table_name`) into the query string. Add input validation/sanitization for `effective_schema` before using it in the f-string, e.g. validate it matches `^[a-zA-Z0-9_]+$` and raise ValueError otherwise, to prevent SQL injection through the new `schema_name` parameter.

Addresses SQL injection risk flagged in review: effective_schema and
table_name were interpolated directly into the WHERE clause without
sanitization. Apply SQL string-literal escaping (doubling single-quotes)
to safe_table_name and safe_schema before interpolation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Beandon13
Copy link
Copy Markdown
Author

Good catch on the injection risk. Fixed in the latest push: both table_name and effective_schema are now passed through str.replace("'", "''") before interpolation — standard SQL string-literal escaping. These values come from the DB's own INFORMATION_SCHEMA or from the handler's configured self.schema, but agreed the defensive escaping is the right call regardless of the source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: [MSSQL] information_schema.TABLES does not expose real schemas + wrong preview query in UI

1 participant