Skip to content

FIX: Accept Row objects in bulkcopy without manual tuple conversion#615

Open
jahnvi480 wants to merge 8 commits into
mainfrom
jahnvi/github_issue_482
Open

FIX: Accept Row objects in bulkcopy without manual tuple conversion#615
jahnvi480 wants to merge 8 commits into
mainfrom
jahnvi/github_issue_482

Conversation

@jahnvi480
Copy link
Copy Markdown
Contributor

@jahnvi480 jahnvi480 commented Jun 2, 2026

Work Item / Issue Reference

AB#43796

GitHub Issue: #482


Summary

This pull request improves the robustness of the bulkcopy function in mssql_python/cursor.py by ensuring that all data rows passed to the Rust layer are properly formatted as tuples. This prevents type errors when using different iterable types, such as lists or custom Row objects, as input.

Data validation and conversion:

  • Added a helper function _ensure_tuples to convert each item in the provided data to a tuple if it is a list or Row object, and raise a TypeError for unsupported types. This ensures compatibility with the strict type requirements of the Rust backend.
  • Updated the call to bulkcopy to use _ensure_tuples(data) instead of simply iterating over data, enforcing consistent tuple formatting for each row.

mssql_py_core expects native tuples but Row objects from fetchmany()
fail the strict type check in Rust with:
  ValueError: Expected tuple, got: 'Row' object cannot be cast as 'tuple'

Added _ensure_tuples() wrapper that auto-converts Row/list objects to
tuples. Tuples pass through with zero overhead. Unexpected types raise
TypeError immediately instead of producing confusing Rust-level errors.

Fixes #482
Copilot AI review requested due to automatic review settings June 2, 2026 04:33
@github-actions github-actions Bot added the pr-size: small Minimal code update label Jun 2, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes issue #482 where cursor.bulkcopy() rejected Row objects returned by cursor.fetchall(). It adds an inner helper that normalizes each row in the input iterable to a tuple before handing it to the Rust-backed pycore_cursor.bulkcopy, so Row and list rows are now accepted directly.

Changes:

  • Added a nested _ensure_tuples generator that yields tuples for tuple/list/Row items and raises TypeError for unsupported types.
  • Replaced the previous iter(data) argument passed to pycore_cursor.bulkcopy with _ensure_tuples(data).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Collaborator

@bewithgaurav bewithgaurav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: adding a new comment as per discussions from the review call.

Comment thread mssql_python/cursor.py
# Auto-convert Row/list objects to tuples for the Rust layer.
# mssql_py_core expects native tuples; Row objects (from fetchmany)
# are iterable but fail the strict type check in Rust.
def _ensure_tuples(iterable):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this helper is pure Python and testable without a SQL connection. a small unit test would lock down the Row/list/tuple/TypeError behavior and prevent silent regressions.

Comment thread mssql_python/cursor.py Outdated
elif isinstance(item, (list, Row)):
yield tuple(item)
else:
raise TypeError(
Copy link
Copy Markdown
Collaborator

@bewithgaurav bewithgaurav Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just flagging a pre-existing behavior (saw this while reviewing):

  • the Rust layer (mssql-py-core/src/cursor.rs) wraps the Python iterator in a filter_map that silently drops rows on error instead of propagating
  • there is a TODO comment in the Rust code acknowledging this
  • so if this TypeError fires mid-stream, the row gets skipped rather than raising
  • I'll open a follow-up task to check this on the mssql-rs side

Comment thread mssql_python/cursor.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this still says Iterable[Union[Tuple, List]]. since the PR now explicitly handles Row, list, and tuple, worth updating to Iterable[Union[Tuple, List, "Row"]] (or Iterable[Sequence] if you want to stay generic and future-proof). same for the docstring a few lines below.

Comment thread mssql_python/cursor.py
return True

# ── Mapping from ODBC connection-string keywords (lowercase, as _parse returns)
def bulkcopy(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a doc for this 1 pager

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if we can checkfirst object and then convert directly to tuple

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check how other drivers are doing this, also make sure if this can be solved using Row.py

Copy link
Copy Markdown
Collaborator

@bewithgaurav bewithgaurav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a few more comments post the PR review discussion just for reference, will wait for the doc

Comment thread mssql_python/cursor.py Outdated
def _ensure_tuples(iterable):
for item in iterable:
if isinstance(item, tuple):
yield item
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuple instance checking will have perf overhead, if the iterator already yields tuple (which is the normal scenario)
This section will add a type checking perf overhead (will be significant in bcp scale)

Comment thread mssql_python/cursor.py Outdated
if isinstance(item, tuple):
yield item
elif isinstance(item, (list, Row)):
yield tuple(item)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this seems like a very targeted fix on the flow of cursor.execute -> fetchall() -> bulkcopy()

Getting Row objects from iterator is very specific to using cursor.fetchall() as a source, instead of doing this we might also thing of returning tuples from fetchall() (can be param enabled e.g. fetchall(tuples=True)) if the fix is JUST to tackle this scenario

jahnvi480 and others added 6 commits June 2, 2026 14:42
### Work Item / Issue Reference  
<!-- 
IMPORTANT: Please follow the PR template guidelines below.
For mssql-python maintainers: Insert your ADO Work Item ID below 
For external contributors: Insert Github Issue number below
Only one reference is required - either GitHub issue OR ADO Work Item.
-->

<!-- mssql-python maintainers: ADO Work Item -->
>
[AB#45380](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/45380)

<!-- External contributors: GitHub Issue -->
> GitHub Issue: #609 

-------------------------------------------------------------------
### Summary   
This pull request addresses a critical bug in the handling of
`executemany` with large `Decimal` values in the `mssql_python` driver,
specifically when values exceed the SQL Server `MONEY` range. The main
fix ensures that parameter type detection and conversion are consistent,
preventing runtime errors when binding large decimal values. Extensive
unit and integration tests are added to verify the fix and cover edge
cases involving `Decimal` values, including scenarios with `NULL`s and
multi-column inserts.

**Bug Fix: Executemany Decimal Handling**

* In `cursor.py`, the `executemany` method now explicitly overrides the
C type for parameters with SQL type `DECIMAL` or `NUMERIC` to use
`SQL_C_CHAR` (string binding) when the data is converted to strings.
This prevents mismatches that previously caused runtime errors when
inserting large decimal values. The column size is also adjusted to fit
the longest string representation.

**Testing: Unit and Integration Tests for Decimal Handling**

* Added comprehensive unit tests in `test_001_globals.py` to verify type
detection, mapping, and the override logic for `executemany` with large
`Decimal` values, both within and outside the `MONEY` range. These tests
confirm that the C type override is necessary and correctly applied.
* Added integration tests in `test_004_cursor.py` to exercise the fixed
behavior in real database scenarios, including:
- Inserting batches with decimals inside and outside the `MONEY` range.
  - Handling `NULL` values alongside large decimals.
  - Multi-column inserts where one column contains large decimals.
### Work Item / Issue Reference  
<!-- 
IMPORTANT: Please follow the PR template guidelines below.
For mssql-python maintainers: Insert your ADO Work Item ID below 
For external contributors: Insert Github Issue number below
Only one reference is required - either GitHub issue OR ADO Work Item.
-->

<!-- mssql-python maintainers: ADO Work Item -->
>
[AB#45378](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/45378)

<!-- External contributors: GitHub Issue -->
> GitHub Issue: #607

-------------------------------------------------------------------
### Summary   
<!-- Insert your summary of changes below. Minimum 10 characters
required. -->

The published macOS universal2 wheel dynamically links simdutf against a
Homebrew path baked in at CI build time, causing an import failure on
any machine that doesn't have simdutf installed at that exact path.

Fix: remove the find_package(simdutf) call in CMakeLists.txt so
FetchContent is always used, which builds simdutf as a static library
and embeds its symbols directly into the extension.

<!-- 
### PR Title Guide

> For feature requests
FEAT: (short-description)

> For non-feature requests like test case updates, config updates ,
dependency updates etc
CHORE: (short-description) 

> For Fix requests
FIX: (short-description)

> For doc update requests 
DOC: (short-description)

> For Formatting, indentation, or styling update
STYLE: (short-description)

> For Refactor, without any feature changes
REFACTOR: (short-description)

> For performance improvements
PERF: (short-description)

> For release related changes, without any feature changes
RELEASE: #<RELEASE_VERSION> (short-description) 

### Contribution Guidelines

External contributors:
- Create a GitHub issue first:
https://github.com/microsoft/mssql-python/issues/new
- Link the GitHub issue in the "GitHub Issue" section above
- Follow the PR title format and provide a meaningful summary

mssql-python maintainers:
- Create an ADO Work Item following internal processes
- Link the ADO Work Item in the "ADO Work Item" section above  
- Follow the PR title format and provide a meaningful summary
-->

Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Co-authored-by: Jahnvi Thakkar <61936179+jahnvi480@users.noreply.github.com>
Co-authored-by: Gaurav Sharma <sharmag@microsoft.com>
Updated target timelines for several features in the roadmap.

### Work Item / Issue Reference  
<!-- 
IMPORTANT: Please follow the PR template guidelines below.
For mssql-python maintainers: Insert your ADO Work Item ID below 
For external contributors: Insert Github Issue number below
Only one reference is required - either GitHub issue OR ADO Work Item.
-->

<!-- mssql-python maintainers: ADO Work Item -->
>
[AB#43952](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/43952)

<!-- External contributors: GitHub Issue -->
> GitHub Issue: #<ISSUE_NUMBER>
This pull request updates the feature roadmap in `ROADMAP.md` to adjust
the planned timelines for several upcoming features. The main changes
are revised target dates for features such as returning rows as
dictionaries, asynchronous query execution, vector datatype support,
table-valued parameters, and JSON datatype support.

Roadmap timeline updates:

* Changed the target timeline for "Return Rows as Dictionaries" to Q3
2026.
* Changed the target timeline for "Asynchronous Query Execution" to Q4
2026.
* Changed the target timeline for "Vector Datatype Support" to Q3 2026.
* Changed the target timeline for "Table-Valued Parameters (TVPs)" to Q3
2026.
* Changed the target timeline for "JSON Datatype Support" to Q4 2026.
-------------------------------------------------------------------
### Summary   
<!-- Insert your summary of changes below. Minimum 10 characters
required. -->
This pull request updates the feature roadmap in `ROADMAP.md` to revise
the target timelines for several planned features. The most important
changes are:

Roadmap timeline updates:

* Changed the target timeline for "Return Rows as Dictionaries" from Q4
2025 to Q3 2026.
* Changed the target timeline for "Asynchronous Query Execution" from Q1
2026 to Q4 2026.
* Changed the target timeline for "Vector Datatype Support" from Q1 2026
to Q3 2026.
* Changed the target timeline for "Table-Valued Parameters (TVPs)" from
Q1 2026 to Q3 2026.
* Changed the target timeline for "JSON Datatype Support" from "ETA will
be updated soon" to Q4 2026.

<!-- 
### PR Title Guide

> For feature requests
FEAT: (short-description)

> For non-feature requests like test case updates, config updates ,
dependency updates etc
CHORE: (short-description) 

> For Fix requests
FIX: (short-description)

> For doc update requests 
DOC: (short-description)

> For Formatting, indentation, or styling update
STYLE: (short-description)

> For Refactor, without any feature changes
REFACTOR: (short-description)

> For release related changes, without any feature changes
RELEASE: #<RELEASE_VERSION> (short-description) 

### Contribution Guidelines

External contributors:
- Create a GitHub issue first:
https://github.com/microsoft/mssql-python/issues/new
- Link the GitHub issue in the "GitHub Issue" section above
- Follow the PR title format and provide a meaningful summary

mssql-python maintainers:
- Create an ADO Work Item following internal processes
- Link the ADO Work Item in the "ADO Work Item" section above  
- Follow the PR title format and provide a meaningful summary
-->
Per @bewithgaurav review:
- Removed list branch from _ensure_tuples: Rust rejects lists at
  cast::<PyTuple>(), so silently converting them was scope creep
- Rewritten with check-first pattern using tuple(item._values) for
  Row objects (4x faster than __iter__, zero per-item isinstance)
- Fixed type hint: Iterable[Union[Tuple, List]] -> Iterable[Union[Tuple, Row]]
- Fixed docstring: removed 'or lists', documented Row acceptance
- Added docstring on _ensure_tuples as type contract enforcement point
@github-actions github-actions Bot added pr-size: medium Moderate update size and removed pr-size: small Minimal code update labels Jun 5, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

📊 Code Coverage Report

🔥 Diff Coverage

100%


🎯 Overall Coverage

80%


📈 Total Lines Covered: 6654 out of 8257
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

No lines with coverage information in this diff.


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 76.1%
mssql_python.row.py: 76.9%
mssql_python.__init__.py: 77.3%
mssql_python.pybind.connection.connection.cpp: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.logging.py: 85.5%
mssql_python.connection.py: 85.6%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

@github-actions github-actions Bot added pr-size: small Minimal code update and removed pr-size: medium Moderate update size labels Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: small Minimal code update

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants