FIX: Accept Row objects in bulkcopy without manual tuple conversion #615
jahnvi480 wants to merge 8 commits into
`mssql_py_core` expects native tuples, but `Row` objects from `fetchmany()` fail the strict type check in Rust with:

    ValueError: Expected tuple, got: 'Row' object cannot be cast as 'tuple'

Added `_ensure_tuples()` wrapper that auto-converts Row/list objects to tuples. Tuples pass through with zero overhead. Unexpected types raise `TypeError` immediately instead of producing confusing Rust-level errors.

Fixes #482
Pull request overview
This PR fixes issue #482 where cursor.bulkcopy() rejected Row objects returned by cursor.fetchall(). It adds an inner helper that normalizes each row in the input iterable to a tuple before handing it to the Rust-backed pycore_cursor.bulkcopy, so Row and list rows are now accepted directly.
Changes:
- Added a nested `_ensure_tuples` generator that yields tuples for `tuple`/`list`/`Row` items and raises `TypeError` for unsupported types.
- Replaced the previous `iter(data)` argument passed to `pycore_cursor.bulkcopy` with `_ensure_tuples(data)`.
    # Auto-convert Row/list objects to tuples for the Rust layer.
    # mssql_py_core expects native tuples; Row objects (from fetchmany)
    # are iterable but fail the strict type check in Rust.
    def _ensure_tuples(iterable):
this helper is pure Python and testable without a SQL connection. a small unit test would lock down the Row/list/tuple/TypeError behavior and prevent silent regressions.
    elif isinstance(item, (list, Row)):
        yield tuple(item)
    else:
        raise TypeError(
just flagging a pre-existing behavior (saw this while reviewing):
- the Rust layer (`mssql-py-core/src/cursor.rs`) wraps the Python iterator in a `filter_map` that silently drops rows on error instead of propagating
- there is a `TODO` comment in the Rust code acknowledging this
- so if this `TypeError` fires mid-stream, the row gets skipped rather than raising
- I'll open a follow-up task to check this on the `mssql-rs` side
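The concern above can be illustrated with a pure-Python stand-in for a consumer that discards per-item errors. This is not the Rust code: `consume_dropping_errors` is a hypothetical analogue, and `ensure_tuples` is a simplified copy of the helper. One detail worth checking in the follow-up: a Python generator that raises is finished afterwards, so in this sketch the rows after the bad one are lost as well, not only the bad row itself. Whether the Rust side behaves the same way is not confirmed here.

```python
def ensure_tuples(iterable):
    # Simplified copy of the helper: only tuples are accepted.
    for item in iterable:
        if isinstance(item, tuple):
            yield item
        else:
            raise TypeError(f"unsupported row type: {type(item).__name__}")


def consume_dropping_errors(iterator):
    """Hypothetical Python analogue of a consumer that discards per-item errors."""
    rows = []
    while True:
        try:
            rows.append(next(iterator))
        except StopIteration:
            break
        except Exception:
            continue  # error swallowed; nothing reaches the caller
    return rows


rows = consume_dropping_errors(ensure_tuples([(1,), {"bad": 1}, (3,)]))
# rows is [(1,)]: the TypeError never surfaces, and (3,) is lost too because
# the generator is already finished when next() is called again.
```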
this still says Iterable[Union[Tuple, List]]. since the PR now explicitly handles Row, list, and tuple, worth updating to Iterable[Union[Tuple, List, "Row"]] (or Iterable[Sequence] if you want to stay generic and future-proof). same for the docstring a few lines below.
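The two annotation options mentioned above could be sketched as follows. The function names and the empty `Row` class are placeholders for illustration only; the real `bulkcopy` signature has more parameters than shown.

```python
from typing import Iterable, List, Sequence, Tuple, Union, get_type_hints


class Row:
    """Placeholder for mssql_python's Row so the forward reference resolves."""


def bulkcopy_explicit(data: Iterable[Union[Tuple, List, "Row"]]) -> None:
    """Explicit option: names exactly the row types the helper accepts."""


def bulkcopy_generic(data: Iterable[Sequence]) -> None:
    """Generic option: any sequence, which also admits str and bytes rows."""


hints = get_type_hints(bulkcopy_explicit)
```

One trade-off: `Sequence` is broader than the runtime check, since `str` is a `Sequence`, and a custom row class would only satisfy it if it is registered with or derived from `collections.abc.Sequence`.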
    return True

    # ── Mapping from ODBC connection-string keywords (lowercase, as _parse returns)
    def bulkcopy(
Create a one-page doc for this.
Check if we can check the first object and then convert directly to tuples.
Check how other drivers handle this, and also check whether this can be solved in Row.py.
bewithgaurav left a comment
Adding a few more comments after the PR review discussion, just for reference. Will wait for the doc.
    def _ensure_tuples(iterable):
        for item in iterable:
            if isinstance(item, tuple):
                yield item
Checking `isinstance(item, tuple)` on every item adds perf overhead even when the iterator already yields tuples (which is the normal scenario).
This per-item type check will be significant at bcp scale.
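One way to quantify this concern is a micro-benchmark along the lines below. It is a rough sketch: `passthrough` stands in for the previous `iter(data)` call, `checked` for a per-item `isinstance` generator, and the row count is arbitrary. Absolute numbers depend on the machine and Python version, so none are quoted here.

```python
import timeit

rows = [(i, str(i)) for i in range(100_000)]


def passthrough(iterable):
    # Stand-in for the previous behaviour: hand the iterator over untouched.
    return iter(iterable)


def checked(iterable):
    # Stand-in for the per-item check: one isinstance call per row.
    for item in iterable:
        if isinstance(item, tuple):
            yield item
        else:
            yield tuple(item)


def drain(make_iter):
    # Consume the whole stream, as the bulk copy consumer would.
    for _ in make_iter(rows):
        pass


baseline = timeit.timeit(lambda: drain(passthrough), number=5)
with_check = timeit.timeit(lambda: drain(checked), number=5)
print(f"iter(data): {baseline:.4f}s  per-item isinstance: {with_check:.4f}s")
```

The measured difference includes the generator frame overhead as well as the `isinstance` call, which is the cost the rows would actually pay on this path.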
    if isinstance(item, tuple):
        yield item
    elif isinstance(item, (list, Row)):
        yield tuple(item)
As discussed, this seems like a very targeted fix on the flow of cursor.execute -> fetchall() -> bulkcopy()
Getting Row objects from the iterator is very specific to using cursor.fetchall() as a source. Instead of doing this, we might also think of returning tuples from fetchall() (can be param enabled, e.g. fetchall(tuples=True)) if the fix is JUST to tackle this scenario.
### Work Item / Issue Reference

> [AB#45380](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/45380)

> GitHub Issue: #609

-------------------------------------------------------------------

### Summary

This pull request addresses a critical bug in the handling of `executemany` with large `Decimal` values in the `mssql_python` driver, specifically when values exceed the SQL Server `MONEY` range. The main fix ensures that parameter type detection and conversion are consistent, preventing runtime errors when binding large decimal values. Extensive unit and integration tests are added to verify the fix and cover edge cases involving `Decimal` values, including scenarios with `NULL`s and multi-column inserts.

**Bug Fix: Executemany Decimal Handling**

* In `cursor.py`, the `executemany` method now explicitly overrides the C type for parameters with SQL type `DECIMAL` or `NUMERIC` to use `SQL_C_CHAR` (string binding) when the data is converted to strings. This prevents mismatches that previously caused runtime errors when inserting large decimal values. The column size is also adjusted to fit the longest string representation.

**Testing: Unit and Integration Tests for Decimal Handling**

* Added comprehensive unit tests in `test_001_globals.py` to verify type detection, mapping, and the override logic for `executemany` with large `Decimal` values, both within and outside the `MONEY` range. These tests confirm that the C type override is necessary and correctly applied.
* Added integration tests in `test_004_cursor.py` to exercise the fixed behavior in real database scenarios, including:
  - Inserting batches with decimals inside and outside the `MONEY` range.
  - Handling `NULL` values alongside large decimals.
  - Multi-column inserts where one column contains large decimals.
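The string-binding step described in that summary can be sketched roughly as below. This is not the driver's code: `in_money_range` and `string_bind_column` are hypothetical helpers, and only the SQL Server `MONEY` bounds are taken as given. The sketch shows the two ideas from the summary: values outside the `MONEY` range exist in the batch, and the column size follows the longest string representation.

```python
from decimal import Decimal

# Documented SQL Server MONEY bounds.
MONEY_MIN = Decimal("-922337203685477.5808")
MONEY_MAX = Decimal("922337203685477.5807")


def in_money_range(value):
    """Return True if the Decimal fits in SQL Server's MONEY type."""
    return MONEY_MIN <= value <= MONEY_MAX


def string_bind_column(values):
    """Hypothetical helper: convert a column of Decimal/None values to strings
    and size the column to the longest string representation."""
    as_text = [None if v is None else format(v, "f") for v in values]
    column_size = max((len(s) for s in as_text if s is not None), default=1)
    return as_text, column_size


column = [Decimal("1.50"), None, Decimal("1234567890123456789.1234")]
text, size = string_bind_column(column)
```

`None` entries are kept as `None` so they can still be bound as `NULL`, which matches the `NULL` scenarios the tests in that pull request are described as covering.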
### Work Item / Issue Reference

> [AB#45378](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/45378)

> GitHub Issue: #607

-------------------------------------------------------------------

### Summary

The published macOS universal2 wheel dynamically links simdutf against a Homebrew path baked in at CI build time, causing an import failure on any machine that doesn't have simdutf installed at that exact path.

Fix: remove the `find_package(simdutf)` call in `CMakeLists.txt` so FetchContent is always used, which builds simdutf as a static library and embeds its symbols directly into the extension.

Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Co-authored-by: Jahnvi Thakkar <61936179+jahnvi480@users.noreply.github.com>
Co-authored-by: Gaurav Sharma <sharmag@microsoft.com>
Updated target timelines for several features in the roadmap.

### Work Item / Issue Reference

> [AB#43952](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/43952)

> GitHub Issue: #<ISSUE_NUMBER>

This pull request updates the feature roadmap in `ROADMAP.md` to adjust the planned timelines for several upcoming features. The main changes are revised target dates for features such as returning rows as dictionaries, asynchronous query execution, vector datatype support, table-valued parameters, and JSON datatype support.

Roadmap timeline updates:

* Changed the target timeline for "Return Rows as Dictionaries" to Q3 2026.
* Changed the target timeline for "Asynchronous Query Execution" to Q4 2026.
* Changed the target timeline for "Vector Datatype Support" to Q3 2026.
* Changed the target timeline for "Table-Valued Parameters (TVPs)" to Q3 2026.
* Changed the target timeline for "JSON Datatype Support" to Q4 2026.

-------------------------------------------------------------------

### Summary

This pull request updates the feature roadmap in `ROADMAP.md` to revise the target timelines for several planned features. The most important changes are:

Roadmap timeline updates:

* Changed the target timeline for "Return Rows as Dictionaries" from Q4 2025 to Q3 2026.
* Changed the target timeline for "Asynchronous Query Execution" from Q1 2026 to Q4 2026.
* Changed the target timeline for "Vector Datatype Support" from Q1 2026 to Q3 2026.
* Changed the target timeline for "Table-Valued Parameters (TVPs)" from Q1 2026 to Q3 2026.
* Changed the target timeline for "JSON Datatype Support" from "ETA will be updated soon" to Q4 2026.
Per @bewithgaurav review:
- Removed list branch from `_ensure_tuples`: Rust rejects lists at `cast::<PyTuple>()`, so silently converting them was scope creep
- Rewritten with check-first pattern using `tuple(item._values)` for Row objects (4x faster than `__iter__`, zero per-item isinstance)
- Fixed type hint: `Iterable[Union[Tuple, List]]` -> `Iterable[Union[Tuple, Row]]`
- Fixed docstring: removed 'or lists', documented Row acceptance
- Added docstring on `_ensure_tuples` as type contract enforcement point
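A check-first helper of the kind this commit message describes might look like the following. It is a sketch under stated assumptions, not the committed code: `FakeRow` is a hypothetical stand-in for `Row`, the `_values` attribute is taken from the commit message, and the handling of empty input and the error text are guesses.

```python
from itertools import chain


class FakeRow:
    """Hypothetical stand-in for Row; assumes values live in a `_values` list."""

    def __init__(self, *values):
        self._values = list(values)


def ensure_tuples_check_first(iterable, row_type=FakeRow):
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return iter(())  # empty input: nothing to convert
    # Put the peeked item back in front of the remaining stream.
    stream = chain((first,), iterator)
    if isinstance(first, tuple):
        return stream  # common case: no per-item work at all
    if isinstance(first, row_type):
        # Read the stored values directly instead of going through __iter__.
        return (tuple(row._values) for row in stream)
    raise TypeError(f"expected tuple or Row, got {type(first).__name__}")
```

The trade-off of inspecting only the first item is that a stream mixing tuples and Row objects is not detected up front. In this sketch a tuple appearing after a leading Row would fail with `AttributeError` when `_values` is read.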
📊 Code Coverage Report
Diff Coverage
Diff: main...HEAD, staged and unstaged changes
No lines with coverage information in this diff.

📋 Files Needing Attention
📉 Files with overall lowest coverage
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 76.1%
mssql_python.row.py: 76.9%
mssql_python.__init__.py: 77.3%
mssql_python.pybind.connection.connection.cpp: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.logging.py: 85.5%
mssql_python.connection.py: 85.6%
### Work Item / Issue Reference

### Summary

This pull request improves the robustness of the `bulkcopy` function in `mssql_python/cursor.py` by ensuring that all data rows passed to the Rust layer are properly formatted as tuples. This prevents type errors when using different iterable types, such as lists or custom `Row` objects, as input.

Data validation and conversion:

* Added `_ensure_tuples` to convert each item in the provided data to a tuple if it is a list or `Row` object, and raise a `TypeError` for unsupported types. This ensures compatibility with the strict type requirements of the Rust backend.
* Updated `bulkcopy` to use `_ensure_tuples(data)` instead of simply iterating over `data`, enforcing consistent tuple formatting for each row.