
Add result_set_type_hints for precise complex type conversion #690

Draft
laughingman7743 wants to merge 10 commits into master from feat/result-set-type-hints

laughingman7743 commented Feb 28, 2026

WHAT

Add result_set_type_hints parameter to all cursor execute() methods and change default behavior for nested type conversion.

Breaking Change

_convert_value() no longer performs heuristic type inference (isdigit, float detection, bool detection) for elements inside complex types parsed from Athena's native format. Values now remain as strings by default.

Before: [{string: 1234}, {string: "value"}] (int inferred from varchar)
After: [{string: "1234"}, {string: "value"}] (stays as string)
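
To make the difference concrete, here is a minimal sketch of the two behaviours. `infer_scalar` and `keep_string` are hypothetical stand-ins written for illustration; they are not the functions in pyathena/converter.py, and the exact heuristics the library used may differ.

```python
from typing import Optional


def infer_scalar(value: str):
    """Hypothetical stand-in for the old heuristic: guess a Python type from text."""
    if value == "null":
        return None
    if value in ("true", "false"):
        return value == "true"
    if value.isdigit():
        return int(value)
    try:
        return float(value)
    except ValueError:
        return value


def keep_string(value: str) -> Optional[str]:
    """Hypothetical stand-in for the new default: only "null" becomes None."""
    return None if value == "null" else value


raw = ["1234", "value", "null"]
old = [infer_scalar(v) for v in raw]  # the varchar "1234" is turned into an int
new = [keep_string(v) for v in raw]   # the varchar "1234" stays a string
```

With the old-style heuristic, a varchar that happens to look numeric silently changes type; with the new default, the caller decides when to convert.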

New result_set_type_hints Parameter

Users who need typed conversion of nested elements can provide full Athena DDL type signatures:

cursor.execute(
    "SELECT * FROM table",
    result_set_type_hints={
        "field": "array(row(name varchar, age integer))",
        "tags": "array(varchar)",
        "metadata": "map(varchar, integer)",
    }
)
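
As a rough illustration of what a hint buys you, the sketch below converts array elements once the element type is known. The `SCALAR_CONVERTERS` table and `convert_array_elements` are assumptions made for this example and do not mirror the library's internal names or its full type coverage.

```python
from decimal import Decimal
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from scalar type names to Python constructors.
SCALAR_CONVERTERS: Dict[str, Callable[[str], object]] = {
    "varchar": str,
    "integer": int,
    "bigint": int,
    "double": float,
    "decimal": Decimal,
    "boolean": lambda text: text.lower() == "true",
}


def convert_array_elements(element_type: str, items: List[Optional[str]]) -> list:
    """Convert each element using the declared type; unknown types stay as strings."""
    convert = SCALAR_CONVERTERS.get(element_type, str)
    return [None if item is None else convert(item) for item in items]


as_text = convert_array_elements("varchar", ["1234", "value"])
as_ints = convert_array_elements("integer", ["1234", "42"])
```

The same raw element `"1234"` ends up as a string or an int depending only on the declared element type, which is the information the hint supplies.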

Changes

Core (pyathena/converter.py)

  • TypeNode dataclass for representing parsed type trees
  • parse_type_signature() recursive parser for Athena DDL type strings
  • Typed conversion functions: _convert_value_with_type(), _convert_typed_array(), _convert_typed_map(), _convert_typed_struct()
  • _convert_value() changed to string-by-default (only null → None)
  • Converter.convert() and DefaultTypeConverter.convert() extended with type_hint parameter
  • Parsed type hint caching in DefaultTypeConverter._parsed_hints
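
The recursive parsing idea can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the `TypeNode` field names and the `split_top_level` helper are invented here, and the sketch does nothing special for parameterized scalars such as `decimal(10, 2)`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TypeNode:
    """One node of a parsed type tree (field names here are illustrative)."""
    type_name: str
    children: List["TypeNode"] = field(default_factory=list)
    field_names: List[str] = field(default_factory=list)


def split_top_level(text: str, sep: str = ",") -> List[str]:
    """Split on separators that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_type_signature(signature: str) -> TypeNode:
    """Recursively parse e.g. 'array(row(name varchar, age integer))'."""
    signature = signature.strip()
    open_paren = signature.find("(")
    if open_paren == -1:
        return TypeNode(signature.lower())  # scalar leaf such as "varchar"
    name = signature[:open_paren].strip().lower()
    inner = signature[open_paren + 1 : signature.rfind(")")]
    node = TypeNode(name)
    for part in split_top_level(inner):
        if name == "row":
            # Row fields are written as "<field name> <type>".
            field_name, _, field_type = part.partition(" ")
            node.field_names.append(field_name)
            node.children.append(parse_type_signature(field_type))
        else:
            node.children.append(parse_type_signature(part))
    return node


tree = parse_type_signature("array(row(name varchar, age integer))")
```

Splitting only at parenthesis depth zero is what lets nested signatures like `map(varchar, row(a integer, b varchar))` keep their inner commas intact.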

Threading (result_set_type_hints parameter added to the following)

  • All cursor execute() methods (10 cursor types across sync/async, standard/pandas/arrow/polars/s3fs)
  • All result set constructors and _get_rows() methods
  • All converter convert() methods

Tests (tests/pyathena/test_converter.py)

  • Updated expectations for breaking change
  • 12 tests for parse_type_signature() DDL parser
  • 16 tests for typed conversion via DefaultTypeConverter.convert() with type hints
  • 99/99 tests pass

WHY

The Athena GetQueryResults API only returns base type names (e.g., "array", "map", "row") in ColumnInfo.Type, without nested type signatures. This caused _convert_value() to use heuristic inference, incorrectly converting varchar values like "1234" to int(1234) inside complex types.

Closes #689

🤖 Generated with Claude Code

laughingman7743 and others added 3 commits February 28, 2026 13:40
The Athena GetQueryResults API only returns base type names (e.g., "array",
"map", "row") without nested type signatures, causing _convert_value() to
use heuristic inference that incorrectly converts varchar values like "1234"
to int(1234) inside complex types.

This adds a result_set_type_hints parameter to all cursor execute() methods
so users can provide full Athena DDL type signatures for precise conversion.
Also changes the default behavior so nested elements without type hints
remain as strings instead of being heuristically inferred (breaking change).

Closes #689

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move TypeNode, TypeSignatureParser, and TypedValueConverter into a new
pyathena/parser.py module. TypedValueConverter receives converter
dependencies via constructor injection to avoid circular imports.
Also moves _split_array_items to parser.py as a shared parsing utility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
laughingman7743 force-pushed the feat/result-set-type-hints branch from 2251ca6 to 5d05791 February 28, 2026 04:40
laughingman7743 and others added 7 commits February 28, 2026 14:03
Native format complex types (map, struct) now return string values
instead of type-inferred values to prevent incorrect conversions
(e.g., varchar "1234" → int 1234). JSON format paths are unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TestTypeSignatureParser and TestTypedValueConverter test the parser
module directly, so they belong in a dedicated test file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Place the private helper function before public classes for
clearer top-down reading order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document class-based vs standalone function test patterns,
fixture usage with indirect parametrization, and integration
vs unit test distinction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ll string

- Only pass type_hint kwarg when hint exists (avoids breaking custom Converters)
- Use json.dumps for dict/list in JSON paths instead of str() (fixes nested structs)
- Use convert() instead of _convert_element() in JSON paths (preserves "null" strings)
- Use _split_array_items in typed map native path (supports nested row/map values)
- Normalize result_set_type_hints keys to lowercase for case-insensitive lookup
- Cache DefaultTypeConverter instance in S3FS converter
- Add unit tests for all fixed edge cases
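
Two of the fixes above, lowercase key normalization and passing `type_hint` only when a hint exists, can be sketched like this. `normalize_hints`, `convert_column`, and `LegacyConverter` are hypothetical names used for illustration; the actual call sites in the result set classes may be structured differently.

```python
from typing import Any, Dict, Optional


def normalize_hints(hints: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Lowercase column names so hint lookups are case-insensitive."""
    return {name.lower(): sig for name, sig in (hints or {}).items()}


class LegacyConverter:
    """Hypothetical custom converter whose convert() predates type_hint."""

    def convert(self, type_: str, value: Optional[str]) -> Any:
        return value


def convert_column(converter, hints, column_name, type_, value):
    """Pass the type_hint keyword only when a hint exists for this column."""
    hint = hints.get(column_name.lower())
    if hint is None:
        # Converters without a type_hint parameter keep working here.
        return converter.convert(type_, value)
    return converter.convert(type_, value, type_hint=hint)


hints = normalize_hints({"Tags": "array(varchar)"})
result = convert_column(LegacyConverter(), hints, "other", "varchar", "abc")
```

In this sketch a custom converter with the older two-argument `convert()` only fails if the caller actually supplies a hint for that column, which matches the intent of not breaking existing custom Converters.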

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Nested varchar in ARRAY<ROW<...>> is deserialized as int in DictCursor results