Add result_set_type_hints for precise complex type conversion #690
Draft
laughingman7743 wants to merge 10 commits into master from
The Athena GetQueryResults API only returns base type names (e.g., "array", "map", "row") without nested type signatures, causing _convert_value() to use heuristic inference that incorrectly converts varchar values like "1234" to int(1234) inside complex types. This adds a result_set_type_hints parameter to all cursor execute() methods so users can provide full Athena DDL type signatures for precise conversion. Also changes the default behavior so nested elements without type hints remain as strings instead of being heuristically inferred (breaking change). Closes #689 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
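To make the failure mode concrete, here is a minimal sketch of the two behaviours this commit message contrasts. Both functions are hypothetical stand-ins written for illustration; they are not the actual `_convert_value()` code, and the exact order of the heuristic checks is an assumption.

```python
def infer_heuristically(value: str):
    """Guess a Python type from the text alone (illustrative old behaviour)."""
    if value == "null":
        return None
    if value in ("true", "false"):
        return value == "true"
    if value.isdigit():
        return int(value)
    try:
        return float(value)
    except ValueError:
        return value


def keep_as_string(value: str):
    """String-by-default (illustrative new behaviour): only null becomes None."""
    return None if value == "null" else value


zip_code = "01234"  # a varchar element nested inside an array, map, or row

print(repr(infer_heuristically(zip_code)))  # 1234 (an int, leading zero lost)
print(repr(keep_as_string(zip_code)))       # '01234'
```

Without the nested type signature there is no way to tell a varchar that happens to look numeric from a real integer, which is why the default moves to leaving values untouched.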
Move TypeNode, TypeSignatureParser, and TypedValueConverter into a new pyathena/parser.py module. TypedValueConverter receives converter dependencies via constructor injection to avoid circular imports. Also moves _split_array_items to parser.py as a shared parsing utility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
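A rough sketch of the constructor-injection idea, under the assumption that the scalar converters can be passed in as a plain mapping from type name to callable. The class body, the `scalar_converters` mapping, and the method signature below are hypothetical and only illustrate the pattern; they are not copied from pyathena/parser.py.

```python
from typing import Any, Callable, Dict, Optional


class TypedValueConverter:
    """Illustrative only: depends on injected converters, imports nothing from the converter module."""

    def __init__(self, converters: Dict[str, Callable[[str], Any]]) -> None:
        self._converters = converters

    def convert(self, value: Optional[str], type_name: str) -> Any:
        if value is None:
            return None
        converter = self._converters.get(type_name.lower())
        # Types without a registered converter stay as strings.
        return converter(value) if converter else value


# The module that owns the scalar converters builds the mapping and hands it
# over, so the parser side never has to import it back.
scalar_converters = {
    "integer": int,
    "double": float,
    "boolean": lambda text: text.lower() == "true",
}
typed = TypedValueConverter(scalar_converters)

print(repr(typed.convert("1234", "integer")))  # 1234
print(repr(typed.convert("1234", "varchar")))  # '1234'
```

Because the dependency flows in one direction only, the converter module can import the parser module without the parser importing it back.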
2251ca6 to 5d05791
Native format complex types (map, struct) now return string values instead of type-inferred values to prevent incorrect conversions (e.g., varchar "1234" → int 1234). JSON format paths are unaffected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TestTypeSignatureParser and TestTypedValueConverter test the parser module directly, so they belong in a dedicated test file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Place the private helper function before public classes for clearer top-down reading order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document class-based vs standalone function test patterns, fixture usage with indirect parametrization, and integration vs unit test distinction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ll string
- Only pass type_hint kwarg when hint exists (avoids breaking custom Converters)
- Use json.dumps for dict/list in JSON paths instead of str() (fixes nested structs)
- Use convert() instead of _convert_element() in JSON paths (preserves "null" strings)
- Use _split_array_items in typed map native path (supports nested row/map values)
- Normalize result_set_type_hints keys to lowercase for case-insensitive lookup
- Cache DefaultTypeConverter instance in S3FS converter
- Add unit tests for all fixed edge cases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
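The json.dumps item is easy to check in isolation. The snippet below uses only the standard library and a made-up nested value; it shows why str() output cannot be handed to a JSON parser while json.dumps output can.

```python
import json

# Made-up nested value standing in for a struct decoded from a JSON path.
nested = {"name": "alice", "tags": ["a", "b"], "active": True, "note": None}

as_repr = str(nested)         # Python repr: single quotes, True, None
as_json = json.dumps(nested)  # valid JSON: double quotes, true, null

# The JSON text round-trips back to the same structure.
round_tripped = json.loads(as_json)

# The repr text is rejected by a JSON parser.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

print(round_tripped == nested)  # True
print(repr_is_valid_json)       # False
```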
WHAT
Add a result_set_type_hints parameter to all cursor execute() methods and change the default behavior for nested type conversion.

Breaking Change
_convert_value() no longer performs heuristic type inference (isdigit, float detection, bool detection) for elements inside complex types parsed from Athena's native format. Values now remain as strings by default.

Before: [{string: 1234}, {string: "value"}] (int inferred from varchar)
After: [{string: "1234"}, {string: "value"}] (stays as string)

New result_set_type_hints Parameter
Users who need typed conversion of nested elements can provide full Athena DDL type signatures.
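As a rough illustration of what such hints might look like, the sketch below assumes they are a mapping from column name to a type signature written in the parenthesised style (for example `row(name varchar, age integer)`). The column names, the `normalize_hints` helper, and the commented-out call are assumptions made for this example, not code from the PR.

```python
# Hypothetical hints: column name -> full type signature.
result_set_type_hints = {
    "Tags": "array(varchar)",
    "attributes": "map(varchar, integer)",
    "profile": "row(name varchar, age integer)",
}


def normalize_hints(hints):
    """Lowercase the keys so lookups by column name are case-insensitive."""
    return {name.lower(): signature for name, signature in hints.items()}


hints = normalize_hints(result_set_type_hints)
print(hints["tags"])  # array(varchar)

# Assumed call shape; it needs a live Athena connection, so it stays commented out:
# cursor.execute(
#     "SELECT tags, attributes, profile FROM example_table",
#     result_set_type_hints=result_set_type_hints,
# )
```

Columns without a hint would keep the string-by-default behavior described under Breaking Change.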
Changes
Core (pyathena/converter.py)
- TypeNode dataclass for representing parsed type trees
- parse_type_signature() recursive parser for Athena DDL type strings
- _convert_value_with_type(), _convert_typed_array(), _convert_typed_map(), _convert_typed_struct()
- _convert_value() changed to string-by-default (only null → None)
- Converter.convert() and DefaultTypeConverter.convert() extended with type_hint parameter
- DefaultTypeConverter._parsed_hints
Threading (result_set_type_hints parameter added to)
- execute() methods (10 cursor types across sync/async, standard/pandas/arrow/polars/s3fs)
- _get_rows() methods
- convert() methods
Tests (tests/pyathena/test_converter.py)
- parse_type_signature() DDL parser
- DefaultTypeConverter.convert() with type hints

WHY
The Athena GetQueryResults API only returns base type names (e.g., "array", "map", "row") in ColumnInfo.Type, without nested type signatures. This caused _convert_value() to use heuristic inference, incorrectly converting varchar values like "1234" to int(1234) inside complex types.

Closes #689
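The recursive parsing listed under Changes can be sketched roughly as follows. This is an independent illustration, not the PR's implementation: it reuses the names TypeNode and parse_type_signature from the description, but the fields, the `split_top_level` helper, and the assumption that signatures use the parenthesised `array(...)`, `map(...)`, `row(name type, ...)` form are all mine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TypeNode:
    """One node of a parsed type tree (illustrative fields)."""
    name: str
    children: List["TypeNode"] = field(default_factory=list)
    field_names: List[str] = field(default_factory=list)  # filled for row only


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            parts.append(text[start:index].strip())
            start = index + 1
    parts.append(text[start:].strip())
    return parts


def parse_type_signature(signature: str) -> TypeNode:
    signature = signature.strip()
    name, paren, rest = signature.partition("(")
    name = name.strip().lower()
    # Scalars, including parameterised ones such as decimal(10, 2), are leaves.
    if not paren or name not in ("array", "map", "row"):
        return TypeNode(signature.lower())
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced type signature: {signature!r}")
    node = TypeNode(name)
    for part in split_top_level(rest[:-1]):
        if name == "row":
            # Each row field is written as "<field name> <field type>".
            field_name, _, field_type = part.partition(" ")
            node.field_names.append(field_name)
            node.children.append(parse_type_signature(field_type))
        else:
            node.children.append(parse_type_signature(part))
    return node


tree = parse_type_signature("array(row(name varchar, tags map(varchar, integer)))")
row = tree.children[0]
print(tree.name, row.name, row.field_names)  # array row ['name', 'tags']
```

With a tree like this, a converter can walk the raw value and the type tree together and apply a scalar conversion only at the leaves, so a varchar leaf never goes through number detection.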
🤖 Generated with Claude Code