Commit 65482ac

fangchenli and claude committed
PERF: avoid NumPy fallback in ArrowStringArray._from_sequence for integer types
When converting ArrowExtensionArray to string dtype, use PyArrow's native pc.cast() for integer and string types where the string representation matches Python's str(). This avoids unnecessary conversion through NumPy.

Float and boolean types still fall back to lib.ensure_string_array because PyArrow's string representation differs from Python's str():
- Float: 1.0 -> "1" (PyArrow) vs "1.0" (Python)
- Bool: True -> "true" (PyArrow) vs "True" (Python)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 7b51d3a commit 65482ac

1 file changed: 17 additions, 0 deletions

pandas/core/arrays/string_arrow.py

Lines changed: 17 additions & 0 deletions
@@ -210,6 +210,23 @@ def _from_sequence(
             result = scalars._data
             result = lib.ensure_string_array(result, copy=copy, convert_na_value=False)
             pa_arr = pa.array(result, mask=na_values, type=pa.large_string())
+        elif isinstance(scalars, ArrowExtensionArray):
+            pa_type = scalars._pa_array.type
+            # Use PyArrow's native cast for integer and string types where
+            # the string representation matches Python's str().
+            # Float and boolean have different representations in PyArrow
+            # (e.g., 1.0 -> "1" instead of "1.0", True -> "true" instead of "True")
+            if (
+                pa.types.is_integer(pa_type)
+                or pa.types.is_large_string(pa_type)
+                or pa.types.is_string(pa_type)
+            ):
+                pa_arr = pc.cast(scalars._pa_array, pa.large_string())
+            else:
+                # Fall back for types where PyArrow's string representation
+                # differs from Python's str()
+                result = lib.ensure_string_array(scalars, copy=copy)
+                pa_arr = pa.array(result, type=pa.large_string(), from_pandas=True)
         elif isinstance(scalars, (pa.Array, pa.ChunkedArray)):
             pa_arr = pc.cast(scalars, pa.large_string())
         else: