
Commit ef6b3e6

Fix datasets task 7309: pin pyarrow < 20.0 for geoparquet compatibility (#43)
PyArrow 20.0 changed the default string type mapping, so geoparquet geometry columns now report their dtype as 'large_string' instead of 'string'. The pre-existing test test_parquet_read_geoparquet asserts dtype == 'string', which breaks every feature pair for this task. Pin pyarrow to >=17.0.0,<20.0.0 to keep a modern release while avoiding the large_string default. The Docker image must be rebuilt for the change to take effect.
1 parent a6aabce commit ef6b3e6

1 file changed

Lines changed: 1 addition & 1 deletion


  • dataset/huggingface_datasets_task/task7309

dataset/huggingface_datasets_task/task7309/Dockerfile

@@ -20,7 +20,7 @@ RUN git clone https://github.com/huggingface/datasets.git repo && \
 # Set up Python environment and pre-install dependencies
 WORKDIR /workspace/repo
 RUN uv pip install --system -e .
-RUN uv pip uninstall --system pyarrow && uv pip install --system "pyarrow==20.0.0"
+RUN uv pip uninstall --system pyarrow && uv pip install --system "pyarrow>=17.0.0,<20.0.0"
 # Install tensorflow, explicitly excluding tensorflow-macos (macOS-only package)
 RUN uv pip install --system "tensorflow>=2.16.0,<2.17.0" || uv pip install --system "tensorflow==2.16.2"
 RUN uv pip install --system torch jax
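A minimal sketch, not part of this commit: after rebuilding the image, one might confirm that the installed pyarrow falls inside the new range >=17.0.0,<20.0.0. The helper names parse_release and satisfies_pin are hypothetical, and the parser assumes plain MAJOR.MINOR.PATCH release strings (pre-release tags such as "20.0.0rc1" are not handled).

```python
# Hypothetical sanity check (not from the commit): does a pyarrow version
# string satisfy the Dockerfile pin ">=17.0.0,<20.0.0"?
# Assumes plain numeric release strings like "19.0.1"; pre-release tags are not handled.

LOWER = (17, 0, 0)  # inclusive lower bound from the pin
UPPER = (20, 0, 0)  # exclusive upper bound from the pin


def parse_release(version: str) -> tuple:
    """Turn 'MAJOR[.MINOR[.PATCH]]' into a 3-tuple of ints, padding missing parts with 0."""
    nums = [int(part) for part in version.split(".")[:3]]
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def satisfies_pin(version: str) -> bool:
    """True when LOWER <= version < UPPER, compared component-wise as tuples."""
    return LOWER <= parse_release(version) < UPPER


if __name__ == "__main__":
    try:
        import pyarrow  # only present inside the task image
    except ImportError:
        print("pyarrow is not installed in this environment")
    else:
        print(pyarrow.__version__, satisfies_pin(pyarrow.__version__))
```

Tuple comparison keeps the check dependency-free; for full PEP 440 semantics (pre-releases, post-releases, local versions), packaging.specifiers.SpecifierSet(">=17.0.0,<20.0.0").contains(...) is the more robust choice.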
