Commit dc960ab

Merge remote-tracking branch 'upstream/main' into series-init-copy-cleanup
2 parents 26a4fc8 + 576d05f commit dc960ab

36 files changed, +331 -390 lines changed

.github/workflows/docbuild-and-upload.yml

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ jobs:
         run: mv doc/build/html web/build/docs
 
       - name: Save website as an artifact
-        uses: actions/upload-artifact@v5
+        uses: actions/upload-artifact@v6
         with:
           name: website
           path: web/build

.github/workflows/wheels.yml

Lines changed: 5 additions & 5 deletions
@@ -64,7 +64,7 @@ jobs:
           python -m pip install build
           python -m build --sdist
 
-      - uses: actions/upload-artifact@v5
+      - uses: actions/upload-artifact@v6
         with:
           name: sdist
           path: ./dist/*
@@ -138,7 +138,7 @@ jobs:
       # removes unnecessary files from the release
       - name: Download sdist (not macOS)
         #if: ${{ matrix.buildplat[1] != 'macosx_*' }}
-        uses: actions/download-artifact@v6
+        uses: actions/download-artifact@v7
         with:
           name: sdist
           path: ./dist
@@ -197,7 +197,7 @@ jobs:
         shell: bash -el {0}
         run: for whl in $(ls ./dist/*.whl); do wheel unpack $whl -d /tmp; done
 
-      - uses: actions/upload-artifact@v5
+      - uses: actions/upload-artifact@v6
         with:
           name: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
           path: ./dist/*.whl
@@ -229,11 +229,11 @@ jobs:
 
     steps:
       - name: Download all artefacts
-        uses: actions/download-artifact@v6
+        uses: actions/download-artifact@v7
         with:
           path: dist # everything lands in ./dist/**
 
-      # TODO: This step can be probably be achieved by actions/download-artifact@v6
+      # TODO: This step can probably be achieved by actions/download-artifact@v7
       # by specifying merge-multiple: true, and a glob pattern
       - name: Collect files
         run: |

ci/code_checks.sh

Lines changed: 2 additions & 0 deletions
@@ -79,6 +79,8 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.api.typing.DataFrameGroupBy.plot PR02" \
         -i "pandas.api.typing.SeriesGroupBy.plot PR02" \
         -i "pandas.api.typing.Resampler.quantile PR01,PR07" \
+        -i "pandas.StringDtype.storage SA01" \
+        -i "pandas.StringDtype.na_value SA01" \
         -i "pandas.tseries.offsets.BDay PR02,SA01" \
         -i "pandas.tseries.offsets.BHalfYearBegin.is_on_offset GL08" \
         -i "pandas.tseries.offsets.BHalfYearBegin.n GL08" \

doc/source/_static/schemas/01_table_dataframe.svg

Lines changed: 48 additions & 262 deletions

doc/source/getting_started/intro_tutorials/01_table_oriented.rst

Lines changed: 2 additions & 0 deletions
@@ -72,6 +72,8 @@ SQL table or the ``data.frame`` in `R <https://www.r-project.org/>`__.
 - The column ``Name`` consists of textual data with each value a
   string, the column ``Age`` are numbers and the column ``Sex`` is
   textual data.
+- The index labels each row. By default, this is a sequence of integers
+  starting at 0.
 
 In spreadsheet software, the table representation of our data would look
 very similar:
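
Note on the added bullet: the default index can be checked with a short sketch. The column values below are illustrative assumptions and are not taken from the tutorial's data.

```python
import pandas as pd

# Illustrative data (assumed for this sketch, not the tutorial's table).
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cleo"],
        "Age": [22, 35, 58],
        "Sex": ["female", "male", "female"],
    }
)

# No index was passed, so pandas labels the rows with integers from 0.
default_labels = list(df.index)
print(default_labels)
```

Passing an explicit `index=` argument to `pd.DataFrame` would replace these default labels.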

doc/source/reference/arrays.rst

Lines changed: 2 additions & 0 deletions
@@ -637,6 +637,8 @@ with a bool :class:`numpy.ndarray`.
    DatetimeTZDtype.tz
    PeriodDtype.freq
    IntervalDtype.subtype
+   StringDtype.storage
+   StringDtype.na_value
 
 *********
 Utilities

doc/source/user_guide/indexing.rst

Lines changed: 30 additions & 0 deletions
@@ -431,6 +431,36 @@ an error will be raised. For instance, in the above example, ``s.loc[2:5]`` woul
 For more information about duplicate labels, see
 :ref:`Duplicate Labels <duplicates>`.
 
+When using a slice with a step, such as ``.loc[start:stop:step]``, note that
+*start* and *stop* are interpreted as **labels**, while *step* is applied over
+the **positional index** within that label range. This means a stepped slice
+will behave differently than using the labels ``range(start, stop, step)`` when
+the index is not contiguous integers.
+
+For example, in a ``Series`` with a non-contiguous integer index:
+
+.. ipython:: python
+
+   s = pd.Series(range(10), index=[0, 5, 10, 15, 20, 25, 30, 35, 40, 45])
+   s.loc[10:50:5]  # every 5th position starting at label 10 → labels 10 and 35
+   s.loc[[10, 15, 20, 25]]  # explicit label selection
+
+The first applies *step* across **positional locations** between the start/stop
+labels. The second selects each label directly.
+
+Similarly, with a string-based index, the behavior is identical:
+
+.. ipython:: python
+
+   s = pd.Series(range(10), index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
+   s.loc['b':'i':2]  # Start at 'b' (position 1), stop at 'i' (position 8), step 2 positions → 'b', 'd', 'f', 'h'
+   s.loc[['b', 'd', 'f', 'h']]  # explicit label selection
+
+In both cases, *start* and *stop* determine the label boundaries (inclusive),
+while *step* skips positions within that range, regardless of the index type.
+
+
+
 .. _indexing.integer:
 
 Selection by position
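
Note on the stepped-slice paragraph: the difference between a stepped label slice and selecting the labels `range(start, stop, step)` can be seen in a minimal sketch that reuses the `Series` from the added example. The variable names `stepped` and `by_label` are introduced here for illustration only.

```python
import pandas as pd

s = pd.Series(range(10), index=[0, 5, 10, 15, 20, 25, 30, 35, 40, 45])

# The label slice 10:50 covers positions 2 through 9; the step of 5 is
# then applied to positions, keeping positions 2 and 7.
stepped = s.loc[10:50:5]

# Treating range(10, 50, 5) as labels selects every label from 10 to 45.
by_label = s.loc[list(range(10, 50, 5))]

print(list(stepped.index))
print(len(by_label))
```

The stepped slice returns two rows (labels 10 and 35), while the explicit label list returns eight.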

doc/source/user_guide/migration-3-strings.rst

Lines changed: 27 additions & 3 deletions
@@ -20,6 +20,14 @@ enable it with:
 
 This allows you to test your code before the final 3.0 release.
 
+.. note::
+
+   This migration guide focuses on the changes and migration steps needed when
+   you are currently using ``object`` dtype for string data, which is used by
+   default in pandas < 3.0. If you are already using one of the opt-in string
+   dtypes, you can continue to do so without change.
+   See :ref:`string_migration_guide-for_existing_users` for more details.
+
 Background
 ----------
 
@@ -457,7 +465,23 @@ raise an error regardless of the number of strings:
    ...
    TypeError: Cannot perform reduction 'prod' with string dtype
 
-.. For existing users of the nullable ``StringDtype``
-.. --------------------------------------------------
 
-.. TODO
+.. _string_migration_guide-for_existing_users:
+
+For existing users of the nullable ``StringDtype``
+--------------------------------------------------
+
+While pandas 3.0 introduces a new *default* string data type, pandas has had an
+opt-in nullable string data type since pandas 1.0, which can be specified using
+``dtype="string"``. This nullable string dtype uses ``pd.NA`` as the missing
+value indicator. In addition, since pandas 1.5 a dedicated string dtype has also
+been available through :class:`ArrowDtype` (for example by using
+``dtype_backend="pyarrow"``).
+
+If you are already using one of the nullable string dtypes, for example by
+specifying ``dtype="string"``, by using :meth:`~DataFrame.convert_dtypes`, or
+by specifying the ``dtype_backend`` argument in IO functions, you can continue
+to do so without change.
+
+The migration guide above applies to code that is currently (< 3.0) using object
+dtype for string data.
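
Note on the new section: a minimal sketch of the opt-in nullable dtype it describes, assuming only that pandas is installed. The example values and the name `missing` are illustrative.

```python
import pandas as pd

# Opt in to the nullable string dtype explicitly, as described above.
ser = pd.Series(["a", None, "c"], dtype="string")

# Missing entries are represented by pd.NA for this dtype.
missing = ser[1]
print(ser.dtype, missing is pd.NA)
```

Code written this way already requests the nullable dtype by name, which is why the section says it needs no changes.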

doc/source/user_guide/text.rst

Lines changed: 4 additions & 1 deletion
@@ -753,7 +753,10 @@ Differences in behavior will be primarily due to the kind of NA value.
 The four :class:`StringDtype` variants
 ======================================
 
-There are four :class:`StringDtype` variants that are available to users.
+There are four :class:`StringDtype` variants that are available to users,
+controlled by the ``storage`` and ``na_value`` parameters of :class:`StringDtype`.
+At runtime, these can be checked via the :attr:`StringDtype.storage`
+and :attr:`StringDtype.na_value` attributes.
 
 Python storage with ``np.nan`` values
 -------------------------------------
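
Note on the runtime check: the two attributes named in the added sentence can be read from any `StringDtype` instance. The sketch below constructs only the Python-storage, `pd.NA` variant so that it does not depend on pyarrow being installed; the other variants are not shown because constructing them depends on the pandas version and optional dependencies.

```python
import pandas as pd

# Request Python storage explicitly so pyarrow is not required.
dtype = pd.StringDtype("python")

storage = dtype.storage    # which backend holds the strings
na_value = dtype.na_value  # which missing-value sentinel is used
print(storage, na_value)
```

For a `Series`, the same attributes are reachable through `ser.dtype` when that dtype is a `StringDtype`.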

doc/source/user_guide/timeseries.rst

Lines changed: 0 additions & 14 deletions
@@ -1294,15 +1294,6 @@ frequencies. We will refer to these aliases as *offset aliases*.
     "us", "microseconds"
     "ns", "nanoseconds"
 
-.. deprecated:: 2.2.0
-
-   Aliases ``H``, ``BH``, ``CBH``, ``T``, ``S``, ``L``, ``U``, and ``N``
-   are deprecated in favour of the aliases ``h``, ``bh``, ``cbh``,
-   ``min``, ``s``, ``ms``, ``us``, and ``ns``.
-
-   Aliases ``Y``, ``M``, and ``Q`` are deprecated in favour of the aliases
-   ``YE``, ``ME``, ``QE``.
-
 
 .. note::
 
@@ -1358,11 +1349,6 @@ frequencies. We will refer to these aliases as *period aliases*.
     "us", "microseconds"
     "ns", "nanoseconds"
 
-.. deprecated:: 2.2.0
-
-   Aliases ``H``, ``T``, ``S``, ``L``, ``U``, and ``N`` are deprecated in favour of the aliases
-   ``h``, ``min``, ``s``, ``ms``, ``us``, and ``ns``.
-
 
 Combining aliases
 ~~~~~~~~~~~~~~~~~
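
Note on the removed deprecation notes: code that still uses the old offset aliases could be updated with a small lookup built from the replacements those notes listed. This is a sketch only. `modernize_offset_alias` is a hypothetical helper, not a pandas function; it handles a plain alias with an optional integer multiple and leaves anything else (for example anchored aliases such as `W-SUN`) unchanged. The `Y`, `M` and `Q` entries come from the offset-alias note only; the period-alias note did not list them.

```python
import re

# Replacements as listed in the removed offset-alias deprecation notes.
OFFSET_ALIAS_REPLACEMENTS = {
    "H": "h", "BH": "bh", "CBH": "cbh", "T": "min",
    "S": "s", "L": "ms", "U": "us", "N": "ns",
    "Y": "YE", "M": "ME", "Q": "QE",
}

# Optional signed integer multiple followed by a purely alphabetic alias.
_FREQ = re.compile(r"(-?\d*)([A-Za-z]+)")


def modernize_offset_alias(freq: str) -> str:
    """Hypothetical helper: rewrite e.g. "2H" as "2h"; leave other input as-is."""
    match = _FREQ.fullmatch(freq)
    if match is None:
        return freq
    multiple, alias = match.groups()
    return multiple + OFFSET_ALIAS_REPLACEMENTS.get(alias, alias)


print(modernize_offset_alias("2H"), modernize_offset_alias("M"))
```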
