@@ -77,6 +77,12 @@ and are meant for cleaning the feature space.
 Controlled under-sampling techniques
 ------------------------------------
 
+Controlled under-sampling techniques reduce the number of observations from the
+targeted classes to a number specified by the user.
+
+Random under-sampling
+^^^^^^^^^^^^^^^^^^^^^
+
 :class:`RandomUnderSampler` is a fast and easy way to balance the data by
 randomly selecting a subset of data for the targeted classes::
 
@@ -91,9 +97,9 @@ randomly selecting a subset of data for the targeted classes::
    :scale: 60
    :align: center
 
-:class:`RandomUnderSampler` allows to bootstrap the data by setting
-``replacement`` to ``True``. The resampling with multiple classes is performed
-by considering independently each targeted class::
+:class:`RandomUnderSampler` allows bootstrapping the data by setting
+``replacement`` to ``True``. When there are multiple classes, each targeted class is
+under-sampled independently::
 
   >>> import numpy as np
   >>> print(np.vstack([tuple(row) for row in X_resampled]).shape)
@@ -103,8 +109,8 @@ by considering independently each targeted class::
   >>> print(np.vstack(np.unique([tuple(row) for row in X_resampled], axis=0)).shape)
   (181, 2)
 
-In addition, :class:`RandomUnderSampler` allows to sample heterogeneous data
-(e.g. containing some strings)::
+:class:`RandomUnderSampler` handles heterogeneous data types, i.e. numerical,
+categorical, dates, etc.::
 
   >>> X_hetero = np.array([['xxx', 1, 1.0], ['yyy', 2, 2.0], ['zzz', 3, 3.0]],
   ...                     dtype=object)
@@ -116,7 +122,8 @@ In addition, :class:`RandomUnderSampler` allows to sample heterogeneous data
   >>> print(y_resampled)
   [0 1]
 
-It would also work with pandas dataframe::
+:class:`RandomUnderSampler` also supports pandas dataframes as input for
+undersampling::
 
   >>> from sklearn.datasets import fetch_openml
   >>> df_adult, y_adult = fetch_openml(