Commit 08dde6c: "added clean commit" (parent 5696fd3)

File tree: 1 file changed (+74, -5 lines)

doc/source/user_guide/io.rst

Lines changed: 74 additions & 5 deletions
@@ -1,29 +1,33 @@
-#io.rst
 .. _io:

+
 .. currentmodule:: pandas


+
 ===============================
 IO tools (text, CSV, HDF5, ...)
 ===============================

+
 The pandas I/O API is a set of top level ``reader`` functions accessed like
 :func:`pandas.read_csv` that generally return a pandas object. The corresponding
 ``writer`` functions are object methods that are accessed like
 :meth:`DataFrame.to_csv`. Below is a table containing available ``readers`` and
 ``writers``.

+
 .. csv-table::
     :header: "Format Type", "Data Description", "Reader", "Writer"
     :widths: 30, 100, 60, 60

+
     text,`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__, :ref:`read_csv<io.read_csv_table>`, :ref:`to_csv<io.store_in_csv>`
     text,Fixed-Width Text File, :ref:`read_fwf<io.fwf_reader>`, NA
     text,`JSON <https://www.json.org/>`__, :ref:`read_json<io.json_reader>`, :ref:`to_json<io.json_writer>`
-    text,`HTML <https://en.wikipedia.org/wiki/HTML>`__, :ref:`read_html<io.read_html>`, :ref:`to_html<io.html>`
+    text,`HTML <https://en.wikipedia.org/wiki/HTML>`__, :ref:`read_html<io.html>`, :ref:`to_html<io.html>`
     text,`LaTeX <https://en.wikipedia.org/wiki/LaTeX>`__, NA, :ref:`Styler.to_latex<io.latex>`
-    text,`XML <https://www.w3.org/standards/xml/core>`__, :ref:`read_xml<io.read_xml>`, :ref:`to_xml<io.xml>`
+    text,`XML <https://www.w3.org/standards/xml/core>`__, :ref:`read_xml<io.xml>`, :ref:`to_xml<io.xml>`
     text, Local clipboard, :ref:`read_clipboard<io.clipboard>`, :ref:`to_clipboard<io.clipboard>`
     binary,`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__ , :ref:`read_excel<io.excel_reader>`, :ref:`to_excel<io.excel_writer>`
     binary,`OpenDocument <http://opendocumentformat.org>`__, :ref:`read_excel<io.ods>`, NA
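The reader/writer pairing described above can be seen in a minimal round trip; a sketch using ``StringIO`` so no file is needed (the column names are made up):

```python
import pandas as pd
from io import StringIO

# Writer: an object method on the DataFrame.
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.5]})
csv_text = df.to_csv(index=False)  # returns a CSV string when no path is given

# Reader: a top-level pandas function.
df2 = pd.read_csv(StringIO(csv_text))
assert df.equals(df2)
```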
@@ -38,28 +42,73 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     binary,`Python Pickle Format <https://docs.python.org/3/library/pickle.html>`__, :ref:`read_pickle<io.pickle>`, :ref:`to_pickle<io.pickle>`
     SQL,`SQL <https://en.wikipedia.org/wiki/SQL>`__, :ref:`read_sql<io.sql>`,:ref:`to_sql<io.sql>`

+
+.. _io.google_colab:
+
+Google Colab
+^^^^^^^^^^^^
+
+Google Colab provides several ways to load data into pandas DataFrames:
+
+**Upload files directly**
+
+.. code-block:: python
+
+    from google.colab import files
+
+    uploaded = files.upload()
+    df = pd.read_csv('your_file.csv')
+
+**Mount Google Drive**
+
+.. code-block:: python
+
+    from google.colab import drive
+
+    drive.mount('/content/drive')
+    df = pd.read_csv('/content/drive/MyDrive/your_file.csv')
+
+**URLs work normally**
+
+.. code-block:: python
+
+    df = pd.read_csv('https://example.com/data.csv')
+
+**Save/download**
+
+.. code-block:: python
+
+    from google.colab import files
+
+    df.to_csv('/content/drive/MyDrive/output.csv', index=False)
+    files.download('output.csv')
+
+Files in ``/content/`` are temporary. Use Drive for persistence.
+
+See Colab's `official IO notebook <https://colab.research.google.com/notebooks/io.ipynb>`_.
+
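In Colab, ``files.upload()`` returns a dict mapping uploaded file names to their raw bytes, so an uploaded file can also be parsed straight from memory rather than from disk. A sketch of that pattern (the dict is simulated here with plain bytes, since ``google.colab`` only exists inside Colab, and the file name is hypothetical):

```python
import io
import pandas as pd

# In Colab this would be:
#     from google.colab import files
#     uploaded = files.upload()
uploaded = {"your_file.csv": b"x,y\n1,2\n3,4\n"}  # simulated upload result

# Parse directly from the returned bytes instead of a file on disk.
df = pd.read_csv(io.BytesIO(uploaded["your_file.csv"]))
```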
 :ref:`Here <io.perf>` is an informal performance comparison for some of these IO methods.

+
 .. note::
     For examples that use the ``StringIO`` class, make sure you import it
     with ``from io import StringIO`` for Python 3.

+
 .. _io.read_csv_table:

+
 CSV & text files
 ----------------

+
 The workhorse function for reading text files (a.k.a. flat files) is
 :func:`read_csv`. See the :ref:`cookbook<cookbook.csv>` for some advanced strategies.

+
 Parsing options
-'''''''''''''''
+***************
+

 :func:`read_csv` accepts the following common arguments:

+
 Basic
 +++++

+
 filepath_or_buffer : various
     Either a path to a file (a :class:`python:str`, :class:`python:pathlib.Path`)
     URL (including http, ftp, and S3
@@ -76,9 +125,11 @@ sep : str, defaults to ``','`` for :func:`read_csv`, ``\t`` for :func:`read_table`
 delimiter : str, default ``None``
     Alternative argument name for sep.

+
 Column and index locations and names
 ++++++++++++++++++++++++++++++++++++

+
 header : int or list of ints, default ``'infer'``
     Row number(s) to use as the column names, and the start of the
     data. Default behavior is to infer the column names: if no names are
@@ -88,6 +139,7 @@ header : int or list of ints, default ``'infer'``
     ``header=None``. Explicitly pass ``header=0`` to be able to replace
     existing names.

+
     The header can be a list of ints that specify row locations
     for a MultiIndex on the columns e.g. ``[0,1,3]``. Intervening rows
     that are not specified will be skipped (e.g. 2 in this example is
@@ -102,21 +154,25 @@ index_col : int, str, sequence of int / str, or False, optional, default ``None``
     string name or column index. If a sequence of int / str is given, a
     MultiIndex is used.

+
     .. note::
         ``index_col=False`` can be used to force pandas to *not* use the first
         column as the index, e.g. when you have a malformed file with delimiters at
         the end of each line.

+
     The default value of ``None`` instructs pandas to guess. If the number of
     fields in the column header row is equal to the number of fields in the body
     of the data file, then a default index is used. If it is larger, then
     the first columns are used as index so that the remaining number of fields in
     the body are equal to the number of fields in the header.

+
     The first row after the header is used to determine the number of columns,
     which will go into the index. If the subsequent rows contain fewer columns
     than the first row, they are filled with ``NaN``.

+
     This can be avoided through ``usecols``. This ensures that the columns are
     taken as is and the trailing data are ignored.
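The ``index_col=False`` behaviour described above is easiest to see with a malformed file whose data rows each carry a trailing delimiter; a small sketch (the data string is made up):

```python
import pandas as pd
from io import StringIO

# Two header fields, but each data row has a trailing comma (three fields).
data = "a,b\n1,2,\n3,4,\n"

# Default: the extra field makes pandas use the first column as the index.
with_index = pd.read_csv(StringIO(data))

# index_col=False: keep a default RangeIndex and ignore the trailing data.
no_index = pd.read_csv(StringIO(data), index_col=False)
```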
 usecols : list-like or callable, default ``None``
@@ -127,32 +183,40 @@ usecols : list-like or callable, default ``None``
     header row(s) are not taken into account. For example, a valid list-like
     ``usecols`` parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``.

+
     Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``. To
     instantiate a DataFrame from ``data`` with element order preserved use
     ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns
     in ``['foo', 'bar']`` order or
     ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]`` for
     ``['bar', 'foo']`` order.

+
     If callable, the callable function will be evaluated against the column names,
     returning names where the callable function evaluates to True:

+
     .. ipython:: python

+
        import pandas as pd
        from io import StringIO
+
        data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"
        pd.read_csv(StringIO(data))
        pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ["COL1", "COL3"])

+
     Using this parameter results in much faster parsing time and lower memory usage
     when using the c engine. The Python engine loads the data first before deciding
     which columns to drop.

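The element-order behaviour of ``usecols`` described above can be checked directly; a small sketch (the column names are made up):

```python
import pandas as pd
from io import StringIO

data = "foo,bar,baz\n1,2,3\n4,5,6\n"

# usecols order is ignored: both calls select the same columns in file order.
a = pd.read_csv(StringIO(data), usecols=["bar", "foo"])
b = pd.read_csv(StringIO(data), usecols=["foo", "bar"])
assert a.columns.tolist() == b.columns.tolist() == ["foo", "bar"]

# To force a specific order, reindex after reading.
c = pd.read_csv(StringIO(data), usecols=["foo", "bar"])[["bar", "foo"]]
```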
+
 General parsing configuration
 +++++++++++++++++++++++++++++

+
 dtype : Type name or dict of column -> type, default ``None``
     Data type for data or columns. E.g. ``{'a': np.float64, 'b': np.int32, 'c': 'Int64'}``
     Use ``str`` or ``object`` together with suitable ``na_values`` settings to preserve
@@ -161,16 +225,20 @@ dtype : Type name or dict of column -> type, default ``None``
     the default determines the dtype of the columns which are not explicitly
     listed.

+
 dtype_backend : {"numpy_nullable", "pyarrow"}, defaults to NumPy backed DataFrames
     Which dtype_backend to use, e.g. whether a DataFrame should have NumPy
     arrays, nullable dtypes are used for all dtypes that have a nullable
     implementation when "numpy_nullable" is set, pyarrow is used for all
     dtypes if "pyarrow" is set.

+
     The dtype_backends are still experimental.

+
     .. versionadded:: 2.0

+
 engine : {``'c'``, ``'python'``, ``'pyarrow'``}
     Parser engine to use. The C and pyarrow engines are faster, while the python engine
     is currently more feature-complete. Multithreading is currently only supported by
@@ -183,7 +251,8 @@ true_values : list, default ``None``
     Values to consider as ``True``.
 false_values : list, default ``None``
     Values to consider as ``False``.
-skipinitialspace : boolean, default ``False``
+skipinitialspace : boolean,
+default ``False``
     Skip spaces after delimiter.
 skiprows : list-like or integer, default ``None``
     Line numbers to skip (0-indexed) or number of lines to skip (int) at the start

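The ``dtype`` argument covered in the hunks above accepts either a single type or a per-column mapping; a minimal sketch (the column names and types are arbitrary):

```python
import numpy as np
import pandas as pd
from io import StringIO

data = "a,b,c\n1,2,x\n3,4,y\n"

# Per-column dtypes; columns not in the mapping keep their inferred dtype.
df = pd.read_csv(StringIO(data), dtype={"a": np.float64, "b": "Int64"})
```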