Commit f8979f8

Merge pull request #11 from oscarsiles/2025-meeting-lesson
first draft - reading files notebook
2 parents 21c000d + 3881082 commit f8979f8

3 files changed: +294 -0 lines changed

notebooks/elements.csv

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
Name,Symbol,Number
2+
Hydrogen,H,1
3+
Helium,He,2
4+
Lithium,Li,3

notebooks/molecule.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
C2H6

notebooks/reading_files.ipynb

Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "1767a58b",
6+
"metadata": {},
7+
"source": [
8+
"# Prerequisites:\n",
9+
"- Variables\n",
10+
"- Iterables(?)\n",
11+
"- Loops\n",
12+
"\n",
13+
"# Learning Outcomes:\n",
14+
"- Open files using Python's built-in functions and extract their contents to variables\n",
15+
"- Use the CSV module to read data from CSV files"
16+
]
17+
},
18+
{
19+
"cell_type": "markdown",
20+
"id": "f4882898",
21+
"metadata": {},
22+
"source": [
23+
"# **Reading Files**\n",
24+
"\n",
25+
"One of the common uses of Python in chemistry is to analyse large amounts of data. \n",
26+
"Such data are often gathered during an experiment and stored across several files, and Python has built-in functions for reading (and writing) files. \n",
27+
"In this section, we will explore how to read different types of files, including text files and CSV files, using Python's built-in capabilities.\n",
28+
"\n",
29+
"Let's start by opening a simple text file and reading its contents:"
30+
]
31+
},
32+
{
33+
"cell_type": "code",
34+
"execution_count": null,
35+
"id": "0ff6944a",
36+
"metadata": {},
37+
"outputs": [],
38+
"source": [
39+
"file = open('molecule.txt', 'r')\n",
40+
"contents = file.read()\n",
41+
"file.close()\n",
42+
"print(contents)"
43+
]
44+
},
45+
{
46+
"cell_type": "markdown",
47+
"id": "6d821f38",
48+
"metadata": {},
49+
"source": [
50+
"After running the cell above, you should see the contents of the `molecule.txt` file in the cell output. \n",
51+
"If you don't see the output, make sure that the file is in the same directory as this notebook. \n",
52+
"You can also verify the output by checking the file's contents in a text editor.\n",
53+
"\n",
54+
"The first line of the code cell above opens the file `molecule.txt` using the `open()` function and assigns the resulting file *object* to a variable we have called `file`.\n",
55+
"The `open()` function takes at least one argument: either the file name (if the file is in the current working directory) or a path to the file, which can be relative or absolute.\n",
56+
"It can also take a second argument to specify the mode in which the file is opened (e.g., `'r'` for reading, `'w'` for writing, etc.).\n",
57+
"If you don't specify a mode, the file is opened in read mode by default.\n",
58+
"\n",
59+
"The second line of the code cell reads the entire contents of the file using the `read()` method of the file object and stores it in a variable called `contents`. \n",
60+
"\n",
61+
"The third line closes the file using the `close()` method, which is considered good practice.\n",
62+
"A file that is left open can cause problems later, for example when another program needs to access the same file.\n",
63+
"\n",
64+
"Finally, the last line prints the contents of the `contents` variable."
65+
]
66+
},
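Inline note: the four steps described in this cell can be checked with the file object's `mode` and `closed` attributes. This is only a minimal sketch. It assumes a throwaway copy of `molecule.txt` written to a temporary folder (the `folder` and `path` names are illustrative and not part of the notebook) so that it runs without the repository files.

```python
import os
import tempfile

# Assumption: write a throwaway molecule.txt so the sketch runs anywhere.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "molecule.txt")
handle = open(path, "w")
handle.write("C2H6")
handle.close()

file = open(path)            # no mode given, so read mode ('r') is used
mode = file.mode             # the mode the file was opened in
contents = file.read()       # the whole file as a single string
closed_before = file.closed  # False: the file is still open
file.close()
closed_after = file.closed   # True: the file has been closed

print(mode, contents, closed_before, closed_after)
```

The `closed` attribute is a quick way to confirm that `close()` did what the text describes.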
67+
{
68+
"cell_type": "markdown",
69+
"id": "900f642e",
70+
"metadata": {},
71+
"source": [
72+
"## Reading Files with `with`\n",
73+
"We can also use the `with` statement to open files, which will automatically close the file for us when we are done with it.\n",
74+
"This is a more \"Pythonic\" way to handle files and is generally recommended.\n",
75+
"\n",
76+
"Let's take a look at the same example using the `with` statement:"
77+
]
78+
},
79+
{
80+
"cell_type": "code",
81+
"execution_count": null,
82+
"id": "f63f3d19",
83+
"metadata": {},
84+
"outputs": [],
85+
"source": [
86+
"with open('molecule.txt', 'r') as file:\n",
87+
"    contents = file.read()\n",
88+
"\n",
89+
"print(contents)"
90+
]
91+
},
92+
{
93+
"cell_type": "markdown",
94+
"id": "06bbb57c",
95+
"metadata": {},
96+
"source": [
97+
"As before, we open the `molecule.txt` file and read its contents.\n",
98+
"The difference is that we use the `with` statement to open the file, which closes it automatically when we exit the `with` block, even if an error occurs inside the block.\n",
99+
"\n",
100+
"We now have a way to read files in Python and store their contents in *variables* for use in our code."
101+
]
102+
},
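Inline note: the claim that `with` closes the file on leaving the block can be demonstrated directly. The sketch below is illustrative only. It assumes a temporary copy of `molecule.txt`, and the `ValueError` is raised deliberately to simulate a problem inside the block.

```python
import os
import tempfile

# Assumption: a throwaway molecule.txt in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "molecule.txt")
with open(path, "w") as handle:
    handle.write("C2H6")

# Normal case: the file is closed as soon as the block ends.
with open(path) as file:
    contents = file.read()
closed_after_block = file.closed

# Error case: the file is still closed, even though the block failed.
try:
    with open(path) as file:
        raise ValueError("simulated problem while processing the file")
except ValueError:
    closed_after_error = file.closed

print(closed_after_block, closed_after_error)
```

With a plain `open()` and `close()` pair, the error case would skip the `close()` call unless it were wrapped in `try`/`finally`.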
103+
{
104+
"cell_type": "markdown",
105+
"id": "8ec1d24a",
106+
"metadata": {},
107+
"source": [
108+
"## Reading CSV Files\n",
109+
"CSV (Comma Separated Values) files are a common format for storing tabular data, such as data from experiments or simulations.\n",
110+
"Each line in a CSV file represents a row of data, and each value in the row is separated by a comma (you can easily verify this by opening up a CSV file in a text editor).\n",
111+
"Python has a built-in module called `csv` that makes it easy to read (and write) CSV files.\n",
112+
"\n",
113+
"Let's take a look at how to read a CSV file using the `csv` module:"
114+
]
115+
},
116+
{
117+
"cell_type": "code",
118+
"execution_count": null,
119+
"id": "3ca51d4d",
120+
"metadata": {},
121+
"outputs": [],
122+
"source": [
123+
"import csv\n",
124+
"\n",
125+
"with open('elements.csv') as file:\n",
126+
"    csv_reader = csv.reader(file)\n",
127+
"    for row in csv_reader:\n",
128+
"        print(row)"
129+
]
130+
},
131+
{
132+
"cell_type": "markdown",
133+
"id": "7ae13696",
134+
"metadata": {},
135+
"source": [
136+
"Here, we first import the built-in `csv` module to allow us to easily parse CSV files.\n",
137+
"\n",
138+
"Next we open the `elements.csv` file using the `with` statement as we have seen before.\n",
139+
"Note that we have not passed a mode to `open()` this time, so the file is opened in the default read mode.\n",
140+
"\n",
141+
"The `csv.reader()` function takes the file object as an argument and returns a CSV reader object that can be used to *iterate* over the rows in the CSV file.\n",
142+
"\n",
143+
"Finally, we use a `for` loop to iterate over the rows in the CSV file and print the contents of each row.\n",
144+
"The `csv_reader` object gives us each row as a list of strings (so the atomic number `1` is read as `'1'`), making it easy to work with the data."
145+
]
146+
},
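Inline note: because `csv.reader` returns every value as a string, numeric columns need converting before they can be used in calculations. A minimal sketch follows. It assumes a temporary copy of `elements.csv` with the same four lines as in this commit, and it passes `newline=""` to `open()` as the `csv` module documentation recommends. The names `header`, `rows` and `atomic_numbers` are illustrative.

```python
import csv
import os
import tempfile

# Assumption: recreate elements.csv in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "elements.csv")
with open(path, "w", newline="") as handle:
    handle.write("Name,Symbol,Number\nHydrogen,H,1\nHelium,He,2\nLithium,Li,3\n")

with open(path, newline="") as file:
    csv_reader = csv.reader(file)
    header = next(csv_reader)  # first row: the column names
    rows = list(csv_reader)    # remaining rows, each a list of strings

# Convert the third column from strings to integers.
atomic_numbers = [int(row[2]) for row in rows]

print(header)
print(atomic_numbers)
```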
147+
{
148+
"cell_type": "markdown",
149+
"id": "760dcb9a",
150+
"metadata": {},
151+
"source": [
152+
"## Exercises\n",
153+
"\n",
154+
"### Manipulate data\n",
155+
"Use f-strings to print the contents of the `elements.csv` file in a more readable format.\n",
156+
"Don't forget about the header row!"
157+
]
158+
},
159+
{
160+
"cell_type": "code",
161+
"execution_count": null,
162+
"id": "53a6fb7d",
163+
"metadata": {},
164+
"outputs": [],
165+
"source": []
166+
},
167+
{
168+
"cell_type": "markdown",
169+
"id": "633a2836",
170+
"metadata": {},
171+
"source": [
172+
"Example answer (skipping the header entirely):\n",
173+
"```python\n",
174+
"import csv\n",
175+
"\n",
176+
"with open('elements.csv') as csvfile:\n",
177+
"    csv_reader = csv.reader(csvfile)\n",
178+
"    next(csv_reader)  # Skip the header row\n",
179+
"    for row in csv_reader:\n",
180+
"        print(f\"Name: {row[0]}, Symbol: {row[1]}, Atomic Number: {row[2]}\")\n",
181+
"```"
182+
]
183+
},
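Inline note: an alternative answer could use the header row rather than skip it. The sketch below uses `csv.DictReader`, which the notebook itself does not cover, so treat it as an optional extension. It assumes a temporary copy of `elements.csv`, and the `lines` list is illustrative.

```python
import csv
import os
import tempfile

# Assumption: recreate elements.csv in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "elements.csv")
with open(path, "w", newline="") as handle:
    handle.write("Name,Symbol,Number\nHydrogen,H,1\nHelium,He,2\nLithium,Li,3\n")

lines = []
with open(path, newline="") as csvfile:
    # DictReader reads the header row and uses it as the keys of each row.
    dict_reader = csv.DictReader(csvfile)
    for row in dict_reader:
        lines.append(
            f"Name: {row['Name']}, Symbol: {row['Symbol']}, Atomic Number: {row['Number']}"
        )

for line in lines:
    print(line)
```

Accessing values by column name avoids relying on the column order, at the cost of introducing dictionaries, which are not listed in the prerequisites.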
184+
{
185+
"cell_type": "markdown",
186+
"id": "c67c1875",
187+
"metadata": {},
188+
"source": [
189+
"### Using the file path\n",
190+
"Try to open a file that is not in the same directory as this notebook and print its contents."
191+
]
192+
},
193+
{
194+
"cell_type": "code",
195+
"execution_count": null,
196+
"id": "de2abab4",
197+
"metadata": {},
198+
"outputs": [],
199+
"source": []
200+
},
201+
{
202+
"cell_type": "markdown",
203+
"id": "3430cc73",
204+
"metadata": {},
205+
"source": [
206+
"TODO: Example answer"
207+
]
208+
},
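Inline note: one possible shape for the missing example answer, offered as a sketch rather than the intended solution. It assumes a hypothetical `data` subfolder containing `molecule.txt`, created here inside a temporary folder so that the code runs anywhere. In the lesson the path would point at a real file on the learner's machine.

```python
import tempfile
from pathlib import Path

# Assumption: build a hypothetical folder layout, <base>/data/molecule.txt.
base = Path(tempfile.mkdtemp())
data_dir = base / "data"
data_dir.mkdir()
(data_dir / "molecule.txt").write_text("C2H6")

# Join the parts of the path with "/" so it works on any operating system.
file_path = base / "data" / "molecule.txt"

with open(file_path) as file:
    contents = file.read()

print(file_path.is_absolute(), contents)
```

`open()` accepts a `Path` object as well as a plain string, so a string such as `'data/molecule.txt'` would work in the same way for a relative path.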
209+
{
210+
"cell_type": "markdown",
211+
"id": "10c5379d",
212+
"metadata": {},
213+
"source": [
214+
"### Loop through multiple files\n",
215+
"TODO: Task involving looping through multiple files with a predictable filename (e.g. `001.csv`) and reading their contents."
216+
]
217+
},
218+
{
219+
"cell_type": "code",
220+
"execution_count": null,
221+
"id": "002dbb28",
222+
"metadata": {},
223+
"outputs": [],
224+
"source": []
225+
},
226+
{
227+
"cell_type": "markdown",
228+
"id": "c1114d99",
229+
"metadata": {},
230+
"source": [
231+
"TODO: Example answer"
232+
]
233+
},
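Inline note: a possible starting point for this task and its answer, assuming files named `001.csv`, `002.csv` and `003.csv`. The file contents (`Run,Value` columns) are invented purely for illustration and are written to a temporary folder so that the sketch is self-contained.

```python
import csv
import tempfile
from pathlib import Path

# Assumption: create three small CSV files with made-up contents.
folder = Path(tempfile.mkdtemp())
for i in range(1, 4):
    (folder / f"{i:03d}.csv").write_text(f"Run,Value\n{i},{i * 10}\n")

all_rows = []
for i in range(1, 4):
    # f"{i:03d}" pads the number with zeros: 1 -> "001", 2 -> "002", ...
    filename = folder / f"{i:03d}.csv"
    with open(filename, newline="") as file:
        csv_reader = csv.reader(file)
        next(csv_reader)  # skip the header row of each file
        for row in csv_reader:
            all_rows.append(row)

print(all_rows)
```

The `:03d` format specifier is what makes the filenames predictable, so the loop variable alone is enough to rebuild each name.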
234+
{
235+
"cell_type": "markdown",
236+
"id": "619f5799",
237+
"metadata": {},
238+
"source": [
239+
"## Debugging\n",
240+
"The code below contains a bug and will not run.\n",
241+
"See if you can fix it by reading the error message and using the information it provides."
242+
]
243+
},
244+
{
245+
"cell_type": "code",
246+
"execution_count": null,
247+
"id": "818250af",
248+
"metadata": {},
249+
"outputs": [],
250+
"source": [
251+
"with open('molecule.csv', 'r') as file:\n",
252+
"    text = file.read()\n",
253+
"\n",
254+
"print(text)"
255+
]
256+
},
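Inline note for instructors: assuming only `molecule.txt` exists next to the notebook, the cell above is expected to fail with a `FileNotFoundError`. The sketch below reproduces that situation in a temporary folder and shows how the error and the folder listing point to the fix. The names `error_name` and `available` are illustrative.

```python
import os
import tempfile

# Assumption: a folder that contains molecule.txt but no molecule.csv.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "molecule.txt"), "w") as handle:
    handle.write("C2H6")

try:
    with open(os.path.join(folder, "molecule.csv"), "r") as file:
        text = file.read()
except FileNotFoundError as error:
    error_name = type(error).__name__  # the kind of error that was raised
    available = os.listdir(folder)     # what is actually in the folder
    print(error_name, available)
```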
257+
{
258+
"cell_type": "markdown",
259+
"id": "f58d91db",
260+
"metadata": {},
261+
"source": [
262+
"## TODO\n",
263+
"- Discuss carriage returns and other special characters?\n",
264+
"- Explain the distinction between text and binary files?"
265+
]
266+
}
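Inline note on the TODO items: if the text versus binary distinction is added, a short sketch like the following might serve as a starting point. It assumes a temporary file written with a Windows-style line ending (`\r\n`) and reads it back in both modes.

```python
import os
import tempfile

# Assumption: a throwaway file ending in a carriage return and a newline.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "molecule.txt")
with open(path, "wb") as handle:
    handle.write(b"C2H6\r\n")

# Text mode ('r'): returns a string and translates '\r\n' to '\n'.
with open(path, "r") as file:
    text = file.read()

# Binary mode ('rb'): returns the raw bytes exactly as stored on disk.
with open(path, "rb") as file:
    raw = file.read()

print(repr(text))  # 'C2H6\n'
print(repr(raw))   # b'C2H6\r\n'
```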
267+
],
268+
"metadata": {
269+
"kernelspec": {
270+
"display_name": "Python 3",
271+
"language": "python",
272+
"name": "python3"
273+
},
274+
"language_info": {
275+
"codemirror_mode": {
276+
"name": "ipython",
277+
"version": 3
278+
},
279+
"file_extension": ".py",
280+
"mimetype": "text/x-python",
281+
"name": "python",
282+
"nbconvert_exporter": "python",
283+
"pygments_lexer": "ipython3",
284+
"version": "3.13.2"
285+
}
286+
},
287+
"nbformat": 4,
288+
"nbformat_minor": 5
289+
}
