Merge pull request #11 from oscarsiles/2025-meeting-lesson

arm61 · web-flow · commit f8979f86e49a · 2025-04-14T09:06:54.000+01:00
first draft - reading files notebook
diff --git a/notebooks/elements.csv b/notebooks/elements.csv
@@ -0,0 +1,4 @@
+Name,Symbol,Number
+Hydrogen,H,1
+Helium,He,2
+Lithium,Li,3
diff --git a/notebooks/molecule.txt b/notebooks/molecule.txt
@@ -0,0 +1 @@
+C2H6
diff --git a/notebooks/reading_files.ipynb b/notebooks/reading_files.ipynb
@@ -0,0 +1,289 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "1767a58b",
+   "metadata": {},
+   "source": [
+    "# Prerequisites:\n",
+    "- Variables\n",
+    "- Iterables(?)\n",
+    "- Loops\n",
+    "\n",
+    "# Learning Outcomes:\n",
+    "- Open files using Python's built-in functions and extract their contents to variables\n",
+    "- Use the CSV module to read data from CSV files"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f4882898",
+   "metadata": {},
+   "source": [
+    "# **Reading Files**\n",
+    "\n",
+    "One of the common uses of Python in chemistry is to analyse large amounts of data. \n",
+    "This might be data gathered during an experiment that has been stored in a number of files, and Python has a number of built-in functions to read (and write) files. \n",
+    "In this section, we will explore how to read different types of files, including text files and CSV files, using Python's built-in capabilities.\n",
+    "\n",
+    "Let's start with a opening a simple text file and reading its contents:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0ff6944a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "file = open('molecule.txt', 'r')\n",
+    "contents = file.read()\n",
+    "file.close()\n",
+    "print(contents)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6d821f38",
+   "metadata": {},
+   "source": [
+    "After running the cell above, you should see the contents of the `molecule.txt` file in the cell output. \n",
+    "If you don't see the output, make sure that the file is in the same directory as this notebook. \n",
+    "You can also verify the output by checking the file's contents in a text editor.\n",
+    "\n",
+    "The first line of the code cell above opens the file `molecule.txt` using the `open()` function and saves it to a special file-reading Python *object* we have called `file`.\n",
+    "The `open()` function takes at least one argument which is either the file name (if in the same working directory) or the full filepath of the file.\n",
+    "It can also take a second argument to specify the mode in which the file is opened (e.g., `'r'` for reading, `'w'` for writing, etc.).\n",
+    "If you don't specify a mode, the file is opened in read mode by default.\n",
+    "\n",
+    "The second line of the code cell reads the entire contents of the file using the `read()` method of the file object and stores it in a variable called `contents`. \n",
+    "\n",
+    "The third line closes the file using the `close()` method and is considered good practice.\n",
+    "Otherwise we might leave it open, which can lead to various issues (e.g., file access errors).\n",
+    "\n",
+    "Finally, the last line prints the contents of the `contents` variable."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "900f642e",
+   "metadata": {},
+   "source": [
+    "### Reading Files with `with`\n",
+    "We can also use the `with` statement to open files, which will automatically close the file for us when we are done with it.\n",
+    "This is a more \"Pythonic\" way to handle files and is generally recommended.\n",
+    "\n",
+    "Let's take a look at the same example using the `with` statement:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f63f3d19",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('molecule.txt', 'r') as file:\n",
+    "    contents = file.read()\n",
+    "\n",
+    "print(contents)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "06bbb57c",
+   "metadata": {},
+   "source": [
+    "As before, we open the `molecule.txt` file and read its contents.\n",
+    "The difference is that we use the `with` statement to open the file, which automatically closes it when we are done with it (i.e., when we exit the `with` block).\n",
+    "\n",
+    "We now have a way to read files in Python, and use their contents as *variables* in our code."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8ec1d24a",
+   "metadata": {},
+   "source": [
+    "## Reading CSV Files\n",
+    "CSV (Comma Separated Values) files are a common format for storing tabular data, such as data from experiments or simulations.\n",
+    "Each line in a CSV file represents a row of data, and each value in the row is separated by a comma (you can easily verify this by opening up a CSV file in a text editor).\n",
+    "Python has a built-in module called `csv` that makes it easy to read (and write) CSV files.\n",
+    "\n",
+    "Let's take a look at how to read a CSV file using the `csv` module:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3ca51d4d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import csv\n",
+    "\n",
+    "with open('elements.csv') as file:\n",
+    "    csv_reader = csv.reader(file)\n",
+    "    for row in csv_reader:\n",
+    "        print(row)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7ae13696",
+   "metadata": {},
+   "source": [
+    "Here, we first import the built-in `csv` module to allow us to easily parse CSV files.\n",
+    "\n",
+    "Next we open the `elements.csv` file using the `with` statement as we have seen before.\n",
+    "Note that we are opening the file in read mode without needing to specify it explicitly.\n",
+    "\n",
+    "The `csv.reader()` function takes the file object as an argument and returns a CSV reader object that can be used to *iterate* over the rows in the CSV file.\n",
+    "\n",
+    "Finally, we use a `for` loop to iterate over the rows in the CSV file and print the contents of each row.\n",
+    "The csv_reader object allows us to access each row as a list of values, making it easy to work with the data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "760dcb9a",
+   "metadata": {},
+   "source": [
+    "## Exercises\n",
+    "\n",
+    "### Manipulate data\n",
+    "Use f-strings to print the contents of the `elements.csv` file in a more readable format.\n",
+    "Don't forget about the header row!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "53a6fb7d",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "633a2836",
+   "metadata": {},
+   "source": [
+    "Example answer (skipping the header entirely):\n",
+    "```python\n",
+    "import csv\n",
+    "\n",
+    "with open('elements.csv') as csvfile:\n",
+    "    csv_reader = csv.reader(csvfile)\n",
+    "    next(csv_reader)  # Skip the header row\n",
+    "    for row in csv_reader:\n",
+    "        print(f\"Name: {row[0]}, Symbol: {row[1]}, Atomic Number: {row[2]}\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c67c1875",
+   "metadata": {},
+   "source": [
+    "### Using the file path\n",
+    "Try to open a file that is not in the same directory as this notebook and print its contents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "de2abab4",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3430cc73",
+   "metadata": {},
+   "source": [
+    "TODO: Example answer"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "10c5379d",
+   "metadata": {},
+   "source": [
+    "### Loop through multiple files\n",
+    "TODO: Task involving looping through multiple files with a predictable filename (e.g. `001.csv`) and reading their contents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "002dbb28",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1114d99",
+   "metadata": {},
+   "source": [
+    "TODO: Example answer"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "619f5799",
+   "metadata": {},
+   "source": [
+    "## Debugging\n",
+    "The code below contains a bug and will not run.\n",
+    "See if you can fix it by reading the error message and using the information it provides."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "818250af",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('molecule.csv', 'r') as file:\n",
+    "    text = file.read()\n",
+    "\n",
+    "print(text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f58d91db",
+   "metadata": {},
+   "source": [
+    "## TODO\n",
+    "- Discuss carriage returns and other special characters?\n",
+    "- Explain the distinction between text and binary files?"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}