- Grades
- Submitting assignments
- No need to create new branches from here on out
- Final Projects
- Practical Computing, Chapters 10 and 11
In-Class Assignment
Today, we will start taking steps toward building code that can process lots of
data easily and can be reused in other scripts that we may write in the future.
The goal of this week's in-class assignment is to write a script that will translate
a nucleotide sequence into an amino-acid sequence (like you did last week). But
it will do so for an arbitrary number of sequences in an arbitrary number of files.
Your script should:
- Accept any number of filenames as command-line arguments
- Each of these files can contain a separate nucleotide sequence on each line
- The script should contain a new function that takes each of these sequences,
translates them to amino acids, and prints all of them to one output file.
Reading from files
- To read from or write to a file, you'll first need to define the file name, e.g.,
    inFileName = "FileToRead.txt"
- However, this is just a string variable with the file name. We need to create an object that can actually read the contents of a file.
- Python has a built-in function to create a file object: `open(<FILENAME>,<MODE>)`
- The `<FILENAME>` argument is just a string with the file's name (or path to the file).
- The `<MODE>` argument tells Python whether we are reading from a file (`r`), writing to a file (`w`), or appending to a file (`a`).
- To open a new file object to read file contents, use syntax like this:
    inFile = open(inFileName,'r')
- There are several useful methods associated with file objects, but one of the most commonly used is `readline()`. This method will read lines one-by-one from the file. Note that the end-of-line character (`\n`) is retained when the line is read in.
    firstLine = inFile.readline()
- Files opened for reading can be used in a `for` loop, as follows, to go through all the lines in the file:
    for line in inFile:
        print("Length of line is: %d" % (len(line)))
- Note that `line` is just a variable name we've chosen to hold each line as we iterate through the file. You can use any variable name you choose, as with any other `for` loop.
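Here is a minimal sketch that puts the reading steps together. The file name and its two sequences are made up for illustration, and the file is created inside the example only so that it runs on its own. Calling `close()` when you are done with a file is an addition here, but it is good practice.

```python
import os
import tempfile

# Hypothetical input file, written here only so the example is self-contained
inFileName = os.path.join(tempfile.mkdtemp(), "FileToRead.txt")
setupFile = open(inFileName, 'w')
setupFile.write("ATGGCC\nATGTTTAAA\n")
setupFile.close()

inFile = open(inFileName, 'r')

# readline() returns the next line, including its trailing "\n"
firstLine = inFile.readline()
firstSeq = firstLine.strip()   # strip() removes the "\n" and any surrounding whitespace

# The for loop continues from wherever readline() left off
remainingSeqs = []
for line in inFile:
    remainingSeqs.append(line.strip())

inFile.close()   # release the file once we are finished with it

print(repr(firstLine))
print(firstSeq, remainingSeqs)
```

Using `strip()` on each line is one common way to get rid of the retained `\n` before working with the sequence itself.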
Writing to Files
- Writing to a file is very similar to reading from a file. First, you define an output file name
outFileName = "FileToWrite.txt"
- To create a file object to use for writing, we'll again use the `open()` function, but we'll specify `'w'` for the `<MODE>`.
    outFile = open(outFileName,'w')
- To write to the file line-by-line, we can use the `.write()` method.
    outFile.write("This is a new sentence.\n")
- Note that the `write()` method does NOT, by default, add a new line character to strings. If we want to end a line, we have to explicitly include `\n`.
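As a small sketch of the writing steps, the example below writes a few lines, appends one more, and reads the result back to check it. The file location and the strings written are placeholders chosen for illustration. Note that opening an existing file with `'w'` replaces its contents, while `'a'` adds to the end.

```python
import os
import tempfile

# Hypothetical output file in a temporary folder, so nothing real is overwritten
outFileName = os.path.join(tempfile.mkdtemp(), "FileToWrite.txt")

# 'w' creates the file (or empties it if it already exists)
outFile = open(outFileName, 'w')
for peptide in ["MA", "MFK"]:
    outFile.write(peptide + "\n")   # we add "\n" ourselves to end each line
outFile.close()                     # closing makes sure everything is saved to disk

# 'a' keeps the existing contents and adds to the end
outFile = open(outFileName, 'a')
outFile.write("MGG\n")
outFile.close()

# Read the whole file back in to confirm what was written
checkFile = open(outFileName, 'r')
contents = checkFile.read()
checkFile.close()
print(contents)
```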
Command-line Arguments
- As with bash scripts, Python scripts can also take advantage of command-line arguments.
- To easily deal with command-line arguments, we're going to take advantage of some functions in the `sys` library. So we'll need to start by importing that library:
    import sys
- Any command-line arguments we pass to a script can then be accessed using the `sys.argv` variable.
    print(sys.argv[2])
- Which argument is printed when you run the line above? Does that make sense with the 0-based indexing in Python?
- We can also loop through all command-line arguments:
    for arg in sys.argv:
        print(arg)
- These abilities are very useful in a variety of contexts, but particularly when a set of filenames is provided as command-line arguments and you want to iteratively process each file.
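The sketch below shows one way to process each file named on the command line. The function name `countLines` is made up for this example, and counting lines simply stands in for whatever processing you actually want to do. The `def` syntax is covered in the section on defining new functions. Because `sys.argv[0]` holds the script's own name, the file names start at index 1.

```python
import sys

def countLines(fileNames):
    """Return a dictionary mapping each file name to its number of lines."""
    lineCounts = {}
    for fileName in fileNames:
        inFile = open(fileName, 'r')
        numLines = 0
        for line in inFile:
            numLines = numLines + 1
        inFile.close()
        lineCounts[fileName] = numLines
    return lineCounts

if __name__ == "__main__":
    # sys.argv[1:] is a slice holding everything after the script name
    counts = countLines(sys.argv[1:])
    for fileName in counts:
        print("%s: %d lines" % (fileName, counts[fileName]))
```

If this were saved as a script (say, under the hypothetical name `countLines.py`), it could be run as `python3 countLines.py seqs1.txt seqs2.txt` with any number of file names.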
Defining New Functions
- Thankfully, we are not limited to using only the functions that Python has built in.
- We can define our own functions to take care of repetitive tasks.
- As our scripts become more complicated, it will become increasingly important to start "packaging" commands together. This will make our scripts more readable and our code more reusable.
- The basic syntax for defining a function is really simple:
    def myNewFunc(anArgument):
        """Explain here what your function does"""
        print(anArgument)
- The keyword `def` tells Python that you are creating a new function.
- Any argument that you name in the parentheses can be accessed by that variable name inside the function.
- The part in the quotes is known as the docstring. It documents what your function does. Running `help(myNewFunc)` then allows anyone to see what your function is about.
- Each function can contain as many lines of code as you want/need, as long as they are all indented beneath the `def` line.
- Many functions end with `return` statements. The point of the return statement is to make a value computed inside the function available to be saved or manipulated by the code that called it. A function stops running as soon as it reaches a return statement, so a return statement is usually the last line in a function definition. Here's an example:
    def factorial(num):
        product = 1
        while (num > 0):
            product = product * num
            num = num - 1
        return product

    myNum = factorial(5)
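As one more illustration, here is a small function with a docstring and a return value that is saved and then reused. The function `gcCount` and the example sequence are made up for this sketch and are not part of the assignment.

```python
def gcCount(seq):
    """Return the number of G and C bases in a nucleotide sequence."""
    seq = seq.upper()                          # treat "g" and "G" the same way
    numGC = seq.count("G") + seq.count("C")
    return numGC

mySeq = "ATGGCC"
myCount = gcCount(mySeq)           # the returned value is saved in a variable
gcFraction = myCount / len(mySeq)  # ...and can then be used in later calculations

print(myCount)
print(gcFraction)
print(gcCount.__doc__)             # the docstring is the text that help(gcCount) displays
```

Because the function returns its result instead of printing it, the same code can be reused wherever a GC count is needed.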
Introduction to Plotting
- We will use the `matplotlib` library for plotting graphs and figures in Python.
- You will need to install this library on your VM before you can load it. To install on Ubuntu, type `sudo apt-get install python3-matplotlib`.
- We will start by using one part of the `matplotlib` library: `pyplot`. To import one part of a library, you can use an import statement like this:
    from matplotlib import pyplot
- If you want to use a different "nickname" for a library, you can indicate the name you want to use when you run the import command:
    import matplotlib.pyplot as plt
- Whenever you use `as` in an import statement, you will be providing an alternative name to access the functions in that library.
- To create a simple line plot based on two numerical vectors, you can use these commands:
    plt.plot([1,2,3,4],[66,67,68,69])
    plt.show()
Introducing the `numpy` library
- `numpy` is a powerful library for Python that incorporates sophisticated mathematical tools.
- Today, we are going to use `numpy` to draw numbers from a probability distribution.
- While we can draw some random numbers from simple distributions (like a uniform or Normal) using the built-in `random` library, we often might want to draw from other distributions.
- In just a moment, we will need to draw from a Poisson distribution. This is a probability distribution that is confined to integers greater than or equal to 0. It is often used to model the number of events that occur in time or space, when these events occur independently and at a constant average rate.
- More on the Poisson can be found here.
- Specifically, the function to draw from this distribution is `numpy.random.poisson()`.
- For now, let's import just the submodule called `numpy.random` and, to avoid having to write the full name out every time, let's give it a nickname: `import numpy.random as nr`. We can then call the Poisson function as `nr.poisson()`.
- `numpy` might already be on your system after you install `matplotlib`, but if not you should be able to install it using this Terminal command: `sudo apt-get install python3-numpy`.
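A short sketch of drawing from the Poisson is shown below. The mean of 2.0 and the number of draws are arbitrary values chosen for illustration. The first argument to `nr.poisson()` is the mean of the distribution, and the optional second argument is the number of draws to return.

```python
import numpy.random as nr

nr.seed(42)   # optional: fixing the seed makes the "random" draws repeatable

oneDraw = nr.poisson(2.0)           # a single draw from a Poisson with mean 2.0
manyDraws = nr.poisson(2.0, 1000)   # an array of 1000 independent draws

print(oneDraw)
print(manyDraws[:10])               # the first ten draws
print(manyDraws.mean())             # should be close to 2.0
```

Every draw is a whole number of 0 or more, and the average of many draws should land near the mean you asked for.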
Outlining a program with pseudocode
- You now have a lot of tools at your disposal to tackle various challenges in Python.
- Perhaps the most important skill you can now practice is how to think through solving a problem with code.
- There's no single correct way to do this, but a general strategy that's often used is to write out pseudocode.
- Basically, think through all the steps that you'll need, but don't worry at all about the language syntax.
- What I like to do is open up a file like I'm about to write a script, but then start by just writing all the steps as a series of comments.
- After you've worked out what you want to happen, you can fill in the commands below the comments.
- To practice this, I want you to now write out pseudocode to conduct a simulation of population growth.
- Your simulation should involve starting with some number of individuals.
- Each of those individuals should have the same capacity for reproduction, but they will vary (randomly) in how many offspring they produce. This is where the Poisson distribution comes in.
- The resources available to this population are limited, though, so it can't go above a certain size, known as the carrying capacity.
- Simulate reproduction and population size changes for some number of generations. It should be easy to change this number by changing the value of a single variable.
- Write out in `# comments` how you plan to conduct this simulation.
- Write out the actual code for the simulation.
- Run it several times, with different starting population sizes, carrying capacities, and numbers of generations.
- Write a short summary of what happens
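For comparison with your own attempt, here is one possible sketch of the simulation, with the pseudocode kept as comments above the commands that carry out each step. It rests on several assumptions that are not part of the exercise: generations do not overlap (parents are replaced by their offspring), any individuals beyond the carrying capacity simply do not survive, and the function name, argument names, and values are all made up for illustration.

```python
import numpy.random as nr

def simulatePopulation(startSize, meanOffspring, carryingCapacity, numGens):
    """Return a list of population sizes: the starting size, then one per generation."""
    # Start with some number of individuals
    popSize = startSize
    sizes = [popSize]

    # Repeat for the chosen number of generations
    for gen in range(numGens):
        # Each individual produces a Poisson-distributed number of offspring
        if popSize > 0:
            offspring = nr.poisson(meanOffspring, popSize)
            # ASSUMPTION: parents are replaced by their offspring
            popSize = int(offspring.sum())

        # ASSUMPTION: the population is cut back to the carrying capacity if it exceeds it
        if popSize > carryingCapacity:
            popSize = carryingCapacity

        # Record the population size for this generation
        sizes.append(popSize)

    return sizes

# Changing any of these values changes the simulation
numGens = 20
sizes = simulatePopulation(startSize=10, meanOffspring=1.5,
                           carryingCapacity=500, numGens=numGens)
print(sizes)
```

Other ways of enforcing the carrying capacity (for example, lowering the mean number of offspring as the population grows) are just as reasonable and will give different dynamics.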