In this very first chapter, you will start a journey, swimming in the ocean of code and data. During the following months, you may experience a staggering start, enjoyable progress, or even deep frustration. You will have to step out of your comfort zone, learn from each other, and conquer the overwhelming world of information with your persistence and intelligence. If you have the determination to accept this challenge, you will see a brand-new you at the end of the course.
- Set up the Python environment. Please read here.
- Learn what the terminal is. Be able to navigate the file system in Terminal using the shell.
- Create your first Python script and execute it.
- Learn the interactive mode of the Python interpreter -- convenient for rapid experimentation.
- Learn the Jupyter notebook -- our main working platform in the following weeks.
Basically, Terminal is an application that gives you access to a shell: it sends what you type to command-line programs and shows you their output.
The terminal is very powerful: all the basic operations of the system can be completed in it, such as modifying file permissions, hiding/displaying files, and so on.
- Many features of computers are not available in the graphical interface, only through the command line.
- Work can be more efficient with command-line scripts.
- Press Command+Space to open Spotlight.
- Search for "terminal" to open Terminal.
When you open Terminal, you will see two lines. The first line shows the last time you logged in. On the second line, xuyucandeMacbook-pro is the name of your computer, ~ is the current folder (here, your home folder), and xuyucan is your account/username. What you need to focus on is the `$` sign. It is the prompt: it means that you can type a command, and after you press Return, the computer will print the results on the next line.
The following are some elementary commands you should know in the terminal.

Type `pwd` and `ls` often to know where you are, and use `cd` to change the location.
- `ls` means listing: it shows the files in the current folder. e.g.:

      ~ xuyucan$ ls
      Applications          Library          chromedriver
      Calibre 书库          Movies           pandas
      Creative Cloud Files  Music            python-twitter
      Desktop               Pictures         venv
      Documents             Public
      Downloads             PycharmProjects

- `cd xx` means change directory to xx, i.e. move into the xx folder; you put the location after `cd`.
- `pwd` means print working directory: it shows where you are. e.g.:

      ~ xuyucan$ cd Desktop
      desktop xuyucan$ pwd
      /Users/xuyucan/desktop

  e.g.2: try to `cd` to a certain folder on your desktop. As for me, I have created a folder named python on my desktop containing several .py files, so I `cd` to this folder and list the files in it.

      desktop xuyucan$ cd python
      python xuyucan$ ls
      17426316.py           hello.py       scrabe 2.py
      2_2.py                homework.py    scrabe.py
      Case1_Advanced.py     homework2.py   sight.py
      Case1_Fundamental.py  imdb.py        taiwan-comments.py
      H1.py                 list.py        taiwan_earthquake.csv
      comments.csv          list2.py       taiwanearthquake.py

- Besides following `cd` with an explicit path, you can use these special notations as shortcuts to change directory to some common locations:
  - `cd ~` or `cd` to return to your home folder.
  - `cd .` to go to the current directory.
  - `cd ..` to go up to the parent directory.

      python xuyucan$ cd ..
      desktop xuyucan$ cd ~
      ~ xuyucan$
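For readers curious how these navigation commands map onto Python, here is a minimal sketch using the standard `os` module. The helper names `pwd`, `ls`, and `cd` are hypothetical, chosen only to mirror the shell commands; they are not built into Python.

```python
import os


def pwd():
    """Return the current working directory, like the shell's pwd."""
    return os.getcwd()


def ls(path="."):
    """Return the sorted names of the entries in a folder, like ls."""
    return sorted(os.listdir(path))


def cd(path="~"):
    """Change the current working directory, like cd (default: home folder)."""
    os.chdir(os.path.expanduser(path))


if __name__ == "__main__":
    start = pwd()
    print("Started in:", start)
    print("Entries here:", ls())
    cd("..")                      # go up one level, like `cd ..`
    print("Now in:", pwd())
    cd(start)                     # return to where we began
```

Unlike the shell commands, these helpers return values instead of printing them, so the results can be stored in variables and reused.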
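A space separates a command from its arguments, and the arguments from each other, so be careful where you type spaces. After each command, you can use `ls` to check that the new file or folder exists.
- `touch` means to create a file
- `rm` means to delete a file
- `mkdir` means to create a folder
- `rmdir` means to delete an (empty) folder
- `mv` means to move or rename a file or folder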
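The same five operations can also be sketched in Python with the standard `pathlib` module. This is only an illustration: the function name `demo_file_operations` is hypothetical, the names `big_data`, `ex1.py`, and `exercise.py` are sample names, and the demo works inside a temporary scratch folder so that nothing on your desktop is touched.

```python
import tempfile
from pathlib import Path


def demo_file_operations(base):
    """Repeat mkdir, touch, mv, rm and rmdir inside the folder `base`.

    Returns a list of (step, entries) pairs recording what the affected
    folder contained after each step.
    """
    base = Path(base)
    log = []

    folder = base / "big_data"
    folder.mkdir()                                   # mkdir big_data
    log.append(("mkdir", sorted(p.name for p in base.iterdir())))

    first = folder / "ex1.py"
    first.touch()                                    # touch ex1.py
    log.append(("touch", sorted(p.name for p in folder.iterdir())))

    renamed = folder / "exercise.py"
    first.rename(renamed)                            # mv ex1.py exercise.py
    log.append(("mv", sorted(p.name for p in folder.iterdir())))

    renamed.unlink()                                 # rm exercise.py
    log.append(("rm", sorted(p.name for p in folder.iterdir())))

    folder.rmdir()                                   # rmdir big_data
    log.append(("rmdir", sorted(p.name for p in base.iterdir())))
    return log


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for step, entries in demo_file_operations(scratch):
            print(step, entries)
```

Like `rmdir` in the shell, `Path.rmdir()` only removes a folder that is already empty, which is why the file is deleted first.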
e.g.: First, `cd` to desktop and create a new folder, big_data:
~ xuyucan$ cd desktop
desktop xuyucan$ mkdir big_data
You can now see the new folder on your desktop.
Then `cd` to the big_data folder, create a new Python file ex1.py, rename it to exercise.py, delete this file, and finally delete the big_data directory. Along the way, you can use `ls` to check how the files and folders change.
desktop xuyucan$ cd big_data
big_data xuyucan$ touch ex1.py
big_data xuyucan$ mv ex1.py exercise.py
big_data xuyucan$ rm exercise.py
big_data xuyucan$ cd ..
desktop xuyucan$ rmdir big_data
`man` - format and display the on-line manual pages. It is helpful for getting help and explanations from the official manual. It can be used to display manual pages, search for occurrences of specific text, and more.
If you specify a section, `man` only looks in that section of the manual. The name is normally the name of the manual page, which is typically the name of a command, function, or file.
Enter the manual page of any shell command:
man {command-name}
Frequent hotkeys in the manual page window:
- Use `j` and `k` to move the page down/up by one line.
- Use `ctrl+d` and `ctrl+u` to move down/up by one-half screen.
- Use `q` to exit (return to the shell prompt).
The following are several common uses and examples:
- Display the usage of a command, like `ls`:
$ man ls
LS(1) BSD General Commands Manual LS(1)
NAME
ls -- list directory contents
SYNOPSIS
ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]
DESCRIPTION
For each operand that names a file of a type other than directory, ls
displays its name as well as any requested, associated information. For
each operand that names a file of type directory, ls displays the names
of files contained within that directory, as well as any requested, asso-
ciated information.
If no operands are given, the contents of the current directory are dis-
played. If more than one operand is given, non-directory operands are
displayed first; directory and non-directory operands are sorted sepa-
rately and in lexicographical order.
The following options are available:
-@ Display extended attribute keys and sizes in long (-l) output.
-1 (The numeric digit ``one''.) Force output to be one entry per
line. This is the default when output is not to a terminal.
...
- List the file paths of all manual pages for a name, across all sections: `-aw`
$ man -aw ls
/usr/share/man/man1/ls.1
$ man -aw printf
/usr/share/man/man1/printf.1
/usr/share/man/man3/printf.3
- Display a certain section of `printf`: `man` + section number + `printf`
$ man 3 printf
PRINTF(3) BSD Library Functions Manual PRINTF(3)
NAME
printf, fprintf, sprintf, snprintf, asprintf, dprintf, vprintf, vfprintf,
vsprintf, vsnprintf, vasprintf, vdprintf -- formatted output conversion
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <stdio.h>
int
printf(const char * restrict format, ...);
int
fprintf(FILE * restrict stream, const char * restrict format, ...);
...
- Display all the sections of `printf`: `-a`
man -a printf
- Search the online manuals for pages that contain the keyword: `-k`
man -k printf
- Look up manual pages by name and show their short descriptions: `-f`
man -f printf
For more functions, you can type man man in the terminal.
- Return value: it is a convention for UNIX-like systems/programs to return a value upon completion of execution. The return value indicates whether the program executed as expected. Usually, the return value is `0`, meaning the execution was successful. If the return value is non-zero, it means an error occurred and you should look up the error code in the manual. You can check the return value of the very last command with `echo $?`.
- Output: a command/program can output information for the user. There are two output streams:
  - `stdout` -- "standard output" -- This is usually useful data for further processing, e.g. as input to the next command, or for the user to read. Later in this section, you will see your first Python code using `print()`. This `print()` basically writes text to `stdout`.
  - `stderr` -- "standard error" -- This includes error information that can help the user to debug. By default, when you work in the Mac Terminal, `stderr` and `stdout` are shown in the same window, so you cannot tell them apart by eye. Interested readers can check this discussion to see how to divert the two streams.
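To make the return value and the two output streams concrete, here is a minimal sketch using Python's standard `subprocess` module. It starts a tiny child program that writes one line to each stream and exits with the value 3. The names `CHILD_CODE` and `run_child`, the two messages, and the exit value are all made up for this illustration.

```python
import subprocess
import sys

# A tiny child program: writes one line to each stream, then exits with 3.
CHILD_CODE = (
    "import sys\n"
    "print('useful data')\n"
    "print('something went wrong', file=sys.stderr)\n"
    "sys.exit(3)\n"
)


def run_child():
    """Run the child program and capture its two output streams separately."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the captured bytes into str
    )


if __name__ == "__main__":
    result = run_child()
    print("return value:", result.returncode)
    print("stdout:", result.stdout.strip())
    print("stderr:", result.stderr.strip())
```

Because the two streams are captured separately here, the useful data and the error message can be told apart, even though they would appear mixed together in a terminal window.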
The terminal is an interactive environment. The advantage of writing code there is that you get the result instantly; the weakness is that you can't save it, so when you want to run the code again, you have to type it again. This is why we need a text editor. In actual programming, we always use a text editor to write the code and save it as a file, so that the computer can run it repeatedly. We recommend two text editors, Sublime Text and Visual Studio Code (click to download). You can then edit a .py file in one of these editors. When you double-click the file, the Mac opens it in "TextEdit" by default; you can set one of those two editors as the default editor if necessary.
Python is a popular programming language that is widely used by beginners and longtime developers alike. It is also the language that we use in this course to scrape, clean, analyze, and visualize data. There are two main versions of Python: Python 2 and Python 3. You can check the version using the following command in your terminal:
python --version
In this course, we base our discussions and exercises on Python 3. You can check out the differences between Python 2 and 3 and the instructions for installing Python 3 in the related materials in our gitbook (if you have already set up Python 3, just ignore this).
`print` is a Python function that shows text in the terminal. e.g.: type `print("hello")` in Sublime to print the string hello.
Press Command+S to save the file as "ex1.py" on the desktop.
Type `python ex1.py` in the terminal to execute the file.
desktop xuyucan$ python ex1.py
hello
If the output of `python --version` is "2.x", you will need to use python3 instead. For example, when you execute a Python script, you need to type python3 myscript.py wherever our book uses python myscript.py.
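The version can also be checked from inside a script. The following is a minimal sketch based on the standard `sys.version_info`; the function name `describe_python` and the wording of its messages are made up for this example.

```python
import sys


def describe_python(version_info=None):
    """Return a short message saying which Python version is in use."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major < 3:
        return "Python {}.{}: please run this script with python3".format(major, minor)
    return "Python {}.{}: OK for this course".format(major, minor)


if __name__ == "__main__":
    print(describe_python())
```

Saving this as a .py file and running it once with python and once with python3 shows which interpreter each command starts on your machine.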
An interpreter is a program that reads and executes code. This includes source code, pre-compiled code, and scripts. Basically, the Python interpreter is the application that runs your Python script.
By default, Python source files are treated as encoded in UTF-8. The standard library, however, only uses ASCII characters for identifiers, a convention that any portable code should follow. Because the interpreter reads the file as UTF-8, it supports all the characters in the file; to display them properly, your editor must also recognize that the file is UTF-8.
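As a small illustration of what UTF-8 support means in practice, the sketch below stores a non-ASCII string (the word 书库 from the folder listing earlier in this chapter) and compares its length in characters with its length in UTF-8 bytes. The variable names are arbitrary.

```python
folder_name = "书库"  # a non-ASCII string literal is fine in a UTF-8 source file

# A str counts characters; its UTF-8 encoding counts bytes.
char_count = len(folder_name)
byte_count = len(folder_name.encode("utf-8"))

if __name__ == "__main__":
    print(char_count, "characters,", byte_count, "bytes in UTF-8")
```

Each of these two characters takes three bytes in UTF-8, which is why the two counts differ.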
What the interpreter does, in a nutshell:
- It reads the script and converts it into Python bytecode. Syntax errors are reported at this stage.
- It then executes the bytecode instruction by instruction. It is at this stage that runtime errors are raised, if your code produces any.
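The two stages above can be imitated by hand with the built-in functions `compile()` and `exec()`. This is a simplified sketch rather than a description of the interpreter's internals; the helper name `stage_of_failure` and the sample snippets are made up for this example.

```python
source = "x = 1 + 2\nresult = x * 10\n"

# Stage 1: compile the source text into a code object (bytecode).
code_object = compile(source, "<example>", "exec")

# Stage 2: execute the bytecode; the names it defines end up in `namespace`.
namespace = {}
exec(code_object, namespace)


def stage_of_failure(text):
    """Report whether `text` fails at compile time, at run time, or not at all."""
    try:
        compiled = compile(text, "<example>", "exec")
    except SyntaxError:
        return "compile"
    try:
        exec(compiled, {})
    except Exception:
        return "run"
    return "none"


if __name__ == "__main__":
    print(namespace["result"])
    print(stage_of_failure("x = ("))    # unclosed parenthesis: fails to compile
    print(stage_of_failure("1 / 0"))    # compiles, but fails when executed
```

A syntax error stops the program before any line runs, while a runtime error such as a division by zero only appears once execution reaches the faulty instruction.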
Type the command python or python3 in your terminal.
After that, you will see the >>> prompt, which indicates that you have entered the interactive mode and the interpreter is waiting for your input. For instance:
$ python3
>>> print("hello")
hello
>>> 1 + 2
3
>>> a = 0
Type Control+D, or call the quit() function, to exit the interpreter.
- The script mode is the normal mode, where finished .py files are run by the Python interpreter.
- The interactive mode is a command-line shell which gives immediate feedback for each statement.
Differences between the two modes:
- A `.py` file is executed in script mode, using `python3` + `filename.py` to run the file.
- In interactive mode, you enter and execute one statement at a time, while in script mode, you execute all the code in the file at once by running the .py file directly.
- The interactive mode is primarily used for debugging and testing code.
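One visible difference between the two modes is that the interactive prompt echoes the value of an expression, while a script prints nothing unless you call `print()`. The sketch below imitates this with the built-in `compile()` function, whose "single" mode behaves roughly like one statement typed at the prompt and whose "exec" mode behaves like a script file. The helper name `run_and_capture` is made up for this example.

```python
import contextlib
import io


def run_and_capture(source, mode):
    """Compile `source` in the given mode, run it, and return what was printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<example>", mode), {})
    return buffer.getvalue()


if __name__ == "__main__":
    # "single": one interactive statement, so the value of 1 + 2 is echoed.
    print(repr(run_and_capture("1 + 2", "single")))
    # "exec": a whole script, so the value is computed but not shown.
    print(repr(run_and_capture("1 + 2", "exec")))
```

This is why a line such as 1 + 2 shows 3 at the >>> prompt but produces no output when it is placed in a .py file.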
Sometimes, you have an existing script, maybe from past work or from others. You want to execute this script first but stay in the Python interpreter afterwards. In this way, the state of the interpreter, e.g. all the variables, is fully preserved for further exploration. You can use the -i option for this. The command-line pattern is as follows:
python -i myscript.py
Jupyter notebook was originally called "IPython notebook" (interactive Python notebook), which is why Jupyter notebook files have the .ipynb suffix/extension.
It provides a web-based interface for you to interactively test and build Python code. It is well suited for a bottom-up approach when building larger projects.
You will hear the term "environment" many times when learning programming. It is a very broad term that refers to the context in which a program is executed. The context can be the time, operating system, current working folder, Python version, versions of dependent modules, the status of the system, the status of dependent components, ...
TIP: Two pieces of code can act differently if their environments are different. When someone else's code works for them but the same code does not work on your side, it is usually a problem of "environment". Troubleshooting depends heavily on experience, and we will see many cases during the semester.
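When comparing two environments, it helps to print a few of these facts on both machines. Here is a minimal sketch using only the standard library; the function name `describe_environment` and the choice of which facts to report are assumptions made for this example.

```python
import os
import platform
import sys


def describe_environment():
    """Collect a few facts about the context this program is running in."""
    return {
        "python_version": platform.python_version(),
        "interpreter_path": sys.executable,
        "operating_system": platform.system(),
        "working_folder": os.getcwd(),
    }


if __name__ == "__main__":
    for name, value in describe_environment().items():
        print(name, "=", value)
```

Running the same script on two computers and comparing the output is often a quick first step when code works on one side but not on the other.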
Python has a concept called virtual environment, "virtualenv" for short. You can use virtualenv to ensure that programs execute in the same environment. One common use case is to run Python 2 and Python 3 programs on the same computer. The system defaults to one of the major versions. However, you can use virtualenv to run some programs in Python 2 and some programs in Python 3. We also use virtualenv to ensure that the dependent Python modules are the same; their versions are usually specified in requirements.txt.
There are two commands to set up a virtualenv:
- `virtualenv` -- the older executable, usually used with Python 2.
- `pyvenv` -- the default way of setting up a virtualenv in Python 3. The tool is shipped with the Python 3 installation. On Python 3.6 and later, `pyvenv` is deprecated and `python3 -m venv` is the recommended equivalent.
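If you are unsure whether a virtual environment is currently active, a script can check for itself. The sketch below relies on the standard attributes `sys.prefix` and `sys.base_prefix`, which differ inside an environment created by Python 3's `venv` module; the function name `in_virtual_environment` is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment.

    Inside an environment created by the `venv` module, sys.prefix points at
    the environment folder while sys.base_prefix points at the original
    installation, so the two differ.
    """
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("packages are installed under:", sys.prefix)
```

Running this before and after activating an environment should show the answer change from False to True.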
If this is the first time you use Jupyter notebook, you need to create a virtual environment first. The following is the usual path to set up the Jupyter environment. For users in the CVA 517 LAB, please see here.
Step 1: Create a virtual environment (on Python 3.6 or later, use python3 -m venv venv instead)
pyvenv venv
Step 2: Enter the virtual environment
source venv/bin/activate
Step 3: Install Jupyter notebook
pip3 install jupyter
Step 4: Enter Jupyter notebook
jupyter notebook
For details, please see our tutorial on how to install and enter Jupyter notebook. The following is what Jupyter notebook looks like.
- Click `new` to create a new Python 3 notebook.
- Write code as you usually do in a text editor, and press `shift+return` to run the code. The results or errors appear under the cell.
- Use `! pip3 install module_name` to install modules from within Jupyter notebook.
- In front of every cell there is an `In [ ]` sign. The number in `[]` shows the order in which the cells were run. If there is a `*` in `[]`, the cell is still running: you can either wait for it to finish or click `stop` under the `kernel` menu to interrupt it; pressing `I` twice will also do the trick.
- cell. `run cell` runs the code step by step; `run all above` runs and checks the previous steps of your code.
- kernel. The `kernel` is the process that runs your code and remembers everything you have done from the beginning. By clicking `restart`, you clear that memory, so you can start afresh and give a variable another value.
- Write a Python script to output "Good evening" in the Terminal.
- Use shell commands to re-organise the notes you take from this course. We understand that you can do this very quickly in GUI (e.g. in Finder of OS X). Please try the command line way and make it part of your daily life.
- Terminal and shell commands (Chinese)
- Appendix A of "Learn Python the hard way" - We suggest that all students study this tutorial on their own. Being comfortable in the shell environment makes one more efficient at programming.
If you have any questions, or need help with troubleshooting, please create an issue here.


