Udacity Data Engineering Nanodegree Capstone

This is the final project for the Udacity Data Engineering Nanodegree.

Repository

Capstone Project.ipynb - Working notebook from which the more concise etl.py was derived. Contains a more thorough write-up of every step.

exploration.ipynb - Initial reads/exploration of the data.

create_tables.py - Drops the database and all of its tables if they already exist, then recreates the database and tables according to sql_queries.py. Runs count-based quality checks.

etl.py - Loads the data files in the directory, processes them, and inserts their values into the database using the insert statements from sql_queries.py.

sql_queries.py - Specifies the creation and insertion commands for each table in the database.
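As a rough illustration of how the query definitions and the loading code can be kept apart, here is a minimal sketch for the countries table. It is not the repository's actual code: the variable names, the `load_countries` helper, the CSV column headers, and the sample rows are all hypothetical, and the standard-library `sqlite3` module stands in for the real database so the snippet runs on its own.

```python
import csv
import io
import sqlite3

# Hypothetical names: the real sql_queries.py may use different identifiers.
countries_table_create = """
CREATE TABLE IF NOT EXISTS countries (
    country_id INTEGER PRIMARY KEY,
    country_name VARCHAR
)
"""
# sqlite3 uses "?" placeholders; other drivers may use a different style.
countries_table_insert = (
    "INSERT INTO countries (country_id, country_name) VALUES (?, ?)"
)


def load_countries(conn, csv_file):
    """Read (country_id, country_name) rows from a CSV file object and insert them."""
    reader = csv.DictReader(csv_file)
    rows = [(int(r["country_id"]), r["country_name"]) for r in reader]
    conn.executemany(countries_table_insert, rows)
    conn.commit()
    return len(rows)


# In-memory database and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(countries_table_create)
sample = io.StringIO("country_id,country_name\n1,Exampleland\n2,Samplestan\n")
inserted = load_countries(conn, sample)
```

Keeping the SQL strings in one module means the table definitions and the insert statements can be changed without touching the file-processing logic.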

How to use

The project is offered in two forms: as a notebook and as Python scripts run from the terminal.

For the notebook, run each cell in Capstone Project.ipynb in order.

From the terminal, run python3 create_tables.py in the repository directory to set up the empty database and its tables. Then run python3 etl.py to insert all data in the directory into the correct tables. At the end of etl.py, an additional quality check compares the expected and observed row counts in the final tables and raises an error if they do not match. Running time is approximately 6 minutes.
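A count-based quality check of the kind described above could look roughly like the following. This is a sketch under assumptions, not the project's implementation: `check_counts` and the expected-count mapping are hypothetical, and an in-memory `sqlite3` database with made-up rows stands in for the real one.

```python
import sqlite3


def check_counts(conn, expected):
    """Compare observed row counts with expected ones; raise on any mismatch.

    `expected` maps table name -> expected row count. Table names are
    interpolated into the SQL, so they must come from trusted code.
    """
    mismatches = {}
    for table, want in expected.items():
        got = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if got != want:
            mismatches[table] = (want, got)
    if mismatches:
        raise ValueError(
            f"Row count check failed (expected, observed): {mismatches}"
        )


# Stand-in database with two made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries (country_id INTEGER PRIMARY KEY, country_name VARCHAR)"
)
conn.executemany("INSERT INTO countries VALUES (?, ?)", [(1, "A"), (2, "B")])

check_counts(conn, {"countries": 2})  # counts match, so nothing is raised
```

Collecting all mismatches before raising reports every failing table in one error rather than stopping at the first.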

Schema

Each table is listed below with its columns and their types. Primary keys are marked PK and foreign keys FK.

Explanations for each variable may be found in data_dictionary.tsv in this directory.

arrivals

arrival_id (serial) PK
country_id (int) FK
visa_type (int)
count (int)
year (int)
month (int)
port (varchar) FK

temp

temp_id (serial) PK
country_id (int) FK
year (int)
month (int)
avg_temp (float)
avg_tempF (float)

airports

port (varchar) PK
municipality (varchar)
country_id (int) FK
region (varchar)

countries

country_id (int) PK
country_name (varchar)
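To show how the keys tie the tables together, the sketch below joins arrivals to countries and temp. Several things here are assumptions rather than facts from the project: that country_id references countries, that arrivals and temp are meant to be matched on country, year, and month, and the query itself. The rows are made up, the airports table and the port foreign key are omitted for brevity, and `sqlite3` stands in for the real database (so serial becomes an integer primary key).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_id INTEGER PRIMARY KEY, country_name VARCHAR);
CREATE TABLE arrivals (
    arrival_id INTEGER PRIMARY KEY,   -- serial in the schema above
    country_id INTEGER REFERENCES countries (country_id),
    visa_type INTEGER,
    count INTEGER,
    year INTEGER,
    month INTEGER,
    port VARCHAR                      -- FK to airports, omitted in this sketch
);
CREATE TABLE "temp" (                 -- quoted because TEMP is an SQL keyword
    temp_id INTEGER PRIMARY KEY,
    country_id INTEGER REFERENCES countries (country_id),
    year INTEGER,
    month INTEGER,
    avg_temp FLOAT,
    "avg_tempF" FLOAT
);
-- Made-up rows, for illustration only.
INSERT INTO countries VALUES (1, 'Exampleland');
INSERT INTO arrivals (country_id, visa_type, count, year, month, port)
    VALUES (1, 2, 10, 2016, 4, 'XXX'), (1, 1, 5, 2016, 4, 'XXX');
INSERT INTO "temp" (country_id, year, month, avg_temp, "avg_tempF")
    VALUES (1, 2016, 4, 20.0, 68.0);
""")

# Total arrivals per country and month, alongside that month's average temperature.
query = """
SELECT c.country_name, a.year, a.month, SUM(a.count) AS total_arrivals, t.avg_temp
FROM arrivals AS a
JOIN countries AS c ON c.country_id = a.country_id
JOIN "temp" AS t
  ON t.country_id = a.country_id AND t.year = a.year AND t.month = a.month
GROUP BY c.country_name, a.year, a.month, t.avg_temp
"""
rows = conn.execute(query).fetchall()
```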
