General

Various tools and programs used for data analysis of power usage statistics. The overall suite of tools is very much underdeveloped at the moment of writing. All tools/programs make use of a common database format and schema.

It should be noted, unless explicitly stated otherwise, all reference to Python are specifically for python3, and SQLite/database to sqlite3.

Database Schema

The database all programs/tools use is assumed to contain the source data in a table named usage. The table stores the usage values using UNIX timestamps as opposed to human-readable timestamps in order to simplify the data processing. The usage table is assumed to be adhear to the following structure:

CREATE TABLE usages (
  time_stamp INTEGER,
  house_id INTEGER,
  usage NUMERIC
);

In order to initialize an empty usage table (or clear an existing one) the init.sql file located in the sql directory can be used. Use of init.sql can be done trivially as:

sudo sqlite3 database.db < init.sql

Manual

Building an SQL File

The .sql file used to populate the original usages database is generated from raw data stored as a .csv. The program reads the data in the CSV, from start until end (or a specified end-date), and creates the needed INSERT statements required for table population. The CSV is assumed to be of format:

Time,Apartment1,Apartment2,Apartment3,Apartment4,Apartment5

The program is called as:

python3 build-sql.py [CSV] [SQL]

In which CSV is the relative path/name of the raw CSV file, and SQL is the relative path/name of the generated SQL file (program output).

CLI Parameters	Default	Description
`[CSV]`	n/a	Relative path/name of the input CSV
`[SQL]`	n/a	Relative path/name of the output SQL file
`-d`	To the end of file	Exclusive end date for the data to be read from the CSV

Linear Regression Model

This program reads a set of usage values and attempts to find the "line of best fit" for it. It outputs a graphical result of the line-of-best-fit superimposed on the dataset as well as the calculated:

Slope
Y-Intercept
Coefficient of correlation (r)
Coefficient of determination (r^2)
Two-tailed p value for null hypothesis m=0
Error estimator
Linear equation y = mx + b

The program is called as:

sudo python3 [S] [E] [DB]

In which [S] is the start date of the data and [E] is the end date of the data. Both dates are required to be formated as YYYY-MM-DD. The start date is inclusive, while the end date is exclusive. For example, a range of 2015-01-01 to 2015-01-04 would read data with timestamps on a date {2015-01-01, 2015-01-02, 2015-01-03}.

The [DB] flag is the relative path to the SQLite3 database, in which the data to be processed is stored.

An example of the program being run over selective data from the first week of February, 2015 is given as follows:

sudo python3 2015-02-01 2015-02-08 dat/sample-data.db

CLI Parameters	Default	Description
`[S]`	n/a	Inclusive start date of the data to be processed
`[E]`	n/a	Exclusive end date of the data to be processed
`[DB]`	n/a	Full relative path/name to the SQLite3 database file containing the data. Also the database where the processed data will be stored
`-v`	Off	Toggles verbose on
`-a`	Off	Enable audible beeping upon program completion
`-f [F]`	Current Directory	Set the path (not name) for figure dump

K-Mean Clusters Pre-Processing

This program reads a set of N values, and outputs a set of K values, in which K < N. The program then graphs and stores the result. This script is computationally expensive, and for most configurations, will take a long duration of time to run. The program is called as:

sudo python3 [S] [E] [DB] [T]

In which [S] is the start date of the data and [E] is the end date of the data. Both dates are required to be formated as YYYY-MM-DD. The start date is inclusive, while the end date is exclusive. For example, a range of 2015-01-01 to 2015-01-04 would read data with timestamps on a date {2015-01-01, 2015-01-02, 2015-01-03}.

The [DB] flag is the relative path to the SQLite3 database, in which the data to be processed is stored.

The [T] flag is the name of the table the resulting data set will stored in, using the aforementioned schema. It should be noted that if a table of name [T] already exists, it will be dropped without warning. [T] also denotes the filename the resulting figure will be saved as.

CLI Parameters	Default	Description
`[S]`	n/a	Inclusive start date of the data to be processed
`[E]`	n/a	Exclusive end date of the data to be processed
`[DB]`	n/a	Full relative path/name to the SQLite3 database file containing the data. Also the database where the processed data will be stored
`[T]`	n/a	Name of the table to store processed data in. Also the filename of the resulting figure
`-v`	Off	Toggles verbose on
`-vv`	Off	Toggles maximum verbosity
`-p`	On	Disable using multiple cores for computation
`-a`	Off	Enable audible beeping upon program completion
`-k [K]`	10	Set k value
`-l [L]`	1	Number times the algorithm is fully run. Higher values tend to lead to better results
`-i [I]`	500	Max number of iterations per algorithm executions. Higher values tend to lead to better results
`-f [F]`	Current Directory	Set the path (not name) for figure dump

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
dat		dat
modeling		modeling
pre		pre
sql		sql
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

General

Database Schema

Manual

Building an SQL File

Linear Regression Model

K-Mean Clusters Pre-Processing

About

Uh oh!

Releases

Packages

Languages

carleton-smart-grid/analytics

Folders and files

Latest commit

History

Repository files navigation

General

Database Schema

Manual

Building an SQL File

Linear Regression Model

K-Mean Clusters Pre-Processing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages