# MSSQL to AWS Redshift data transfer tool written in Python 3.7
- Introduction
- How this tool works
- How to install and run the tool
- How to set up a new MSSQL to Redshift data migration project
This tool migrates data from your MSSQL database to AWS Redshift. It consumes arguments defining:

- `--databasename` or `-dn` (the name of the MSSQL database containing the tables you wish to migrate; this argument needs to match the values in the `DatabaseName` column of the configuration table `MSSQL_to_Redshift.mngmt.ControlTable`)
- `--schemaname` or `-sn` (the name of the MSSQL database schema containing the tables you wish to migrate; this argument needs to match the values in the `SchemaName` column of the configuration table `MSSQL_to_Redshift.mngmt.ControlTable`)
- `--generated_csv_files_target_directory` or `-td` (the local folder where you wish to store the generated .csv files; if the folder does not exist, it will be created at runtime)
- `--dryrun` or `-dr` (`True` or `False`; lets you run a dry run for testing purposes, filtering the SQL Server data down to 0 rows)
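The argument handling can be sketched with `argparse`. This is a hypothetical reconstruction based only on the argument list above, not the tool's actual parser:

```python
import argparse

def build_parser():
    # Hypothetical sketch of the CLI described above; the real parser in
    # mssql_to_redshift_data_transfer_tool.py may differ in details.
    parser = argparse.ArgumentParser(
        description="MSSQL to AWS Redshift data transfer")
    parser.add_argument("--databasename", "-dn", required=True,
                        help="source MSSQL database (must match mngmt.ControlTable)")
    parser.add_argument("--schemaname", "-sn", required=True,
                        help="source MSSQL schema (must match mngmt.ControlTable)")
    parser.add_argument("--generated_csv_files_target_directory", "-td",
                        required=True,
                        help="local folder for the generated .csv files")
    parser.add_argument("--dryrun", "-dr", choices=["True", "False"],
                        default="False",
                        help="dry run: filters SQL Server data to 0 rows")
    return parser

# Example invocation with the short option forms:
args = build_parser().parse_args(
    ["-dn", "AdventureWorksDW2016", "-sn", "dbo",
     "-td", r"C:\files", "-dr", "True"])
```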
To install:

- clone the repository: `git clone https://www.github.com/datahappy1/mssql_to_redshift_data_transfer_tool mssql_to_redshift_data_transfer_tool`
- `cd c:\mssql_to_redshift_data_transfer_tool`
- create and activate a virtual environment
- download the Windows Postgres driver from https://www.enterprisedb.com/downloads/postgres-postgresql-downloads
- run `pip3 install -r requirements.txt`
- run the database build script `mssql_database_buildscript.sql` located here
Set these environment variables:

- `odbc_mssql_dsn_name` - mandatory MSSQL DSN name for PyODBC
- `odbc_mssql_uid` - optional MSSQL DSN user name, not needed if you set the user in the Windows ODBC connection
- `odbc_mssql_pwd` - optional MSSQL DSN password, not needed if you set the password in the Windows ODBC connection
- `aws_access_key_id` - mandatory AWS Access key ID
- `aws_secret_access_key` - mandatory AWS Secret access key
- `redshift_host` - mandatory AWS Redshift host name (for example `redshift-cluster-1.xxx.us-east-1.redshift.amazonaws.com`)
- `redshift_port` - mandatory AWS Redshift port (integer)
- `redshift_user` - mandatory AWS Redshift user name
- `redshift_pass` - mandatory AWS Redshift password
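Before the first run it can be handy to check that all mandatory variables are set. A minimal helper sketch (not part of the tool) for that check:

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = [
    "odbc_mssql_dsn_name", "aws_access_key_id", "aws_secret_access_key",
    "redshift_host", "redshift_port", "redshift_user", "redshift_pass",
]
OPTIONAL_VARS = ["odbc_mssql_uid", "odbc_mssql_pwd"]

def missing_required_vars(env=None):
    # Return the names of mandatory variables that are unset or empty.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_required_vars()` with no argument inspects the real environment; pass a dict to test the logic in isolation.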
On Windows, you need to set the `PYTHONPATH` like this:

`set PYTHONPATH=%PYTHONPATH%;C:\mssql_to_redshift_data_transfer_tool\`
A permanent solution is to:
- Go to the Windows menu, right-click on “Computer” and select “Properties”
- From the computer properties dialog, select “Advanced system settings” on the left
- From the advanced system settings dialog, choose the “Environment variables” button
- In the Environment variables dialog, click the “New” button in the top half of the dialog, to make a new user variable
- Give the variable name as PYTHONPATH and the value is the path to the code directory. Choose OK and OK again to save this variable
- Now open a cmd Window (Windows key, then type cmd and press Return). Type:
`echo %PYTHONPATH%` to confirm the environment variable is correctly set
https://bic-berkeley.github.io/psych-214-fall-2016/using_pythonpath.html
On Linux, you need to set the `PYTHONPATH` like this:

`export PYTHONPATH=$PYTHONPATH:/home/your_user/your_git_projects/mssql_to_redshift_data_transfer_tool/`
A permanent solution is to:
- Open your favorite terminal program
- Open the file ~/.bashrc in your text editor – e.g. atom ~/.bashrc
- Add the following line to the end: `export PYTHONPATH=/home/my_user/code`
- Save the file
- Close your terminal application
- Start your terminal application again, to read in the new settings, and type this:
`echo $PYTHONPATH`; you should see something like `/home/my_user/code`
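You can also confirm from inside Python that the project directory ended up on the interpreter's module search path, since `PYTHONPATH` entries are added to `sys.path` at startup. A small sketch (the paths below are examples, not requirements):

```python
import sys

def is_on_python_path(project_dir, path=None):
    # True if project_dir is among the directories Python searches for
    # imports; trailing slashes are ignored for the comparison.
    entries = sys.path if path is None else path
    target = project_dir.rstrip("/\\")
    return target in (p.rstrip("/\\") for p in entries)
```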
You need to set the required arguments:

- `--databasename` or `-dn`
- `--schemaname` or `-sn`
- `--generated_csv_files_target_directory` or `-td`
- `--dryrun` or `-dr`
Run these commands to execute:

- `cd mssql_to_redshift_data_transfer_tool`
- `python mssql_to_redshift_data_transfer_tool.py -dn AdventureWorksDW2016 -sn dbo -td C:\mssql_to_redshift_data_transfer_tool\files -dr False`
- Set up the database tables and their columns that you need to transfer over to AWS Redshift in the MSSQL configuration table `MSSQL_to_Redshift.mngmt.ControlTable`

Note that this tool's internal database `MSSQL_to_Redshift` has to be installed on the same host as your source MSSQL databases. Another option is to use Linked Servers.
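Configuring a table for transfer means adding a row to the control table. A hedged sketch of building such an insert from Python: the `DatabaseName` and `SchemaName` columns are documented above, but the `TableName` column and the full column list are assumptions for illustration only.

```python
def control_table_insert(database_name, schema_name, table_name):
    # Build a parameterized INSERT for the configuration table.
    # DatabaseName and SchemaName are documented columns; TableName is
    # an assumed column name - check mssql_database_buildscript.sql
    # for the real definition.
    sql = ("INSERT INTO MSSQL_to_Redshift.mngmt.ControlTable "
           "(DatabaseName, SchemaName, TableName) VALUES (?, ?, ?)")
    return sql, (database_name, schema_name, table_name)
```

The returned pair can be passed to a PyODBC cursor as `cursor.execute(sql, params)`, keeping the values parameterized rather than interpolated into the SQL string.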
- Don't forget to set up the project-scoped settings such as the AWS S3 bucket name, the maximum .csv file size in MB, database names, logging level and others in `settings.py`
Allowed logging level values in `settings.py` are `'DEBUG'`, `'INFO'`, `'WARNING'`, `'ERROR'` and `'CRITICAL'`.
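These values map directly onto the levels of Python's standard `logging` module. A sketch of how such a settings string might be validated and applied (the function name is hypothetical, not the tool's API):

```python
import logging

ALLOWED_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}

def configure_logging(level_name):
    # Validate a settings.py-style level string and configure the root
    # logger; returns the numeric level for convenience.
    if level_name not in ALLOWED_LEVELS:
        raise ValueError(
            f"logging level must be one of {sorted(ALLOWED_LEVELS)}")
    level = getattr(logging, level_name)
    logging.basicConfig(level=level)
    return level
```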
- Make sure you've got your AWS Redshift tables ready

The Redshift tables need to be named exactly like the tables configured in the `MSSQL_to_Redshift.mngmt.ControlTable` MSSQL table.
- Set the `PYTHONPATH` environment variable
- Try running this tool with the `--dryrun` argument set to `True` first
- Now you can go and configure the databases, schemas and table names that will be transferred over to AWS Redshift in the `MSSQL_to_Redshift.mngmt.ControlTable` MSSQL table
