Welcome to the Kalpana Accelerator Data Pipeline repository! This project cleans and transforms data from various sources, such as Graphy and forms. The processed data is then stored in a MySQL database, and Excel files are generated for further analysis and reporting.
- scripts/: Includes the Python script for processing and cleaning data.
- output/: Contains the output Excel files generated by the tool.
- data_files/: Holds the raw data from various sources, serving as input for the data cleaning and remapping process.
The Kalpana Accelerator Data Pipeline follows these key steps:
- Data Ingestion: Raw data from various sources is placed in the data_files/ folder.
- Data Cleaning: The code in the scripts/ folder processes and cleans the raw data so that it is ready for analysis and storage.
- Data Storage: The cleaned data is stored in a MySQL database for further use.
- Output Generation: The tool generates Excel files containing the cleaned data and places them in the output/ folder.
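The cleaning and storage steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names `clean_rows` and `store_rows`, the sample columns, and the table name `cleaned_data` are all hypothetical, and the standard-library `sqlite3` module stands in for MySQL so the sketch runs without a database server.

```python
import csv
import io
import sqlite3


def clean_rows(raw_csv_text):
    """Normalize headers, trim whitespace, and drop empty or duplicate rows."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    seen = set()
    cleaned = []
    for row in reader:
        # Normalize header names, e.g. "Full Name" becomes "full_name".
        record = {
            key.strip().lower().replace(" ", "_"): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore cells beyond the header width
        }
        if not any(record.values()):
            continue  # skip rows that contain no data
        fingerprint = tuple(record.items())
        if fingerprint in seen:
            continue  # skip exact duplicates
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


def store_rows(connection, rows, table="cleaned_data"):
    """Create the table if needed and insert the cleaned rows."""
    if not rows:
        return 0
    columns = list(rows[0].keys())
    # Identifiers are interpolated here, so they must come from trusted code,
    # never from user input. Values always go through placeholders.
    column_sql = ", ".join(f'"{name}" TEXT' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[name] for name in columns) for row in rows],
    )
    connection.commit()
    return len(rows)


# Hypothetical sample input: untidy headers, a blank row, and a duplicate.
raw = (
    "Full Name, Email \n"
    " Asha , asha@example.com\n"
    ",\n"
    " Asha , asha@example.com\n"
    "Ravi,ravi@example.com\n"
)

connection = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
rows = clean_rows(raw)
inserted = store_rows(connection, rows)
print(f"Stored {inserted} cleaned rows")
```

With a real MySQL driver the overall shape would stay similar, but details differ: for example, MySQL Connector/Python uses `%s` placeholders instead of `?` and quotes identifiers with backticks. For the Excel output step, one common option is pandas' `DataFrame.to_excel`, which needs an Excel engine such as openpyxl installed; whether this project uses it is an assumption.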
To start using the Data Pipeline, follow these steps:
- Place your raw data files in the data_files/ folder.
- Execute the code in the scripts/ folder to process and clean the data.
- Retrieve the cleaned data from the MySQL database for analysis.
- Find the generated Excel files in the output/ folder for reporting and sharing.
Make sure you have the following prerequisites installed:
- Python (version 3.6 or higher)
- Jupyter Notebook
- Dependencies: Install the Python packages the code requires, and configure the database connection information in the code before running it.
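One way to supply the database connection information without hard-coding credentials is to read it from environment variables. The sketch below is only an assumption about how this could be done, not the project's actual configuration: the `load_db_config` helper and the `KALPANA_DB_*` variable names are hypothetical.

```python
import os


def load_db_config(env=None):
    """Collect MySQL connection settings from environment variables."""
    env = os.environ if env is None else env
    required = ("KALPANA_DB_USER", "KALPANA_DB_PASSWORD", "KALPANA_DB_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a driver error later.
        raise RuntimeError("Missing database settings: " + ", ".join(missing))
    return {
        "host": env.get("KALPANA_DB_HOST", "localhost"),
        "port": int(env.get("KALPANA_DB_PORT", "3306")),  # MySQL default port
        "user": env["KALPANA_DB_USER"],
        "password": env["KALPANA_DB_PASSWORD"],
        "database": env["KALPANA_DB_NAME"],
    }


# Hypothetical values, passed as a dict so the example runs anywhere.
example_env = {
    "KALPANA_DB_USER": "pipeline",
    "KALPANA_DB_PASSWORD": "change-me",
    "KALPANA_DB_NAME": "kalpana",
}
config = load_db_config(example_env)
print(config["host"], config["port"], config["database"])
```

The dictionary keys match keyword arguments accepted by `mysql.connector.connect`, so if the project uses MySQL Connector/Python (an assumption), the result could be passed as `mysql.connector.connect(**config)`.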