Small Python scripts to generate sample pipeline output data and validate daily load quality checks.
- csv_data.py: Creates a sample pipeline_output.csv file for testing.
- pipeline_validator.py: Validates today's records for duplicates, nulls, and status counts, then writes a report.
- Python 3.8+
- pandas
- Install dependency:
    pip install pandas
- Generate sample CSV:
    python csv_data.py
- Run validator:
    python pipeline_validator.py
- Check generated report file:
    pipeline_report_<YYYY-MM-DD>.txt
- Records loaded for today
- Duplicate customer_id values
- Null values per column
- Status distribution (for example: SUCCESS, FAILED)
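
The four report items above can be computed with a few pandas calls. The following is a minimal sketch, not the actual contents of pipeline_validator.py: the function name `summarize_todays_load`, the `load_date` and `status` column names, and the sample rows are assumptions made for illustration. Only `customer_id` is taken from this README.

```python
import pandas as pd


def summarize_todays_load(df: pd.DataFrame, today: str) -> dict:
    """Compute the four report items for rows loaded on `today`.

    Assumes hypothetical columns `load_date` (ISO date string) and `status`,
    plus the `customer_id` column named in this README.
    """
    # Keep only the rows whose load date matches the requested day.
    todays = df[df["load_date"] == today]

    # Flag every row whose customer_id appears more than once (ignoring nulls).
    ids = todays["customer_id"]
    dup_mask = ids.duplicated(keep=False) & ids.notna()

    return {
        "records_loaded": len(todays),
        "duplicate_customer_ids": sorted(ids[dup_mask].unique().tolist()),
        "nulls_per_column": todays.isnull().sum().to_dict(),
        "status_counts": todays["status"].value_counts().to_dict(),
    }


# Illustrative data only; the real pipeline_output.csv may look different.
sample = pd.DataFrame(
    {
        "customer_id": [101, 102, 102, 103, 104],
        "status": ["SUCCESS", "FAILED", "SUCCESS", None, "SUCCESS"],
        "load_date": [
            "2024-01-02",
            "2024-01-02",
            "2024-01-02",
            "2024-01-02",
            "2024-01-01",
        ],
    }
)

summary = summarize_todays_load(sample, today="2024-01-02")
for name, value in summary.items():
    print(f"{name}: {value}")
```

With this sample data, four rows fall on the chosen day, customer_id 102 is reported as a duplicate, and the status column has one null. `value_counts()` drops nulls by default, so the missing status shows up only in the null count and not in the status distribution.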
Based on production operations experience supporting AWS and GCP data pipelines at Accenture for global enterprise clients. The validation logic mirrors the day-to-day data quality checks performed during live pipeline monitoring for Ingredion (GCP/Airflow) and Essilor (AWS batch pipeline).