An intelligent, automated Extract, Transform, Load (ETL) pipeline that uses a Large Language Model (Google Gemini) to dynamically sanitize, standardize, and format messy CSV data into pure, algorithm-ready JSON.
This project utilizes n8n as the orchestration engine to handle batch processing and API routing, combined with a custom vanilla JavaScript frontend.
- Frontend: A lightweight HTML/Tailwind CSS interface that sends raw CSV files via a `FormData` POST request.
- Backend Pipeline: n8n webhook listener that processes incoming files.
- Transformation Engine: A `Loop` architecture that feeds data row-by-row into the Google Gemini LLM.
- Strict Output Formatting: Uses dynamic JSON Schemas to force the AI to return strict data types (e.g., converting mixed date strings to ISO 8601, inferring booleans, and handling null values).
- Orchestration: n8n
- AI/NLP: Google Gemini 2.5 Flash
- Frontend: HTML5, Vanilla JavaScript, Tailwind CSS
- Data Formats: CSV (Input), JSON (Output)
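
To make the strict output formatting idea concrete, here is a minimal sketch of what a schema for one cleaned row could look like, with a small hand-rolled check for it. The field names (`student_name`, `exam_date`, `score`, `passed`) and the `validateRow` helper are hypothetical and are not taken from the actual workflow. The real schema is generated dynamically inside n8n.

```javascript
// Hypothetical schema for one cleaned row. Field names are assumptions,
// not the ones used by the real workflow.
const rowSchema = {
  type: "object",
  properties: {
    student_name: { type: "string" },
    exam_date: { type: ["string", "null"], format: "date" }, // ISO 8601 date
    score: { type: ["number", "null"] },
    passed: { type: ["boolean", "null"] },
  },
  required: ["student_name", "exam_date", "score", "passed"],
};

// Minimal checker for the small subset of JSON Schema used above
// (type, required, and a date format). Returns a list of error strings.
function validateRow(row, schema) {
  const errors = [];
  for (const key of schema.required) {
    if (!(key in row)) errors.push(`missing field: ${key}`);
  }
  for (const [key, rule] of Object.entries(schema.properties)) {
    if (!(key in row)) continue;
    const value = row[key];
    const allowed = [].concat(rule.type); // accept a single type or a list
    const actual = value === null ? "null" : typeof value;
    if (!allowed.includes(actual)) {
      errors.push(`${key}: expected ${allowed.join(" or ")}, got ${actual}`);
      continue;
    }
    const isIsoDate = /^\d{4}-\d{2}-\d{2}$/.test(value);
    if (rule.format === "date" && typeof value === "string" && !isIsoDate) {
      errors.push(`${key}: not an ISO 8601 calendar date`);
    }
  }
  return errors;
}
```

A row such as `{ student_name: "Ada", exam_date: "2024-03-05", score: 91.5, passed: true }` passes this check, while a row that still carries `"03/05/2024"` or a score stored as a string is reported. A check like this is only a safety net on the client side. The schema passed to the model remains the main constraint.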
- Import the `etl_pipeline_workflow.json` file into your local n8n instance.
- Add your Google Gemini API credentials to the AI node.
- Activate the workflow.
- Open `index.html` in any browser, upload `grades_test.csv`, and watch the console output the sanitized JSON array.
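
For orientation, the upload step in the frontend might look roughly like the sketch below. The webhook URL, the form field name `file`, and the helper names `buildUploadForm` and `uploadCsv` are assumptions for illustration. Check the Webhook node in your imported workflow for the real path and field name.

```javascript
// Assumed values: adjust to match the Webhook node in your n8n workflow.
const WEBHOOK_URL = "http://localhost:5678/webhook/etl-upload"; // hypothetical path
const FILE_FIELD = "file"; // hypothetical form field name

// Wrap the CSV in multipart/form-data. No Content-Type header is set by hand:
// the runtime adds it, including the multipart boundary.
function buildUploadForm(file, filename = "upload.csv") {
  const form = new FormData();
  form.append(FILE_FIELD, file, file.name || filename);
  return form;
}

// POST the file to the webhook and return the parsed response.
// fetchImpl is injectable so the function can be exercised without a server.
async function uploadCsv(file, url = WEBHOOK_URL, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    method: "POST",
    body: buildUploadForm(file),
  });
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  return response.json(); // assumes the workflow responds with a JSON array
}
```

In the page this could be wired to a file input with something like `uploadCsv(input.files[0]).then(console.log)`, which prints the sanitized array to the console once the workflow responds.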