-
Update requirements.txt: Add new dependencies for the advanced features (plotly for interactive charts, openpyxl for reading and writing .xlsx files, scikit-learn for preprocessing such as outlier removal and scaling).
-
Enhance app.py:
- Add support for multiple file formats (CSV and Excel).
- Implement advanced preprocessing options: outlier removal (IQR), normalization (MinMaxScaler), categorical encoding (one-hot), duplicate removal.
- Generate summary statistics tables for original and cleaned data.
- Switch chart generation to Plotly for interactive visualizations, using subplots to compare the original and cleaned data side by side.
- Add file validation: enforce a 10 MB size limit and accept only CSV or Excel (.xlsx) files.
- Save both original and cleaned datasets for download.
- Improve error handling so that invalid or unreadable uploads are reported to the user instead of failing the request.
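The file validation, multi-format loading, and error reporting above could be combined into one loader. This is a minimal sketch, assuming pandas is already used by app.py and that the upload is read into memory; `load_dataset`, `UploadError`, `MAX_UPLOAD_BYTES`, and `ALLOWED_EXTENSIONS` are hypothetical names. Setting Flask's `app.config["MAX_CONTENT_LENGTH"]` to the same limit also makes Flask reject oversized requests with a 413 error before they reach the view; the explicit check here still covers callers outside a request.

```python
import io
from pathlib import Path

import pandas as pd

# Hypothetical names; the 10 MB limit and the CSV/Excel formats come from the plan.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
ALLOWED_EXTENSIONS = {".csv", ".xlsx"}


class UploadError(ValueError):
    """Raised when an upload fails validation or cannot be parsed."""


def load_dataset(filename: str, data: bytes) -> pd.DataFrame:
    """Validate an uploaded file's name and size, then parse it into a DataFrame."""
    ext = Path(filename or "").suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise UploadError(f"Unsupported file type {ext or '(none)'}; upload a .csv or .xlsx file.")
    if not data:
        raise UploadError("The uploaded file is empty.")
    if len(data) > MAX_UPLOAD_BYTES:
        raise UploadError("The file exceeds the 10 MB size limit.")
    try:
        if ext == ".csv":
            return pd.read_csv(io.BytesIO(data))
        # pandas uses openpyxl to read .xlsx workbooks
        return pd.read_excel(io.BytesIO(data), engine="openpyxl")
    except Exception as exc:  # any parse failure (or a missing openpyxl) becomes a user-facing error
        raise UploadError(f"Could not read {filename}: {exc}") from exc
```

In a view, this might be called as `load_dataset(file.filename, file.read())` with `file = request.files["file"]`, catching `UploadError` to re-render the form with the message.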
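The four preprocessing options could be chained roughly as follows. This is a sketch assuming pandas plus scikit-learn's `MinMaxScaler`; `clean_dataframe`, `remove_outliers_iqr`, and `_is_categorical` are hypothetical helpers, and the 1.5×IQR fence (a row is dropped if any numeric value lies outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]) is the conventional choice rather than something the plan specifies.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def remove_outliers_iqr(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop rows with a numeric value outside [Q1 - k*IQR, Q3 + k*IQR] in any column."""
    numeric = df.select_dtypes(include="number")
    if numeric.empty:
        return df.copy()
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons with NaN are False, so missing values are treated as in range.
    in_range = ((numeric >= lower) & (numeric <= upper)) | numeric.isna()
    return df[in_range.all(axis=1)].copy()


def _is_categorical(col: pd.Series) -> bool:
    """Treat object, string, and category columns as categorical (hypothetical helper)."""
    return (
        pd.api.types.is_object_dtype(col)
        or pd.api.types.is_string_dtype(col)
        or isinstance(col.dtype, pd.CategoricalDtype)
    )


def clean_dataframe(df: pd.DataFrame, *, outliers: bool = False, normalize: bool = False,
                    encode: bool = False, duplicates: bool = False) -> pd.DataFrame:
    """Apply the selected preprocessing steps (hypothetical helper); the input is not modified."""
    cleaned = df.copy()
    if duplicates:
        cleaned = cleaned.drop_duplicates()
    if outliers:
        cleaned = remove_outliers_iqr(cleaned)
    if normalize:
        num_cols = cleaned.select_dtypes(include="number").columns.tolist()
        if num_cols and len(cleaned):
            scaled = MinMaxScaler().fit_transform(cleaned[num_cols])
            cleaned = cleaned.copy()  # own the data before assigning columns
            cleaned[num_cols] = pd.DataFrame(scaled, columns=num_cols, index=cleaned.index)
    if encode:
        # Encoding runs after scaling so the 0/1 indicator columns are left as-is.
        cat_cols = [c for c in cleaned.columns if _is_categorical(cleaned[c])]
        if cat_cols:
            cleaned = pd.get_dummies(cleaned, columns=cat_cols, dtype=int)
    return cleaned.reset_index(drop=True)
```

The ordering is a design choice: if one-hot encoding ran first, `MinMaxScaler` would also rescale the indicator columns, and a constant indicator column (all 1s) would collapse to 0.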
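For the summary statistics tables, pandas can render `describe()` output directly as HTML. A sketch, where `summary_table_html` and the `summary-table` CSS class are hypothetical names that the styles.css rules could target:

```python
import pandas as pd


def summary_table_html(df: pd.DataFrame, table_class: str = "summary-table") -> str:
    """Render per-column descriptive statistics as an HTML table (hypothetical helper)."""
    if len(df.columns) == 0:
        return "<p>No columns to summarize.</p>"
    # include="all" adds count/unique/top/freq for non-numeric columns;
    # transposing gives one row per column, which reads better for wide data.
    stats = df.describe(include="all").transpose()
    return stats.to_html(classes=table_class, na_rep="")
```

It would be called once for the original and once for the cleaned data, with the resulting strings rendered in result.html through Jinja's `|safe` filter.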
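Saving both datasets for download could be as simple as writing two CSV files under a random token, so concurrent users do not overwrite each other's files. A sketch; `save_for_download`, `DOWNLOAD_DIR`, the CSV output format, and the file naming are all assumptions, and nothing here cleans up old files.

```python
import tempfile
import uuid
from pathlib import Path

import pandas as pd

# Hypothetical location for generated downloads; old files are never deleted here.
DOWNLOAD_DIR = Path(tempfile.gettempdir()) / "data_cleaner_downloads"


def save_for_download(original: pd.DataFrame, cleaned: pd.DataFrame) -> dict[str, str]:
    """Write both datasets as CSV under a random token and return their file names."""
    DOWNLOAD_DIR.mkdir(parents=True, exist_ok=True)
    token = uuid.uuid4().hex  # unique per request
    names = {
        "original": f"{token}_original.csv",
        "cleaned": f"{token}_cleaned.csv",
    }
    original.to_csv(DOWNLOAD_DIR / names["original"], index=False)
    cleaned.to_csv(DOWNLOAD_DIR / names["cleaned"], index=False)
    return names
```

A download route could then return `send_from_directory(DOWNLOAD_DIR, name, as_attachment=True)`; Flask's `send_from_directory` refuses paths that would escape the given directory.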
-
Update index.html: Add form elements for the advanced preprocessing options (checkboxes for outliers, normalize, encode, duplicates). Update the file input to accept .csv and .xlsx.
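On the server side, the checkboxes map naturally to booleans, because a browser submits a checkbox's name only when it is checked. A sketch, assuming the checkbox `name` attributes are `outliers`, `normalize`, `encode`, and `duplicates` as listed above; `parse_options` and `PREPROCESSING_OPTIONS` are hypothetical, and the function accepts Flask's `request.form` or any other mapping. The file input's `accept=".csv,.xlsx"` attribute is only a hint to the browser's file picker, so the server still has to validate the extension.

```python
from collections.abc import Mapping

# Assumed checkbox names, taken from the option list for index.html.
PREPROCESSING_OPTIONS = ("outliers", "normalize", "encode", "duplicates")


def parse_options(form: Mapping[str, str]) -> dict[str, bool]:
    """Map submitted form fields to on/off flags (hypothetical helper)."""
    # Unchecked checkboxes are not submitted at all, so presence means "checked".
    return {name: name in form for name in PREPROCESSING_OPTIONS}
```

The resulting dictionary can be passed as keyword arguments to whichever function applies the preprocessing steps.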
-
Update result.html: Add sections for the summary statistics (HTML tables) and download links for the original and cleaned files, and embed the interactive Plotly charts (loading plotly.js from its CDN).
-
Update styles.css: Add styles for new UI elements (preprocessing options, summary tables, interactive chart containers).
-
Update README.md: Expand it with setup instructions, a list of the new features, and a usage guide.
-
Install dependencies: Run pip install -r requirements.txt.
-
Test the application:
- Run the Flask app.
- Upload sample CSV/Excel files.
- Select preprocessing options and charts.
- Verify outputs: cleaned data, summaries, interactive charts, downloads.
- Use a browser to interact with the charts and confirm the results.
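For the upload step, a small generated dataset makes the expected cleaning results easy to predict. A sketch with hypothetical helper names `make_sample_data` and `write_samples`; the data (also invented for illustration) contains one exact duplicate row, one extreme `age` value, and a categorical `city` column. Writing the .xlsx copy requires openpyxl.

```python
from pathlib import Path

import pandas as pd


def make_sample_data() -> pd.DataFrame:
    """Seven rows: one exact duplicate (rows 1 and 3) and one extreme age (400)."""
    return pd.DataFrame({
        "age": [25, 32, 47, 32, 51, 29, 400],
        "income": [40000, 52000, 61000, 52000, 66000, 48000, 58000],
        "city": ["Paris", "Lyon", "Paris", "Lyon", "Nice", "Lyon", "Nice"],
    })


def write_samples(out_dir: str = ".", excel: bool = True) -> list[Path]:
    """Write sample.csv (and optionally sample.xlsx) to out_dir; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    df = make_sample_data()
    paths = [out / "sample.csv"]
    df.to_csv(paths[0], index=False)
    if excel:
        xlsx = out / "sample.xlsx"
        df.to_excel(xlsx, index=False)  # needs openpyxl installed
        paths.append(xlsx)
    return paths
```

With duplicate removal and conventional 1.5×IQR outlier removal both enabled, the cleaned data should keep 5 of the 7 rows (the repeated row and the `age` = 400 row are dropped), whichever of the two steps runs first.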
-
Handle any issues: Debug errors found during testing and update files as needed.