Intern: Nathaniel
Role: Data Analysis Intern
Duration: 11/03/2026 – 11/04/2026
Mode: Remote
Organization: Codveda Technologies

    Codveda-Data-Analytics-Internship/
    │
    ├── Level1_EDA/              → Exploratory Data Analysis (Iris Dataset)
    ├── Level1_DataCleaning/     → Data Cleaning & Preprocessing (Housing Dataset)
    ├── Level2_TimeSeries/       → Time Series Analysis (S&P 500 Stock Prices)
    ├── Level2_KMeans/           → K-Means Clustering (Iris Dataset)
    ├── Level3_Classification/   → Predictive Classification (Telecom Churn)
    └── Level3_NLP_Sentiment/    → NLP Sentiment Analysis (Social Media Posts)

| Level | Task | Dataset | Status |
|---|---|---|---|
| Level 1 | Exploratory Data Analysis (EDA) | Iris Flower Dataset | ✅ Complete |
| Level 1 | Data Cleaning & Preprocessing | Boston Housing Dataset | ✅ Complete |
| Level 2 | Time Series Analysis | S&P 500 Stock Prices (2014–2017) | ✅ Complete |
| Level 2 | K-Means Clustering | Iris Flower Dataset | ✅ Complete |
| Level 3 | Predictive Classification | Telecom Customer Churn | ✅ Complete |
| Level 3 | NLP Sentiment Analysis | Social Media Posts | ✅ Complete |
| Tool | Purpose |
|---|---|
| Python 3.x | Core programming language |
| pandas | Data loading, cleaning, manipulation |
| NumPy | Numerical computations |
| matplotlib | Data visualization |
| seaborn | Statistical visualizations |
| scikit-learn | Machine learning models, preprocessing |
| re / collections | NLP text processing |
- Clone the repository:

      git clone https://github.com/YOUR_USERNAME/Codveda-Data-Analytics-Internship.git
      cd Codveda-Data-Analytics-Internship

- Install dependencies:

      pip install pandas numpy matplotlib seaborn scikit-learn

- Navigate to any task folder and run the script:

      cd Level1_EDA
      python level1_eda_iris.py

⚠️ Make sure the relevant dataset CSV file is in the same folder as the script before running.
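
The warning above can be turned into an explicit check at the top of a script. The following is a minimal sketch, not code taken from the repository: the helper name `find_dataset` and the file name `iris.csv` are assumptions made for illustration.

```python
from pathlib import Path


def find_dataset(script_path, csv_name):
    """Return the path to csv_name in the script's folder, or raise a clear error.

    Hypothetical helper: the task scripts may locate their data differently.
    """
    csv_path = Path(script_path).resolve().parent / csv_name
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {csv_path}; copy the CSV into this folder first."
        )
    return csv_path


# Possible usage inside a task script (the file name is an assumption):
#   data_path = find_dataset(__file__, "iris.csv")
#   df = pandas.read_csv(data_path)
```

Resolving the path relative to the script rather than the current working directory means the check behaves the same whether the script is launched from its own folder or from the repository root.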
- 6 complete data analytics tasks across 3 difficulty levels
- 37 publication-quality plots generated
- End-to-end ML pipeline: preprocessing → modelling → evaluation → tuning
- NLP pipeline built from scratch: tokenisation, stopword removal, stemming, TF-IDF
- Random Forest classifier achieved 95% accuracy on churn prediction
- K-Means recovered 3 clusters matching the 3 Iris species, with a silhouette score of 0.46
- NFLX delivered the highest return (+270%) among tracked stocks (2014–2017)
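
To illustrate the from-scratch NLP steps named in the highlights (tokenisation, stopword removal, stemming, TF-IDF), here is a rough sketch using only `re`, `collections`, and `math`. It is not the project's actual code: the stopword list, the suffix-stripping stemmer, the TF-IDF weighting variant, and the sample posts are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "is", "it", "the", "this", "to", "was"}


def tokenise(text):
    """Lowercase the text and extract runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Crude suffix stripping; a stand-in for a proper stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenise, drop stopwords, then stem what remains."""
    return [stem(t) for t in tokenise(text) if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length,
    idf = log(number of documents / document frequency).
    """
    tokenised = [preprocess(d) for d in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample posts, not rows from the project's dataset.
posts = [
    "Loving the new update, it is amazing",
    "The update was disappointing",
    "Amazing support and amazing features",
]
scores = tf_idf(posts)
```

With this weighting, a term that appears in only one post (such as the stem of "disappointing") scores higher than one shared across posts (such as "update"), which is the property a sentiment model would typically rely on.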
- 🌐 Codveda Technologies
- 📧 support@codveda.com
- 💼 LinkedIn: @codveda