📊 Codveda Technologies — Data Analytics Internship

Intern: Nathaniel
Role: Data Analysis Intern
Duration: 11/03/2026 – 11/04/2026
Mode: Remote
Organization: Codveda Technologies


🗂️ Repository Structure

Codveda-Data-Analytics-Internship/
│
├── Level1_EDA/                  → Exploratory Data Analysis (Iris Dataset)
├── Level1_DataCleaning/         → Data Cleaning & Preprocessing (Housing Dataset)
├── Level2_TimeSeries/           → Time Series Analysis (S&P 500 Stock Prices)
├── Level2_KMeans/               → K-Means Clustering (Iris Dataset)
├── Level3_Classification/       → Predictive Classification (Telecom Churn)
└── Level3_NLP_Sentiment/        → NLP Sentiment Analysis (Social Media Posts)

✅ Tasks Completed

| Level   | Task                            | Dataset                            | Status      |
|---------|---------------------------------|------------------------------------|-------------|
| Level 1 | Exploratory Data Analysis (EDA) | Iris Flower Dataset                | ✅ Complete |
| Level 1 | Data Cleaning & Preprocessing   | Boston Housing Dataset             | ✅ Complete |
| Level 2 | Time Series Analysis            | S&P 500 Stock Prices (2014–2017)   | ✅ Complete |
| Level 2 | K-Means Clustering              | Iris Flower Dataset                | ✅ Complete |
| Level 3 | Predictive Classification       | Telecom Customer Churn             | ✅ Complete |
| Level 3 | NLP Sentiment Analysis          | Social Media Posts                 | ✅ Complete |

🛠️ Tools & Technologies

| Tool             | Purpose                                |
|------------------|----------------------------------------|
| Python 3.x       | Core programming language              |
| pandas           | Data loading, cleaning, manipulation   |
| NumPy            | Numerical computations                 |
| matplotlib       | Data visualization                     |
| seaborn          | Statistical visualizations             |
| scikit-learn     | Machine learning models, preprocessing |
| re / collections | NLP text processing                    |

⚙️ How to Run

  1. Clone the repository:
         git clone https://github.com/Cyber-Nate/Codveda-Data-Analytics-Internship.git
         cd Codveda-Data-Analytics-Internship
  2. Install dependencies:
         pip install pandas numpy matplotlib seaborn scikit-learn
  3. Navigate to any task folder and run the script:
         cd Level1_EDA
         python level1_eda_iris.py

⚠️ Make sure the relevant dataset CSV file is in the same folder as the script before running.
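
If a script is launched from a different working directory, a relative CSV path can fail even when the file sits next to the script. One way around this is to resolve the path from the script's own location. The sketch below is illustrative only: the helper `dataset_path` and the file name `iris.csv` are assumptions, not names taken from this repository.

```python
from pathlib import Path


def dataset_path(filename, script_file=None):
    """Return the path to `filename` in the script's folder, or raise if it is missing.

    `script_file` is normally `__file__`; when omitted, the current
    working directory is used instead.
    """
    base = Path(script_file).resolve().parent if script_file else Path.cwd()
    path = base / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {path}; copy the CSV into this folder first."
        )
    return path
```

Inside a task script this could be used as `pd.read_csv(dataset_path("iris.csv", __file__))`, so a missing file produces a clear message instead of a bare pandas error.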


📌 Key Highlights

  • 6 complete data analytics tasks across 3 difficulty levels
  • 37 publication-quality visualisations generated
  • End-to-end ML pipeline: preprocessing → modelling → evaluation → tuning
  • NLP pipeline built from scratch: tokenisation, stopword removal, stemming, TF-IDF
  • Random Forest classifier achieved 95% accuracy on churn prediction
  • K-Means recovered three clusters corresponding to the 3 Iris species, with a silhouette score of 0.46
  • NFLX delivered the highest return (+270%) among tracked stocks (2014–2017)
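
The NLP highlight above names four steps: tokenisation, stopword removal, stemming, and TF-IDF. A minimal sketch of how such a pipeline can be built with only the standard library (`re`, `collections`, `math`) is shown below. It is not the repository's code: the stopword list, the crude suffix-stripping stemmer, the TF-IDF variant (tf = count / document length, idf = log(N / df)), and the sample posts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "this", "i", "it", "and", "to", "of"}


def tokenize(text):
    """Lowercase the text and extract runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "ly", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenise, drop stopwords, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    token_lists = [preprocess(d) for d in docs]
    n_docs = len(token_lists)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical sample posts, not drawn from the project's dataset.
posts = ["I loved this phone", "This phone is terrible", "Loving the camera"]
scores = tf_idf(posts)
```

With this stemmer, "loved" and "loving" both reduce to "lov", so they count as the same term. Terms that appear in only one post, such as "terrible", receive a higher weight than terms shared across posts.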

📬 Contact
