📊 Codveda Technologies — Data Analytics Internship

Intern: Nathaniel
Role: Data Analysis Intern
Duration: 11/03/2026 – 11/04/2026
Mode: Remote
Organization: Codveda Technologies

🗂️ Repository Structure

Codveda-Data-Analytics-Internship/
│
├── Level1_EDA/                  → Exploratory Data Analysis (Iris Dataset)
├── Level1_DataCleaning/         → Data Cleaning & Preprocessing (Housing Dataset)
├── Level2_TimeSeries/           → Time Series Analysis (S&P 500 Stock Prices)
├── Level2_KMeans/               → K-Means Clustering (Iris Dataset)
├── Level3_Classification/       → Predictive Classification (Telecom Churn)
└── Level3_NLP_Sentiment/        → NLP Sentiment Analysis (Social Media Posts)

✅ Tasks Completed

| Level | Task | Dataset | Status |
|---|---|---|---|
| Level 1 | Exploratory Data Analysis (EDA) | Iris Flower Dataset | ✅ Complete |
| Level 1 | Data Cleaning & Preprocessing | Boston Housing Dataset | ✅ Complete |
| Level 2 | Time Series Analysis | S&P 500 Stock Prices (2014–2017) | ✅ Complete |
| Level 2 | K-Means Clustering | Iris Flower Dataset | ✅ Complete |
| Level 3 | Predictive Classification | Telecom Customer Churn | ✅ Complete |
| Level 3 | NLP Sentiment Analysis | Social Media Posts | ✅ Complete |

🛠️ Tools & Technologies

| Tool | Purpose |
|---|---|
| Python 3.x | Core programming language |
| pandas | Data loading, cleaning, manipulation |
| NumPy | Numerical computations |
| matplotlib | Data visualization |
| seaborn | Statistical visualizations |
| scikit-learn | Machine learning models, preprocessing |
| re / collections | NLP text processing |

⚙️ How to Run

Clone the repository:

git clone https://github.com/YOUR_USERNAME/Codveda-Data-Analytics-Internship.git
cd Codveda-Data-Analytics-Internship

Install dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn

Navigate to any task folder and run the script:

cd Level1_EDA
python level1_eda_iris.py

⚠️ Make sure the relevant dataset CSV file is in the same folder as the script before running.
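
The check described in the warning above can also be done in code before loading any data. The following is a minimal sketch, not taken from the repository's scripts: the helper name `find_dataset` and the file name `iris.csv` are hypothetical placeholders, so substitute whatever file name the task script actually expects.

```python
from pathlib import Path


def find_dataset(script_file, csv_name):
    """Return the path to csv_name located next to script_file.

    Raises FileNotFoundError with a readable message if the CSV is missing.
    """
    csv_path = Path(script_file).resolve().parent / csv_name
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {csv_path}; copy the CSV next to the script."
        )
    return csv_path


# Typical use at the top of a task script ("iris.csv" is a placeholder name):
# csv_path = find_dataset(__file__, "iris.csv")
# df = pandas.read_csv(csv_path)
```

Resolving the path relative to the script rather than the current working directory means the script also works when launched from the repository root.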

📌 Key Highlights

6 complete data analytics tasks across 3 difficulty levels
37 publication-quality plots generated
End-to-end ML pipeline: preprocessing → modelling → evaluation → tuning
NLP pipeline built from scratch: tokenisation, stopword removal, stemming, TF-IDF
Random Forest classifier achieved 95% accuracy on churn prediction
K-Means (k = 3) recovered clusters corresponding to the 3 Iris species, with a silhouette score of 0.46
NFLX delivered the highest return (+270%) among tracked stocks (2014–2017)
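
To illustrate the from-scratch NLP steps listed above (tokenisation, stopword removal, stemming, TF-IDF), here is a minimal sketch that uses only `re`, `collections`, and `math`. It is not the repository's implementation: the stopword list is deliberately tiny, the suffix-stripping `stem` function is a crude stand-in for a real stemmer such as Porter's, the TF-IDF weighting assumed here is `tf * log(N / df)`, and the two sample posts are invented.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "this", "i", "it", "to", "of"}


def tokenise(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenise, drop stopwords, then stem what remains."""
    return [stem(t) for t in tokenise(text) if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenised = [preprocess(d) for d in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented example posts, for illustration only.
posts = [
    "I loved this phone, the camera is amazing",
    "The battery died and I hated the phone",
]
for post, scores in zip(posts, tf_idf(posts)):
    print(preprocess(post), scores)
```

With this weighting, a term that occurs in every document gets a weight of zero, which is why library implementations usually add smoothing to the IDF term.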

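The return figure in the last highlight is a percentage change in price over the period. As a minimal sketch, assuming a simple first-to-last closing-price return (the README does not say whether adjusted prices or dividends were used), the comparison could look like this. The tickers `AAA` and `BBB` and their prices are made up and are not market data.

```python
def percent_return(prices):
    """Percentage change from the first to the last price in the series."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    first, last = prices[0], prices[-1]
    return (last - first) / first * 100.0


# Made-up closing prices for two hypothetical tickers (not real market data).
closes = {
    "AAA": [40.0, 55.0, 70.0, 100.0],
    "BBB": [100.0, 110.0, 95.0, 130.0],
}

# Compute the return per ticker, then pick the ticker with the largest one.
returns = {ticker: percent_return(series) for ticker, series in closes.items()}
best = max(returns, key=returns.get)
print(f"{best}: {returns[best]:+.0f}%")
```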