EduPulse is a student performance analytics project built with React, Flask, Spark, and a lightweight ML pipeline. It shows subject-wise performance, attendance, and at-risk students across CSE, DSAI, and ECE.
The current version focuses on:
- subject-wise total marks
- attendance analysis
- at-risk identification
- branch-level summaries
- branch-wise risk comparison charts
- Subject dropdown for `BDA`, `DL`, `DSP`, and `DBMS`
- Per-student subject totals on a normalized `100`-point scale
- At-risk detection based on marks and attendance
- Overview cards for total students, at-risk count, marks, and attendance
- Branch-wise performance summary
- Search, branch filter, sorting, and pagination for student records
- Search in the at-risk view by student name or ID
- Risk-type filters for `Marks < 33%` and `Attendance < 75%`
- Subject-aware analytics charts built with Chart.js
All datasets in this project are stored in the Hadoop Distributed File System (HDFS), which is configured and run locally on a single machine.
Instead of relying on local file storage, the project uses HDFS to simulate a distributed data environment, enabling scalable data processing using Apache Spark.
/students_data/
├── students/
│ ├── cse_students.csv
│ ├── dsai_students.csv
│ └── ece_students.csv
│
├── attendance/
│ ├── cse_attendance.csv
│ ├── dsai_attendance.csv
│ └── ece_attendance.csv
│
├── marks/
│ ├── cse_BDA_marks.csv
│ ├── cse_DL_marks.csv
│ ├── cse_DSP_marks.csv
│ ├── cse_DBMS_marks.csv
│ ├── dsai_BDA_marks.csv
│ ├── ...
│ └── ece_DBMS_marks.csv
│
└── processed_data/
  └── final_dataset.csv
## Marks Normalization
Each subject is normalized to `100` marks using this formula:
total_subject_marks =
quiz1_marks
+ quiz2_marks
+ assignment_marks
+ (mid_sem_marks × 0.4)
+ (end_sem_marks × 0.4)
This means:
- `quiz1` remains as-is
- `quiz2` remains as-is
- `assignment` remains as-is
- `mid sem` is scaled from `50` to `20`
- `end sem` is scaled from `100` to `40`
So the final subject total is out of 100.
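The formula can be checked with a short Python sketch. The function name `total_subject_marks` and its argument names mirror the formula above; the function itself is illustrative and is not taken from `pipeline.py`, and the component scores in the example are made up.

```python
MID_SEM_WEIGHT = 0.4  # scales the 50-mark mid-sem down to 20
END_SEM_WEIGHT = 0.4  # scales the 100-mark end-sem down to 40


def total_subject_marks(quiz1, quiz2, assignment, mid_sem, end_sem):
    """Return the normalized subject total out of 100."""
    return (
        quiz1
        + quiz2
        + assignment
        + mid_sem * MID_SEM_WEIGHT
        + end_sem * END_SEM_WEIGHT
    )


# Hypothetical component scores for one student in one subject.
example_total = total_subject_marks(
    quiz1=8, quiz2=7, assignment=15, mid_sem=30, end_sem=60
)
print(example_total)  # 8 + 7 + 15 + 12 + 24 = 66.0
```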
A student is marked as at-risk if either condition is true:
- subject total marks are below `33`
- attendance is below `75%`
In code terms:
selected_marks < 33 OR attendance_pct < 0.75
Because the dashboard is subject-aware, the same student may be safe in one subject and at-risk in another.
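A minimal Python sketch of this rule follows. The thresholds come from the condition above; the helper names `risk_reasons` and `is_at_risk`, the sample student, and the assumption that one attendance figure applies to all of a student's subjects are illustrative only.

```python
MARKS_THRESHOLD = 33         # normalized subject total, out of 100
ATTENDANCE_THRESHOLD = 0.75  # attendance expressed as a fraction


def risk_reasons(selected_marks, attendance_pct):
    """Return the risk reasons that apply (empty list if the student is safe)."""
    reasons = []
    if selected_marks < MARKS_THRESHOLD:
        reasons.append("Marks < 33%")
    if attendance_pct < ATTENDANCE_THRESHOLD:
        reasons.append("Attendance < 75%")
    return reasons


def is_at_risk(selected_marks, attendance_pct):
    """A student is at-risk if either condition holds."""
    return bool(risk_reasons(selected_marks, attendance_pct))


# Hypothetical student: one attendance figure, different totals per subject.
attendance_pct = 0.82
subject_totals = {"BDA": 61.0, "DSP": 28.5}
status = {
    subject: is_at_risk(total, attendance_pct)
    for subject, total in subject_totals.items()
}
print(status)  # {'BDA': False, 'DSP': True}
```

The same student is safe in `BDA` and at-risk in `DSP`, which is why the status depends on the selected subject.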
- Frontend: React + Vite + Chart.js
- Backend API: Flask + Pandas
- Data processing: PySpark
- ML: Logistic Regression with Spark MLlib
BDA-Project-4th-Sem/
├── backend/
│ ├── flask/
│ │ └── app.py
│ ├── ml/
│ │ └── train_model.py
│ ├── raw_data/
│ │ ├── *_students.csv
│ │ ├── *_attendance.csv
│ │ └── *_marks.csv
│ └── spark/
│   └── pipeline.py
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ ├── constants/
│ │ └── hooks/
│ ├── .env
│ └── package.json
└── README.md
`backend/spark/pipeline.py` reads the raw student, attendance, and marks CSV files and computes:
- attendance percentage
- normalized subject totals
- average marks across subjects
- binary at-risk label for the ML dataset
The processed dataset is written to:
hdfs://localhost:9000/students_data/processed_data/final_dataset.csv
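The per-student computation can be illustrated in plain Python. This is a sketch of the idea, not the PySpark code in `pipeline.py`: the function name, the input fields (`classes_attended`, `classes_held`), the output key names, and the rule used for the binary label (the dashboard thresholds applied to the average) are all assumptions.

```python
from statistics import mean


def build_feature_row(student_id, classes_attended, classes_held, subject_totals):
    """Combine raw inputs into one row of a processed dataset (illustrative)."""
    attendance_pct = classes_attended / classes_held
    avg_marks = mean(subject_totals.values())
    # Assumption: the label reuses the 33-mark and 75% thresholds on the average.
    at_risk = int(avg_marks < 33 or attendance_pct < 0.75)
    return {
        "student_id": student_id,
        "attendance_pct": attendance_pct,
        "avg_marks": avg_marks,
        "at_risk": at_risk,
    }


# Hypothetical student with normalized totals for the four subjects.
row = build_feature_row(
    "S001", 36, 40, {"BDA": 62.0, "DL": 48.0, "DSP": 30.0, "DBMS": 56.0}
)
print(row)
```

Note that this hypothetical student is below 33 in `DSP` but is not labeled at-risk here, because the label in this sketch is computed from the average across subjects.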
backend/ml/train_model.py trains a Logistic Regression model using:
- `avg_marks`
- `attendance_pct`
It outputs prediction data including:
- `prediction`
- `risk_score`
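For intuition, the sketch below shows how a logistic regression model turns the two features into a score and a class. It is plain Python rather than Spark MLlib. The weights and intercept are invented for illustration and are not the trained values, and treating `risk_score` as the predicted probability of the at-risk class with a `0.5` cut-off is an assumption.

```python
import math

# Hypothetical coefficients; a real model learns these from the processed dataset.
WEIGHTS = {"avg_marks": -0.1, "attendance_pct": -6.0}
INTERCEPT = 8.0


def risk_score(avg_marks, attendance_pct):
    """Sigmoid of the linear combination: a probability-like value in (0, 1)."""
    z = (
        INTERCEPT
        + WEIGHTS["avg_marks"] * avg_marks
        + WEIGHTS["attendance_pct"] * attendance_pct
    )
    return 1.0 / (1.0 + math.exp(-z))


def prediction(score, threshold=0.5):
    """1 means predicted at-risk, 0 means predicted safe."""
    return 1 if score >= threshold else 0


low_performer = risk_score(avg_marks=30, attendance_pct=0.60)
high_performer = risk_score(avg_marks=70, attendance_pct=0.90)
print(prediction(low_performer), prediction(high_performer))
```

With these made-up coefficients, lower marks and lower attendance both push the score toward 1.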
backend/flask/app.py serves the dashboard data. It combines the processed predictions CSV with raw subject-mark files so the frontend can query subject-specific totals.
Available endpoints:
- `GET /health`
- `GET /data?subject=BDA`
- `GET /at-risk?subject=BDA`
- `GET /summary?subject=BDA`
- `GET /branch/<branch>`
The default subject is BDA if no subject is provided.
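The endpoints can be called from a small Python client such as the sketch below, which uses only the standard library. The helper names are hypothetical, the response bodies are assumed to be JSON, and the `CSE` value passed to `/branch/<branch>` is a guess at the expected branch format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://localhost:5000"  # default API address from this README


def endpoint_url(path, subject=None, base=API_BASE):
    """Build a request URL; omit `subject` to let the API fall back to BDA."""
    query = "?" + urlencode({"subject": subject}) if subject else ""
    return f"{base}{path}{query}"


def fetch_json(path, subject=None):
    """Call the API and decode the body, assuming it returns JSON."""
    with urlopen(endpoint_url(path, subject), timeout=10) as response:
        return json.load(response)


print(endpoint_url("/summary", subject="DL"))
print(endpoint_url("/branch/CSE"))
```

With the Flask server running, `fetch_json("/at-risk", subject="DSP")` would request the at-risk list for `DSP`.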
The frontend is a React dashboard with these tabs:
- `Overview`
- `All Students`
- `At-Risk`
- `Analytics`
Main UI behavior:
- a global subject dropdown controls subject-specific marks
- all views update when the subject changes
- marks bars and charts use the normalized `100`-point scale
- the Analytics tab shows `Safe vs At-Risk by Branch`
- the Analytics tab can be filtered by `Marks < 33%` or `Attendance < 75%`
- the At-Risk tab can be filtered by risk reason and searched by student name or ID
backend/.env
PROCESSED_CSV=data/predictions.csv
PORT=5000
HDFS_NAMENODE='http://localhost:9000'
SPARK_MASTER='local[*]'
From the project root:
cd backend/flask
python3 app.py
The API runs by default on http://localhost:5000.
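As a rough illustration of how the backend settings could be consumed, the sketch below reads the four variables from the environment and falls back to the values listed in `backend/.env`. The `load_settings` helper and its key names are hypothetical; how `app.py` actually loads its configuration is not shown in this README.

```python
import os


def load_settings(env=None):
    """Read backend settings, defaulting to the values shown in backend/.env."""
    env = os.environ if env is None else env
    return {
        "processed_csv": env.get("PROCESSED_CSV", "data/predictions.csv"),
        "port": int(env.get("PORT", "5000")),
        "hdfs_namenode": env.get("HDFS_NAMENODE", "http://localhost:9000"),
        "spark_master": env.get("SPARK_MASTER", "local[*]"),
    }


# Override only the port; everything else keeps its default.
settings = load_settings({"PORT": "8000"})
print(settings["port"], settings["processed_csv"])
```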
From the project root:
cd frontend
npm install
npm run dev
The Vite app runs on its local dev port and calls the Flask API using `VITE_API_URL`.
- Aalekh Raghuvanshi
- Bhavya Khare
- Devam Sharma
- Hemant Kumar
- Saksham Kushwah