Bitsat Predictions

Preface

I don't quite know where to start, but if I had to pinpoint where this project began, it was when PixelHalide and I were discussing how to predict the difficulty of the MET exam for his site.

While working through the formulation (and a few failures along the way), I decided to experiment with another entrance examination, one that is notorious for its difficulty varying each year and that had enough data points for me to work with. That exam, as you may have guessed, is BITSAT.

(Also, partly because if this experiment grew beyond what was intended, I knew the r/bitsatards mods pretty well, and they could help me promote the cause.)

Today the experiment has clearly grown a lot since it started: the bot is deployed on the intended server, and a website has been built to increase outreach. So now seems like the right time to explain how I formulated this and how exactly my bot works.

Data Collection

Linear Scaling: Pre-2022 BITSAT exams were out of 450 marks, while current exams are out of 390. To maintain trend visibility, I applied linear scaling to all pre-2022 data (score * 390 / 450). Without this, the data would falsely imply a massive drop in cutoff trends.
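A minimal sketch of the scaling step, assuming each cutoff is stored together with its exam year. The function name `scale_cutoff` and the constant names are illustrative and not taken from the bot's code:

```python
OLD_MAX = 450  # total marks before 2022
NEW_MAX = 390  # total marks from 2022 onward


def scale_cutoff(score, year):
    """Rescale a pre-2022 cutoff to the 390-mark scale; leave newer ones unchanged."""
    if year < 2022:
        return score * NEW_MAX / OLD_MAX
    return score
```

For instance, a cutoff of 300 from a 450-mark year becomes 260 on the current scale, while a cutoff from 2022 or later is returned as is.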

Omission of New Branches: BITS introduced several new branches between 2024–2025. With only 1-2 data points available, it is impossible for a model to make reasonable generalizations. Since different branches follow wildly different historical trends (e.g., B.Pharma vs. CSE which can be observed below), artificially augmenting this data proved illogical. These branches are currently omitted.
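The omission rule can be sketched as a simple filter over per-branch histories. The threshold of three data points and the names `MIN_POINTS` and `drop_sparse_branches` are assumptions for illustration; the text above only says that branches with 1-2 data points are left out:

```python
MIN_POINTS = 3  # assumed threshold: branches with only 1-2 cutoffs are omitted


def drop_sparse_branches(cutoffs_by_branch, min_points=MIN_POINTS):
    """Keep only branches that have at least `min_points` historical cutoffs.

    `cutoffs_by_branch` maps a branch name to a list of yearly cutoff scores.
    """
    return {
        branch: history
        for branch, history in cutoffs_by_branch.items()
        if len(history) >= min_points
    }
```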

Extended Dataset: The dataset was manually extended back to 2013 and cross-verified across multiple sources to maximize training points.

Data Drift

In predictive modeling, older data can sometimes degrade performance rather than help it due to changing real-world contexts.

Example: Pre-2017 B.Pharma cutoffs were significantly higher, reflecting a period when interest in the branch was very different. Using a 200+ score for today's worst-case predictions would be inaccurate.

Target Function Changes: Similar to how MET introduced a "Boards Band-Gap" in 2024 (changing how rank is calculated), older BITSAT data must be viewed through the lens of shifting admission trends.

I have considered removing these outdated data points, but that decision has not been finalized.

Situational-Predictions

Because it is impossible to predict the exact difficulty of a future paper set by the admission council, the bot acts as a Statistical Validator that outputs predictions based on historical quantile estimation.

Min-Max Scaling: I applied min-max scaling to branch cutoffs, assigning the historically highest score a 1.0 (Coefficient of Difficulty) and the lowest a 0.0.
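A minimal sketch of min-max scaling for one branch's cutoff history. The function name `min_max_scale` and the handling of a flat history (all values equal) are assumptions, since the text does not cover that case:

```python
def min_max_scale(cutoffs):
    """Map cutoffs to [0, 1]: the lowest becomes 0.0 and the highest 1.0."""
    lo, hi = min(cutoffs), max(cutoffs)
    if hi == lo:
        # Assumed fallback: a flat history carries no spread to scale by.
        return [0.0 for _ in cutoffs]
    return [(c - lo) / (hi - lo) for c in cutoffs]
```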

Model Selection: I utilized a degree-2 Polynomial Regression, which yielded the lowest RMSE on the test dataset.
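To make the model-selection step concrete, here is a rough sketch of a degree-2 least-squares fit and an RMSE check in plain Python. The text does not say which inputs the regression uses or which library fits it, so `xs` and `ys` are generic, and `fit_quadratic`, `predict`, and `rmse` are hypothetical helper names. The fit solves the normal equations directly rather than calling any particular library:

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2; returns [a, b, c]."""
    # Power sums S_k = sum(x^k) and moments T_k = sum(y * x^k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal equations: row i is [S_i, S_(i+1), S_(i+2) | T_i].
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def predict(coeffs, x):
    """Evaluate the fitted quadratic at x."""
    a, b, c = coeffs
    return a + b * x + c * x * x


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
```

Comparing candidate models then comes down to fitting each on the training points and keeping the one with the lowest `rmse` on held-out points. Centering `xs` (for example, years counted from the first year in the data) keeps the power sums small and the solve well-behaved.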

The Three Situations: Based on standard deviations, predictions are bracketed into three scenarios:

Best-Case: 0.2
Most-Likely: 0.5
Worst-Case: 0.8
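If the three values are read as Coefficients of Difficulty on the min-max scale, they can be mapped back to a score range for a branch. This is only a sketch of that reading: `scenario_scores` and the linear inverse mapping are assumptions, and the regression step of the bot is left out entirely:

```python
SCENARIOS = {"best": 0.2, "most_likely": 0.5, "worst": 0.8}


def scenario_scores(history):
    """Convert each scenario coefficient back to a cutoff score.

    `history` is one branch's list of (already rescaled) cutoffs.
    A coefficient of 0.0 maps to the historical minimum, 1.0 to the maximum.
    """
    lo, hi = min(history), max(history)
    return {name: lo + coeff * (hi - lo) for name, coeff in SCENARIOS.items()}
```

Under this reading, a lower coefficient means a lower cutoff, which is why 0.2 is the best case for a candidate and 0.8 the worst.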

If you want the absolute worst-case and best-case scenarios, just check out the 2016 (worst) and 2021/2022 (best) cutoff scores.

Finally, as the website also states, this is not an ML model in the strict sense, since the current feature set is not enough to make definitive predictions.

That is why, for now, I consider it more of a STATISTICAL-VALIDATOR, as described earlier.

Future Plans For Model

While I haven't started working on it yet, there are plans to include several features that strongly influence BITSAT cutoffs.

Many of these came from the recommendations of my cracked friend:

JEE Mains
Other important entrances such as: MET, UGEE
BITS Fees over the years
Desirability index
Other Misc. Factors (I have mostly grouped these for now because I don't really know how to proceed with them)

(1) JEE Mains plays the biggest role and is probably the single biggest factor affecting BITS cutoffs each year. Since nearly all BITSAT candidates have also written JEE Mains, factors like the total number of registered candidates and the JEE Advanced qualifying percentile directly affect BITSAT competition.

(2) Other entrances such as UGEE and MET, along with top-tier NITs/IIITs, draw students away from BITS, affecting the final seat matrix.

(3) A big factor people weigh when looking at BITS is its high fees. For many years people accepted the cost, since it meant an elite college, a good curriculum, and a high return: BITS placements rival those of the top IITs. However, BITS has seen considerable fee hikes, and the ratio of perceived Return-on-Investment (ROI) to the rising fees heavily influences drop-out rates during counseling.

(4) Within BITS itself, not every branch and campus is valued the same.

Student preference is heavily weighted. The general campus hierarchy (Pilani > Goa > Hyderabad) often clashes with branch hierarchy (Circuital > Mechanical). For example, in 2025, Mechanical at Pilani (235) was valued slightly higher than ENI at Goa (234).

(5) There are other factors, such as the seat matrix (EXTREMELY IMPORTANT).

Now one may wonder how all of this helps in determining the difficulty of the paper itself, and the answer is that YOU CAN'T.

Paper difficulty is something the BITS admission council sets (and I don't work for them, unfortunately), and even if you knew the difficulty of the exam, you couldn't reliably make a prediction from it alone. Take JEE Mains 2023 as an instance (referring back to point (1)): it had close to 10 lakh registered candidates, the papers were generally considered moderate, and the cutoff scores were reasonable. In sharp contrast, 2024 had close to 12.5 lakh registered candidates, and the papers were roughly on the same level as the previous year's, in some cases even harder, but the cutoffs MASSIVELY INCREASED. So maybe the number of registered candidates is the main factor, right? Not quite: in 2025 there were reportedly about 13 lakh registrations. I personally haven't written the 2025 exam, but cutoff scores DROPPED MASSIVELY relative to 2024.

So, for now, a statistical validator helps in this situation, where not many factors are taken into account, because it captures how high and how low a branch's cutoff scores have historically varied.

BITS has one more thing that sets it apart from the NITs: its no-attendance policy (well, it does have an attendance policy, but a far less strict one than any other college's), which makes it stand out even more compared to the likes of the IIITs and NITs.

But factors like these are so hard to account for that they may be dropped entirely, or revisited once model interpretability is done.

Limitations

A model's predictions are only as good as the data you feed it. As I mentioned above, knowing the seat matrix all the way back to 2013 would be a stepping stone, as it would show how many candidates finally made it into BITS at each campus and in which branch.

BITS does not publicly release historical seat matrices, making it impossible to know exactly how many students were admitted per branch. Placement stats are a flawed proxy (ignoring higher-studies or off-campus placements).

Again, I just want to clarify: no matter how many improvements are made to the model, it can NEVER ACTUALLY PREDICT THE EXACT CUTOFF SCORE.

Special Thanks

And everyone else who helped me on this formulation and project, this wouldn't have been possible alone.

Author

PranavU-Coder
