This project is a machine learning-based web application built with Streamlit for predicting house prices. It uses Linear Regression and optionally XGBoost, trained on housing data with preprocessing steps included in a pipeline.
House\_Price\_Prediction/
│
├── dataset/
│ ├── train.csv
│ ├── test.csv
│ └── sample\_submission.csv
│
├── oneshot\_linear.py # Train linear regression model
│
├── train\_xgboost.py # Train XGBoost model (optional)
├── app.py # Streamlit frontend app
├── zero\_shot\_model.pkl # Trained model file (e.g., Linear or XGBoost)
│
├── submission\_oneshot\_linear.csv # Generated submission file
├── requirements.txt
└── README.md
- Upload custom house data and get predicted price
- Interactive sidebar for input features
- Visualizations:
  - Histogram of Sale Prices
  - Correlation Heatmap
  - Feature Importance (for XGBoost models)
- Preprocessing includes:
  - Missing value handling
  - One-hot encoding
- Model serialization using joblib
Install dependencies using:

    pip install -r requirements.txt

Contents of requirements.txt:
- pandas
- scikit-learn
- joblib
- xgboost
- streamlit
- matplotlib
- seaborn

To train the Linear Regression model, run:

    python oneshot_linear.py

This will:
- Train a model on train.csv
- Save predictions to submission_oneshot_linear.csv
- Export the trained model as oneshot_linear_model.pkl
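
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of oneshot_linear.py: the small in-memory DataFrames stand in for train.csv and test.csv, the column subset and the `build_pipeline` helper are hypothetical, and the `Id`/`SalePrice` submission columns are assumed to match sample_submission.csv.

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed feature subset; the real script may use many more columns.
NUMERIC = ["LotArea", "GrLivArea", "YearBuilt"]
CATEGORICAL = ["Neighborhood", "HouseStyle"]
FEATURES = NUMERIC + CATEGORICAL


def build_pipeline() -> Pipeline:
    """Imputation and one-hot encoding, followed by Linear Regression."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", LinearRegression())])


# Stand-in for pd.read_csv("dataset/train.csv"); all values are made up.
train = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, None, 14260, 7000],
    "GrLivArea": [1710, 1262, 1786, 1717, 2198, 1100],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000, 1960],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor", "NoRidge", "NAmes"],
    "HouseStyle": ["2Story", "1Story", "2Story", "2Story", "2Story", "1Story"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 120000],
})
# Stand-in for pd.read_csv("dataset/test.csv").
test = pd.DataFrame({
    "Id": [1461, 1462],
    "LotArea": [11622, None],
    "GrLivArea": [896, 1329],
    "YearBuilt": [1961, 1958],
    "Neighborhood": ["NAmes", "Somerst"],  # "Somerst" is unseen in training
    "HouseStyle": ["1Story", "1Story"],
})

# 1. Train the model.
pipeline = build_pipeline()
pipeline.fit(train[FEATURES], train["SalePrice"])

# 2. Save predictions in the submission format.
preds = pipeline.predict(test[FEATURES])
submission = pd.DataFrame({"Id": test["Id"], "SalePrice": preds})
submission.to_csv("submission_oneshot_linear.csv", index=False)

# 3. Export the fitted pipeline (preprocessing + model) as one file.
joblib.dump(pipeline, "oneshot_linear_model.pkl")
```

Serializing the whole pipeline, instead of only the regressor, means that whatever loads the file can pass raw columns straight to `predict()` without repeating the preprocessing.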

To train the optional XGBoost model, run:

    python train_xgboost.py

This exports the model as xgboost_model.pkl.

To launch the Streamlit frontend:

    streamlit run app.py

Make sure your trained model (e.g., zero_shot_model.pkl) is present in the directory.
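
One way to make a missing model file fail loudly is a small loader like the sketch below. The `load_model` helper is hypothetical and is not taken from app.py; the default filename simply mirrors the example name given above.

```python
from pathlib import Path

import joblib


def load_model(path: str = "zero_shot_model.pkl"):
    """Load a joblib-serialized model, with a clear error if the file is missing."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(
            f"Model file '{model_path}' not found. "
            "Train a model first or point the app at an existing .pkl file."
        )
    return joblib.load(model_path)
```

Inside a Streamlit script, the exception could be caught and reported with `st.error(...)` followed by `st.stop()`, so the page shows a readable message instead of a traceback.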
- Numerical Inputs: LotArea, GrLivArea, YearBuilt, etc.
- Categorical Inputs: Neighborhood, HouseStyle, etc.
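
As a rough sketch of how these inputs might reach the model, the values can be collected into a single-row DataFrame whose column names match the training data. The `make_input_frame` helper, the column subset, and the default values are all illustrative assumptions; in the app the values would presumably come from sidebar widgets such as `st.sidebar.number_input` and `st.sidebar.selectbox`.

```python
import pandas as pd

# Hypothetical defaults for a small subset of columns; values are made up.
NUMERIC_DEFAULTS = {"LotArea": 8000, "GrLivArea": 1500, "YearBuilt": 2000}
CATEGORICAL_DEFAULTS = {"Neighborhood": "NAmes", "HouseStyle": "1Story"}


def make_input_frame(numeric: dict, categorical: dict) -> pd.DataFrame:
    """Merge user-entered values over the defaults into a one-row DataFrame."""
    row = {**NUMERIC_DEFAULTS, **CATEGORICAL_DEFAULTS, **numeric, **categorical}
    return pd.DataFrame([row])


# Example: the user changed two fields and left the rest at their defaults.
features = make_input_frame({"GrLivArea": 1710}, {"Neighborhood": "CollgCr"})
print(features)
```

The resulting frame can be passed directly to a fitted pipeline's `predict()` method, provided the pipeline was trained on the same column names.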
- Linear Regression gives a good baseline performance.
- XGBoost supports feature importance plotting.
- The trained pipeline includes full preprocessing using ColumnTransformer.
This project is licensed under the MIT License.
Jayasimma D
Feel free to connect on LinkedIn or contribute to this project!








