
The goal of this project is to build a prediction model that determines whether a person uses drugs (such as cocaine, crack, or marijuana).
The dataset used for training is the National Survey on Drug Use and Health (NSDUH), 2015-2019.

The project trains and compares the following models:

- LGBMRegressor: gradient boosting based on LightGBM
- XGBRegressor: gradient boosting based on XGBoost
- Ridge Regressor: linear regression with Ridge (L2) regularization
- Gradient Boosting Regressor: gradient boosting to improve model accuracy
- Random Forest Regressor: an ensemble of decision trees for prediction
- StackingCVRegressor: a model that combines the predictions of several underlying models using cross-validation

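
As a rough illustration of the stacking idea, the sketch below combines three base regressors with a Ridge meta-model. It is not the project's pipeline: it uses scikit-learn's `StackingRegressor` as a stand-in for `StackingCVRegressor`, leaves out LightGBM and XGBoost to keep dependencies small, runs on synthetic data instead of the survey data, and assumes that regression scores are turned into class labels with a 0.5 threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic binary target (assumption: stands in for the "uses drugs" label).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base regressors; hyperparameters here are illustrative, not the project's.
base_models = [
    ("ridge", Ridge(alpha=1.0)),
    ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# The meta-model is fitted on cross-validated predictions of the base models.
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5)
stack.fit(X_train, y_train)

# Assumed decision rule: a score of 0.5 or more counts as the positive class.
scores = stack.predict(X_test)
labels = (scores >= 0.5).astype(int)
accuracy = float((labels == y_test).mean())
print(f"Hold-out accuracy on synthetic data: {accuracy:.3f}")
```

The cross-validated fitting of the meta-model is what keeps it from simply memorizing the base models' training-set predictions.
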
Repository structure:

- `data/`: preprocessed datasets
- `doc/`: information about the dataset, taken from the SAMHSA site
- `demo/`: images with the results of model testing
- `models/`: saved models and pipeline
- `notebooks/`: a Jupyter notebook for data visualization, model training, and analysis of results
- `requirements.txt`: list of Python packages required for installation

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| LightGBM | 0.912226 | 0.992937 | 0.830966 | 0.904760 |
| XGBoost | 0.909194 | 0.999785 | 0.819188 | 0.900521 |
| Ridge Regression | 0.909039 | 0.999677 | 0.818968 | 0.900344 |
| Gradient Boosting | 0.912580 | 0.992683 | 0.831892 | 0.905203 |
| Random Forest | 0.908441 | 1.000000 | 0.817512 | 0.899595 |
| Stacking | 0.913221 | 0.989709 | 0.835730 | 0.906225 |
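
Every model in the table combines very high precision with noticeably lower recall, so the F1-score is the more informative single number. F1 is the harmonic mean of precision and recall, and the short check below recomputes it from the reported precision and recall values. The helper name `f1_score` and the tolerance are illustrative choices, not part of the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
reported = {
    "LightGBM": (0.992937, 0.830966, 0.904760),
    "XGBoost": (0.999785, 0.819188, 0.900521),
    "Ridge Regression": (0.999677, 0.818968, 0.900344),
    "Gradient Boosting": (0.992683, 0.831892, 0.905203),
    "Random Forest": (1.000000, 0.817512, 0.899595),
    "Stacking": (0.989709, 0.835730, 0.906225),
}

for name, (precision, recall, f1) in reported.items():
    computed = f1_score(precision, recall)
    # The table is rounded to six decimals, so allow a small tolerance.
    status = "ok" if abs(computed - f1) < 1e-5 else "mismatch"
    print(f"{name}: computed F1 = {computed:.6f}, reported = {f1:.6f} ({status})")
```
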

To get started, clone the repository and install the dependencies:

    git clone https://github.com/TokenRR/Bigdata_university_course.git
    cd Bigdata_university_course
    pip install -r requirements.txt

You can then use the notebooks in the `notebooks/` folder to explore and analyze the results.

If you would like to contribute to this project, please create a pull request or open a new issue.

