The NUFORC database contains decades of public UFO sighting reports submitted by witnesses across the United States and abroad. Most reports describe brief, mundane observations (lights in the sky, ambiguous shapes), while a small minority are highly dramatic narratives describing structured craft, occupants, sustained encounters, or other extraordinary content. This project builds models that score each report on a dramaticness scale and explains why a given report received the score it did.
The pipeline:
- Ingests scraped NUFORC report data.
- Engineers a combined set of structured and NLP-derived features from each report's free-text summary.
- Trains and tunes several model families: logistic regression, CatBoost on tabular features, CatBoost on text features, CatBoost combining both, and a zero-/few-shot LLM classification baseline.
- Evaluates models with stratified cross-validation, average-precision scoring, and bootstrap confidence intervals.
- Generates SHAP explanations for individual predictions.
- Serves predictions and explanations through a live dashboard.
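The evaluation step above pairs average-precision scoring with bootstrap confidence intervals. The following is a minimal, standard-library sketch of that idea, not the project's actual code in bootstrap_evaluation.py. The function names, the percentile method, and the defaults (`n_boot`, `alpha`, `seed`) are assumptions made for illustration. Note that this simple average precision breaks score ties by input order, which can differ slightly from scikit-learn's `average_precision_score`.

```python
import random


def average_precision(y_true, y_score):
    """Mean of precision values taken at the rank of each positive example."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank examples from highest to lowest score (stable for ties).
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for average precision (illustrative defaults)."""
    if sum(y_true) == 0:
        raise ValueError("need at least one positive label")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        # Resample report indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if sum(labels) == 0:
            continue  # skip resamples with no positives, where the metric is undefined
        scores = [y_score[i] for i in idx]
        stats.append(average_precision(labels, scores))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Because dramatic reports are a small minority, resamples without any positive label are possible; the sketch skips them rather than scoring them as zero, which is one of several reasonable choices.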
The work extends RAND's 2023 report Not the X-Files, which analyzed geographic and temporal patterns in NUFORC reports, by adding a content-aware dimension grounded in the language of the reports themselves.
A live version of the dashboard is deployed at:
apps.datasciencedynamics.com/uap_classifier
The app is built on a Flask/Dash WSGI dispatcher (entry point: app.py) and lets users browse scored reports, inspect per-report SHAP explanations, and explore aggregate patterns in dramaticness across regions, shapes, and report years.
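For readers unfamiliar with the pattern, a WSGI dispatcher routes each request to one of several WSGI applications based on its URL prefix. The sketch below shows the idea using only plain WSGI callables. It is not the contents of app.py: the `make_dispatcher` helper, the two stand-in apps, and the `/uap_classifier` mount point are all assumptions for illustration, and a real deployment would more likely mount the Flask and Dash servers through an existing middleware than hand-roll one.

```python
def make_dispatcher(default_app, mounts):
    """Return a WSGI app that forwards to a mounted app by path prefix.

    `mounts` maps a prefix such as "/uap_classifier" (hypothetical) to a WSGI app.
    """
    # Check longer prefixes first so nested mounts win over their parents.
    prefixes = sorted(mounts, key=len, reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix in prefixes:
            if path == prefix or path.startswith(prefix + "/"):
                # Shift the matched prefix from PATH_INFO to SCRIPT_NAME,
                # as the WSGI spec expects for mounted applications.
                child_env = dict(environ)
                child_env["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                child_env["PATH_INFO"] = path[len(prefix):]
                return mounts[prefix](child_env, start_response)
        return default_app(environ, start_response)

    return dispatcher


def landing_app(environ, start_response):
    """Stand-in for the outer site (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"landing"]


def dashboard_app(environ, start_response):
    """Stand-in for the dashboard; echoes the path it was given."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("dashboard:" + environ["PATH_INFO"]).encode("utf-8")]


application = make_dispatcher(landing_app, {"/uap_classifier": dashboard_app})
```

Any WSGI server can serve `application`; requests under the mounted prefix reach the dashboard with the prefix stripped from `PATH_INFO`.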
| Model key | Description |
|---|---|
| lr | Logistic regression on tabular features (baseline) |
| cat | CatBoost on tabular features |
| cat_text_only | CatBoost on free-text features only |
| cat_feats_and_text | CatBoost combining tabular and text features |
| train_llm | Zero-shot and few-shot LLM classification baseline |
Each tabular model can be run under six pipeline variants: three class-imbalance strategies (orig, smote, under), each with or without recursive feature elimination (suffix _rfe). All runs are tracked with MLflow.
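The six variants are the cross product of three imbalance strategies and two feature-selection settings. A short sketch of how the variant keys could be enumerated follows; the constant and function names are hypothetical, and the assumption that the `_rfe` suffix is appended directly to the strategy name is inferred from the description, not confirmed by the code.

```python
from itertools import product

# Class-imbalance strategies named in the README.
IMBALANCE_STRATEGIES = ("orig", "smote", "under")
# Empty suffix = no feature elimination; "_rfe" = recursive feature elimination.
RFE_SUFFIXES = ("", "_rfe")


def pipeline_variants():
    """Return every strategy/suffix combination as a variant key (assumed naming)."""
    return [
        f"{strategy}{suffix}"
        for strategy, suffix in product(IMBALANCE_STRATEGIES, RFE_SUFFIXES)
    ]


print(pipeline_variants())
# → ['orig', 'orig_rfe', 'smote', 'smote_rfe', 'under', 'under_rfe']
```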
dusc_nuforc/
├── app.py # Flask/Dash entry point for the dashboard
├── core/ # Shared config, constants, and utility functions
│ ├── config.py
│ ├── constants.py
│ └── functions.py
├── preprocessing/ # Data ingestion and feature engineering
│ ├── 1_data_gen.py
│ ├── 2_nlp_feature_engineer_nuforc.py
│ ├── 3_nuforc_analytics.py
│ ├── 4_preprocessing_remaining_feats.py
│ └── 5_feat_gen.py
├── modeling/ # Training, evaluation, explanation, inference
│ ├── train.py # LR + CatBoost training across pipeline variants
│ ├── train_llm.py # Zero-/few-shot LLM baseline
│ ├── evaluate.py
│ ├── bootstrap_evaluation.py
│ ├── save_predictions.py
│ ├── explainer.py # SHAP explainer fitting
│ └── explanations_training.py
├── notebooks/
│ ├── raw_data_exploration.ipynb
│ ├── data_exploration.ipynb
│ └── performance_assessment.ipynb
├── models/ # Trained models, predictions, evaluation artifacts
│ ├── eval/
│ ├── predictions/
│ ├── results/
│ └── train/
├── data/ # Raw, interim, processed datasets (gitignored)
├── mlruns/ # MLflow tracking store
├── Makefile # Pipeline orchestration
├── requirements.txt
└── setup.py
Requires Python 3.12.
# Create and activate a virtual environment
python -m venv nuforc_venv
source nuforc_venv/bin/activate
# Install dependencies
pip install -r requirements.txt
pip install -e .

The full pipeline is orchestrated through the Makefile. A typical end-to-end workflow:
# 1. Preprocessing: ingest, NLP feature engineering, analytics, feature generation
make preproc_pipeline
# 2. Train all models (LR, CatBoost variants, text-only, combined)
make train_all_models
# 3. Evaluate models and bootstrap confidence intervals
make eval_all_models
make bootstrap_eval
# 4. Fit SHAP explainer and generate per-report explanations
make model_explaining_training
# Inspect MLflow runs
make mlflow_ui

For inference on a new batch of reports:

make preproc_pipeline_inference

Run make help for a full list of available targets.
Source reports come from the National UFO Reporting Center (NUFORC). The NUFORC site renders its tables through the JavaScript-based wpDataTables plugin, so calling pandas.read_html() on those pages directly does not work. Instead, ingestion iterates over the static per-month subindex pages at nuforc.org/ndx/?id=event, with rate limiting.
Raw and processed data files are gitignored.
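The rate-limited iteration over subindex pages described above could look roughly like the following sketch. It is not the code in preprocessing/1_data_gen.py: the helper names, the two-second minimum interval, and the User-Agent string are placeholders, and the caller is expected to supply the list of per-month page URLs, since their exact pattern is not documented here.

```python
import time
from urllib.request import Request, urlopen


def fetch_html(url, timeout=30.0):
    """Download one page as text (hypothetical helper; User-Agent is a placeholder)."""
    request = Request(url, headers={"User-Agent": "nuforc-research-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_rate_limited(urls, fetch=fetch_html, min_interval=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Yield (url, html) pairs, starting requests at least `min_interval` seconds apart."""
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever remains of the interval since the previous request began.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield url, fetch(url)
```

Passing `fetch`, `sleep`, and `clock` as parameters keeps the loop testable without network access or real delays; the static pages it returns can then be parsed with an ordinary HTML table parser.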
Data Science Dynamics: datasciencedynamics.com
- Posard, M. N., Gromis, A., & Lee, M. (2023). Not the X-Files: An Analysis of UFO Reporting in the United States. RAND Corporation. https://www.rand.org/pubs/research_reports/RRA2475-1.html
- Medina, R. M., Brewer, S. C., & Kirkpatrick, S. M. (2023). An environmental analysis of public UAP sightings and sky view potential. Scientific Reports, 13, 22213. https://doi.org/10.1038/s41598-023-49527-x
- National UFO Reporting Center: nuforc.org
Released under the MIT License. Copyright (c) 2026 Leon Shpaner and Oscar Gil.

