A polished Streamlit app for fake news classification with a single-page tabbed UX, URL content extraction, and profile-based model training on the WELFake dataset.
- Single-page tab navigation (Home, Text Analyzer, URL Analyzer, Model Insights) in `app.py`
- Text classification with confidence scoring
- URL analysis using both article title and main body content
- WELFake training profiles:
  - `quick`: stratified subset for faster training
  - `full`: full dataset training
- Automatic dataset download from Google Drive when training starts (no large CSV committed to GitHub)
- Profile-specific saved artifacts with active profile switching
- Model metrics dashboard (Accuracy, Precision, Recall, F1, ROC-AUC, confusion matrix)
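The features above can be sketched as a single training routine. This is a hypothetical outline, not the actual `src/models/train.py`: the classifier choice (`LogisticRegression`), the vectorizer settings, and the function name are all assumptions; only the quick/full profile split and the metric set come from this README.

```python
# Hypothetical sketch of profile-based training on WELFake.
# Assumptions: TF-IDF features + LogisticRegression; the real train.py may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score,
)

def train_profile(texts, labels, profile="quick", subset_frac=0.2):
    if profile == "quick":
        # Stratified subset keeps the REAL/FAKE ratio of the full dataset.
        texts, _, labels, _ = train_test_split(
            texts, labels, train_size=subset_frac, stratify=labels, random_state=42
        )
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=42
    )
    vectorizer = TfidfVectorizer(max_features=50000, stop_words="english")
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(X_train), y_train)

    X_test_vec = vectorizer.transform(X_test)
    preds = model.predict(X_test_vec)
    probs = model.predict_proba(X_test_vec)[:, 1]
    metrics = {
        "accuracy": accuracy_score(y_test, preds),
        "precision": precision_score(y_test, preds),
        "recall": recall_score(y_test, preds),
        "f1": f1_score(y_test, preds),
        "roc_auc": roc_auc_score(y_test, probs),
    }
    return model, vectorizer, metrics
```

The returned metrics dictionary maps directly onto the dashboard fields listed above.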
- Python
- Streamlit
- scikit-learn
- pandas / numpy
- matplotlib
- BeautifulSoup + requests
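The URL analyzer's use of `requests` + `BeautifulSoup` could look roughly like the sketch below. The tag heuristics (prefer `<article>`, fall back to all `<p>` text) and function names are illustrative assumptions; the actual logic lives in `src/utils/web_scraper.py`.

```python
# Hypothetical sketch of URL title + body extraction.
import requests
from bs4 import BeautifulSoup

def parse_article(html):
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Prefer <article> content; fall back to all paragraph text on the page.
    container = soup.find("article") or soup
    body = " ".join(p.get_text(strip=True) for p in container.find_all("p"))
    return title, body

def extract_article(url, timeout=10):
    resp = requests.get(url, timeout=timeout, headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    return parse_article(resp.text)
```

Splitting fetching from parsing keeps the HTML-handling part testable without network access.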
fake-news-streamlit-app/
|-- app.py
|-- artifacts/
| |-- datasets/
| | `-- WELFake_Dataset.csv # downloaded at runtime on first training run
| |-- fake_news_model_quick.joblib
| |-- fake_news_model_full.joblib
| |-- tfidf_vectorizer_quick.joblib
| |-- tfidf_vectorizer_full.joblib
| |-- training_metrics_quick.json
| |-- training_metrics_full.json
| |-- training_metrics_active.json
| `-- active_profile.txt
|-- src/
| |-- models/
| | |-- predict.py
| | `-- train.py
| |-- ui/
| | |-- components.py
| | `-- theme.py
| `-- utils/
| `-- web_scraper.py
|-- requirements.txt
`-- README.md
- Clone the repo

  ```bash
  git clone <your-repo-url>
  cd fake-news-streamlit-app
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Run the app

  ```bash
  streamlit run app.py
  ```

Use the Model Insights tab and click Train on WELFake with either `quick` or `full` selected.
If the dataset is not already cached locally, the app downloads it automatically from Google Drive:
https://drive.google.com/file/d/13lcNYSvVfJhC5xl-84k5AcHvNVnKiI1T/view?usp=drive_link
The downloaded file is cached at:
artifacts/datasets/WELFake_Dataset.csv
You can override the dataset URL in deployment environments using:
- `WELFAKE_DATASET_URL`
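A minimal sketch of the download-and-cache step, assuming the environment variable falls back to the Drive link above. The function name and direct-download URL form are assumptions; note that large Google Drive files can require a confirmation token, which tools like `gdown` handle and this plain `urlretrieve` call does not.

```python
# Hypothetical sketch: cache the WELFake CSV locally, downloading it once.
import os
import urllib.request
from pathlib import Path

# Direct-download form of the Drive link above (assumed; large files may
# additionally need a confirmation token, e.g. via gdown).
DEFAULT_URL = "https://drive.google.com/uc?export=download&id=13lcNYSvVfJhC5xl-84k5AcHvNVnKiI1T"

def ensure_dataset(cache_path="artifacts/datasets/WELFake_Dataset.csv"):
    path = Path(cache_path)
    if path.exists():
        return path  # already cached from a previous training run
    path.parent.mkdir(parents=True, exist_ok=True)
    # Deployment environments can override the source URL.
    url = os.environ.get("WELFAKE_DATASET_URL", DEFAULT_URL)
    urllib.request.urlretrieve(url, path)
    return path
```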
- Quick profile writes:
  - `artifacts/fake_news_model_quick.joblib`
  - `artifacts/tfidf_vectorizer_quick.joblib`
  - `artifacts/training_metrics_quick.json`
- Full profile writes:
  - `artifacts/fake_news_model_full.joblib`
  - `artifacts/tfidf_vectorizer_full.joblib`
  - `artifacts/training_metrics_full.json`
The active profile is tracked in:

- `artifacts/active_profile.txt`
- `artifacts/training_metrics_active.json`
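Profile switching can be sketched as writing the profile name and mirroring that profile's metrics file to the "active" path. The function name is hypothetical; the file names match the artifact layout above.

```python
# Hypothetical sketch of active-profile switching.
import shutil
from pathlib import Path

def activate_profile(profile, artifacts_dir="artifacts"):
    root = Path(artifacts_dir)
    # Record which profile's artifacts inference should use.
    (root / "active_profile.txt").write_text(profile)
    # Mirror the profile's metrics so the dashboard reads one stable path.
    shutil.copyfile(
        root / f"training_metrics_{profile}.json",
        root / "training_metrics_active.json",
    )
```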
Inference always uses the active profile artifacts.
The WELFake label mapping used in training/inference is:
- `0` = REAL
- `1` = FAKE
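Inference against the active profile, including the label mapping, might look like this sketch. The function name is hypothetical, and loading via `joblib` follows the artifact file names above; the real logic lives in `src/models/predict.py`.

```python
# Hypothetical sketch: load the active profile's artifacts and classify text.
import joblib
from pathlib import Path

# WELFake label mapping used in training/inference.
LABELS = {0: "REAL", 1: "FAKE"}

def predict_text(text, artifacts_dir="artifacts"):
    root = Path(artifacts_dir)
    profile = (root / "active_profile.txt").read_text().strip()
    vectorizer = joblib.load(root / f"tfidf_vectorizer_{profile}.joblib")
    model = joblib.load(root / f"fake_news_model_{profile}.joblib")
    probs = model.predict_proba(vectorizer.transform([text]))[0]
    label = int(probs.argmax())
    return LABELS[label], float(probs[label])  # (verdict, confidence score)
```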
- This repo is configured to ignore the `artifacts/` folder by default, so trained model files are not uploaded unintentionally.
- `WELFake_Dataset.csv` does not need to be committed; it is downloaded at runtime from Google Drive during training.
- Ensure the Google Drive file is shared as "Anyone with the link (Viewer)" for Streamlit Cloud deployments.
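The ignore rule described above might look like this in `.gitignore` (a sketch; the repo's actual file may list additional entries):

```gitignore
# Keep trained models and the downloaded dataset out of version control
artifacts/
```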
MIT