Fusing Social Media Text and Imagery for Real-Time Humanitarian Intelligence
Gandhi Institute of Technology and Management, Hyderabad
Author: Aashwika Khurana | Mentor: Dr. V. Sireesha, Head of Department of AI and DS (paper in progress, to be published soon)
Every year, disasters kill tens of thousands of people — and the information needed to save more lives is sitting in the photos people post from rooftops during floods, in the tweets sent before power cuts out. MMDRS is an end-to-end platform that reads that information fast enough, accurately enough, and cheaply enough to actually help.
MMDRS processes raw social media posts through a dual-encoder fusion architecture (BERT + ResNet-50), simultaneously classifies events across four operationally relevant tasks, detects when text and images contradict each other, and translates predictions into Expected Operational Risk (EOR) scores — telling coordinators not just what the model thinks, but what it costs to be wrong.
Overall system performance: 82–85% accuracy, macro-F1 0.81–0.83 across all four tasks, 276ms end-to-end inference latency.
Current disaster AI systems:
- Process text or images — not both together, despite people almost always sending both
- Assume text and image describe the same event — but ~12.4% of posts use recycled imagery
- Treat all prediction errors equally — but missing a severe disaster is categorically different from a false alarm
- Train globally and deploy everywhere — producing models calibrated to no region specifically
MMDRS addresses all four gaps within a single deployable platform.
Raw Tweet (text + image)
│
▼
┌─────────────────────────────────────────┐
│ INGESTION & TRIAGE │
│ Tweepy filtered stream → TF-IDF LR │
│ triage classifier (recall ≥ 0.97) │
└──────────────┬──────────────────────────┘
│
┌───────┴────────┐
▼ ▼
┌─────────────┐ ┌─────────────────┐
│ BERT-base │ │ ResNet-50 │
│ text encoder│ │ image encoder │
│ → t ∈ R^768 │ │ → v ∈ R^2048 │
│ (fine-tuned │ │ (MEDIC │
│ on Disaster │ │ pre-trained) │
│ Tweets) │ │ │
└──────┬──────┘ └───────┬─────────┘
└────────┬─────────┘
▼
┌───────────────────────────────────────┐
│ FUSION LAYER │
│ Concat → m ∈ R^2816 │
│ → Linear(2816→512) + LayerNorm + GELU│
│ → z ∈ R^512 (shared representation) │
└───────────────┬───────────────────────┘
│
┌───────────┼───────────┬───────────┐
▼ ▼ ▼ ▼
[Disaster [Informative [Humanitarian [Damage
Type] ness] Impact] Severity]
5 classes binary 4 classes 3 levels
└───────────┴───────────┴───────────┘
│
▼
┌───────────────────────────────────────┐
│ UNCERTAINTY-AWARE DECISION SUPPORT │
│ • Temperature scaling (T=1.4) │
│ • Prediction entropy H(x) │
│ • Cross-modal dissonance score │
│ → Expected Operational Risk (EOR) │
└───────────────┬───────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ FLASK API + SUPABASE + DASHBOARD │
│ Real-time geospatial map (Leaflet.js)│
│ Subscriber alert pipeline (email) │
│ NGO-facing EOR action tiers │
└───────────────────────────────────────┘
🌐 https://mmdrs.lovable.app/ — Interactive demo with simulated real-time classification and EOR scoring; a preview of the intended production interface.
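The ingestion-and-triage stage in the diagram above gates everything downstream: the TF-IDF + logistic regression triage classifier is tuned for recall ≥ 0.97 on disaster-relevant posts so that almost nothing genuine is dropped. A minimal pure-Python sketch of selecting such a decision threshold — the real pipeline would use scikit-learn, and `threshold_for_recall` plus the toy scores below are illustrative, not the project's code:

```python
def threshold_for_recall(scores, labels, target_recall=0.97):
    """Return the largest score threshold whose recall on the positive
    class is still >= target_recall (None if no threshold qualifies)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return None
    best = None
    # Recall is non-increasing as the threshold rises, so sweep upward
    # and keep raising the threshold while the recall target still holds.
    for t in sorted(set(scores)):
        tp = sum(1 for s in positives if s >= t)
        if tp / len(positives) >= target_recall:
            best = t
        else:
            break
    return best

# Toy example: 5 disaster posts (label 1) and 5 irrelevant ones (label 0).
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   1,    1,   1,    0,   0,   0,   0,   0]
t = threshold_for_recall(scores, labels, target_recall=1.0)
```

Tuning the threshold for recall rather than accuracy is the point: at triage time, a false positive only costs a little extra compute downstream, while a false negative silently discards a disaster report.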
| Task | Classes | Operational Purpose |
|---|---|---|
| Disaster Type | Earthquake, Hurricane, Flood, Wildfire, Landslide | Route to domain-specific responders |
| Informativeness | Informative / Not Informative | Filter noise from signal |
| Humanitarian Impact | Affected individuals, Infrastructure damage, Rescue operations, Not humanitarian | Match responder type to need |
| Damage Severity | Severe / Mild / Little or None | Prioritize dispatch |
Task loss weights: Type=1.0, Informativeness=0.8, Humanitarian=1.3, Severity=1.2 — upweighting operationally critical tasks.
- Dual-encoder multimodal fusion — BERT + ResNet-50 through a learned 512-dimensional projection, with ablation confirming +4 macro-F1 over raw concatenation
- Four-head multi-task learning — joint training provides cross-task positive transfer; rescue operations class improves from F1=0.63 to 0.71 with no additional data
- Semantic dissonance detection — cosine similarity in shared 256-dim alignment space; flags ~12.4% of posts with recycled/mismatched imagery (precision=0.78, recall=0.74)
- Temperature-scaled calibration — T=1.4 reduces ECE from 0.089 → 0.042; essential before probability-based risk computation
- Expected Operational Risk (EOR) — replaces raw confidence with cost-weighted risk scores grounded in NGO-defined error cost matrix
- Regional contextualization — Telangana-first deployment with 143 region-specific keywords, mandal-level geospatial precision, localized EOR cost matrix
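The dissonance check above reduces to a cosine-similarity test between the text and image embeddings after both are projected into the shared 256-dim alignment space. A minimal pure-Python sketch, assuming toy 4-dim vectors stand in for the aligned embeddings and the 0.3 threshold is illustrative rather than the paper's tuned value:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def dissonance_flag(text_emb, image_emb, threshold=0.3):
    """Flag a post as cross-modally dissonant when the aligned text and
    image embeddings point in sufficiently different directions."""
    sim = cosine(text_emb, image_emb)
    return sim < threshold, sim

# Toy vectors standing in for the 256-dim aligned embeddings.
aligned_text = [0.9, 0.1, 0.0, 0.2]
matching_img = [0.8, 0.2, 0.1, 0.1]   # same event: high similarity
recycled_img = [0.0, 0.1, 0.9, -0.3]  # recycled imagery: low similarity
```

Posts flagged this way (the ~12.4% with recycled or mismatched imagery) are routed to human review rather than classified as-is.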
| Dataset | Modality | Size | Role in MMDRS |
|---|---|---|---|
| CrisisMMD v2.0 | Text + Image | 16,058 tweets / 18,082 images | Primary multimodal training & evaluation |
| MEDIC | Image | 71,198 images | ResNet-50 disaster-domain pre-training |
| Disaster Tweets Corpus | Text | ~77,000 tweets | BERT domain adaptation |
| Configuration | Accuracy | Macro-F1 | ECE |
|---|---|---|---|
| BERT Only | 74.3% | 0.72 | — |
| ResNet-50 Only | 71.8% | 0.69 | — |
| Late Fusion (baseline) | 79.1% | 0.77 | — |
| MMDRS Full (Ours) | 84.2% | 0.83 | 0.042 |
Full ablation in paper.
| Stage | Mean (ms) | P95 (ms) |
|---|---|---|
| Ingestion + triage | 18 | 32 |
| Text preprocessing + BERT | 22 | 38 |
| Image download + preprocessing | 85 | 210 |
| BERT forward pass | 48 | 60 |
| ResNet-50 forward pass | 31 | 42 |
| Fusion + task heads | 8 | 12 |
| Temperature scaling + EOR | 2 | 4 |
| Database write | 62 | 105 |
| End-to-end total | 276 | 503 |
MMDRS/
├── src/
│ ├── ingestion/ # Tweepy stream, triage classifier
│ ├── encoders/ # BERT text encoder, ResNet-50 image encoder
│ ├── fusion/ # Concatenation + projection layer
│ ├── heads/ # Four classification heads
│ ├── dissonance/ # Cross-modal semantic dissonance detection
│ ├── risk/ # EOR computation, temperature scaling
│ ├── api/ # Flask REST API
│ └── dashboard/ # Frontend (HTML/CSS/JS + Leaflet.js)
├── scripts/
│ ├── train.py # Full joint multi-task training
│ ├── evaluate.py # Evaluation on held-out test sets
│ ├── calibrate.py # Temperature scaling post-hoc calibration
│ └── preprocess.py # Dataset preprocessing pipeline
├── configs/
│ └── config.yaml # All hyperparameters (matches Table 3 in paper)
├── data/
│ ├── raw/ # Original datasets (not committed — see Data section)
│ └── processed/ # Preprocessed tensors and splits
├── models/
│ ├── checkpoints/ # Training checkpoints
│ └── weights/ # Final trained weights (not committed — see below)
├── outputs/
│ ├── shap/ # SHAP text attribution visualizations
│ ├── gradcam/ # GradCAM image heatmaps (planned)
│ └── logs/ # Training logs and metrics
├── notebooks/
│ ├── 01_data_exploration.ipynb
│ ├── 02_training_walkthrough.ipynb
│ └── 03_shap_visualization.ipynb
├── tests/ # Unit tests
├── docs/ # Paper and architecture diagrams
└── requirements.txt
```bash
git clone https://github.com/YOUR_USERNAME/MMDRS.git
cd MMDRS
pip install -r requirements.txt
```

CrisisMMD v2.0, MEDIC, and the Disaster Tweets corpus are publicly available research datasets:
- CrisisMMD: https://crisisnlp.qcri.org/crisismmd
- MEDIC: https://crisisnlp.qcri.org/medic
- Disaster Tweets: https://github.com/firojalam/crisis_datasets_benchmarks
Download and place them under `data/raw/`. Run `python scripts/preprocess.py` to generate the processed splits.
```bash
# Step 1: BERT domain adaptation
python scripts/train.py --stage bert_pretrain --config configs/config.yaml

# Step 2: ResNet-50 MEDIC pre-training
python scripts/train.py --stage resnet_pretrain --config configs/config.yaml

# Step 3: Full joint multi-task training
python scripts/train.py --stage joint --config configs/config.yaml

# Step 4: Temperature calibration (post-hoc)
python scripts/calibrate.py --config configs/config.yaml
```

Evaluate on the held-out test split:

```bash
python scripts/evaluate.py --config configs/config.yaml --split test
```

MMDRS deploys first at Telangana state scale — a deliberate regional-first strategy. Regional adaptations include:
- Geographic bounding box: 17.0–19.9°N, 77.3–81.3°E
- 143 Telangana-specific disaster keywords including Telugu vocabulary and district names
- Mandal-level geospatial precision (sub-district granularity)
- Localized EOR cost matrix: flooding false negatives weighted highest due to rapid inundation timelines
Evaluated across three tabletop disaster exercises with two Hyderabad-based NGO partners.
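The bounding box above doubles as a cheap pre-filter on geotagged posts before any heavier per-post processing. A minimal sketch, assuming the function name is illustrative (the deployed pipeline also matches the 143 regional keywords for posts without coordinates):

```python
# Telangana deployment bounding box from the section above.
LAT_MIN, LAT_MAX = 17.0, 19.9
LON_MIN, LON_MAX = 77.3, 81.3

def in_deployment_region(lat, lon):
    """Cheap pre-filter: keep only geotagged posts inside the
    Telangana bounding box before running the full model."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

# Hyderabad (~17.39N, 78.49E) is inside; Mumbai (~19.08N, 72.88E) is not.
```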
MMDRS replaces raw softmax confidence with EOR scores that reflect asymmetric real-world error costs:
EOR(x, a) = Σ_c P_calibrated(c|x) · Cost(c, a)
Dashboard action tiers:
- 🔴 High EOR — "Immediate human review required"
- 🟡 Medium EOR — "Monitor and verify"
- 🟢 Low EOR — "Log for record"
This was specifically requested by NGO partners who found raw confidence percentages gave no guidance on what to do.
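Putting the pieces together, the EOR formula above combines temperature-scaled probabilities with the NGO-defined cost matrix. A minimal pure-Python sketch — the class names, logits, and cost values below are illustrative, not the deployed matrix:

```python
import math

T = 1.4  # temperature from post-hoc calibration

def calibrated_probs(logits, temperature=T):
    """Temperature-scaled softmax: softens overconfident logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def expected_operational_risk(probs, costs):
    """EOR(x, a) = sum_c P_calibrated(c|x) * Cost(c, a), per action a."""
    return {a: sum(p * c for p, c in zip(probs, col))
            for a, col in costs.items()}

# Hypothetical 3-class severity example (Severe / Mild / Little-or-None)
# with an asymmetric cost matrix: rows follow the class order above.
costs = {
    "dispatch": [0.0, 2.0, 5.0],   # acting on a non-event wastes resources
    "ignore":   [50.0, 5.0, 0.0],  # missing a severe event is costliest
}
probs = calibrated_probs([2.0, 0.5, -1.0])
eor = expected_operational_risk(probs, costs)
best_action = min(eor, key=eor.get)
```

Because the cost of ignoring a possible severe event dwarfs the cost of a wasted dispatch, the minimum-risk action can differ from the argmax class — which is exactly the guidance the raw softmax never gave.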
| Component | Status |
|---|---|
| Architecture design & paper | ✅ Complete |
| Dataset preprocessing scripts | 🔄 In progress |
| BERT + ResNet-50 encoders | 🔄 In progress |
| Fusion layer | 🔄 In progress |
| Multi-task heads | 🔄 In progress |
| Dissonance detection | 🔄 In progress |
| EOR computation | 🔄 In progress |
| Flask API | 🔄 In progress |
| SHAP visualizations | 🔄 In progress |
| Dashboard | 🔄 In progress |
| Telangana pilot evaluation | ✅ Complete (tabletop exercises) |
This is an active research project. Code is being cleaned and documented for public release.
```bibtex
@article{khurana2025mmdrs,
  title={MultiModal Disaster Response System (MMDRS): Fusing Social Media Text and
         Imagery for Real-Time Humanitarian Intelligence with Regional
         Contextualization, Cost-Sensitive Risk Modeling, and SDG Alignment},
  author={Khurana, Aashwika},
  institution={Gandhi Institute of Technology and Management, Hyderabad},
  year={2025},
  note={Manuscript under preparation}
}
```

Acknowledgments: CrisisNLP/QCRI research group for dataset provision; NGO evaluation partners in Hyderabad; the open-source communities behind PyTorch, HuggingFace Transformers, Supabase, and Leaflet.js.