Real-time global phishing detection powered by distributed databases, ML risk scoring, and cross-region replication.
Phishing attacks evolve faster than traditional security systems can respond. Centralized databases create single points of failure and geographic blind spots. We built PhishNChips to demonstrate how distributed databases can provide global, real-time threat intelligence.
Unlike traditional phishing detectors that rely on centralized storage, PhishNChips offers:
- No single point of failure
- Automatic cross-region replication
- Distributed query execution
- Horizontally scalable architecture
- Resilient to node failure
PhishNChips demonstrates real distributed systems resilience, not just ML-based classification.
PhishNChips creates a geo-distributed network where:
- Browser extensions detect suspicious URLs using ML and heuristics
- A distributed Elasticsearch cluster (3 nodes simulating US/EU/Asia regions locally via Docker) stores and replicates threat data
- Real-time dashboards visualize global phishing trends
- Automated testing proves fault tolerance and scalability
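The heuristic half of the URL check can be illustrated with a minimal sketch. It covers heuristics only, not the neural network, and the signals, keyword list, and point weights below are assumptions chosen for illustration rather than the project's actual rules.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list; the real extension may use different signals.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")


def heuristic_risk(url: str) -> float:
    """Return a score in [0, 1]; higher means more suspicious."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    points = 0  # integer points out of 10 avoid float rounding surprises

    try:
        # A raw IP address in place of a domain name is a common signal.
        ipaddress.ip_address(host)
        points += 4
    except ValueError:
        # For named hosts, deeply nested subdomains often imitate a brand.
        if host.count(".") >= 3:
            points += 2

    # Plain HTTP means any submitted credentials travel unencrypted.
    if parsed.scheme != "https":
        points += 2

    # An "@" can push the real destination behind a trusted-looking prefix.
    if "@" in url:
        points += 2

    # Credential-themed words in the host or path.
    text = (host + parsed.path).lower()
    if any(word in text for word in SUSPICIOUS_WORDS):
        points += 2

    return min(points, 10) / 10
```

With these weights, `heuristic_risk("http://192.168.0.1/login")` scores 0.8 (IP host, plain HTTP, credential keyword), while `heuristic_risk("https://example.com/")` scores 0.0.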
20-second demo showing the distributed phishing detection system in action
Tech Stack:
- Backend: FastAPI (Python) with ML-based risk scoring
- Database: Elasticsearch 8.11 (3-node distributed cluster)
- Frontend: Browser extension + Kibana dashboards
- Infrastructure: Docker Compose for local simulation
Key Features:
- Distributed 3-Node Cluster - Simulates global deployment locally
- ML Risk Scoring - Neural network + heuristic analysis
- Real-time Visualization - Kibana dashboards for threat monitoring
- Automated Testing - Fault tolerance and scalability validation
- Cross-region Replication - Global threat dissemination
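A per-region threat view like the one the dashboards show could be fed by a terms aggregation. The sketch below builds such a search body and flattens the response; the field names `risk_score` and `region` are hypothetical, and `region` is assumed to be mapped as a keyword field.

```python
from typing import Any


def region_trend_query(min_risk: float = 0.5) -> dict[str, Any]:
    """Build a search body that counts high-risk reports per region."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "query": {"range": {"risk_score": {"gte": min_risk}}},
        "aggs": {"by_region": {"terms": {"field": "region"}}},
    }


def region_counts(response: dict[str, Any]) -> dict[str, int]:
    """Flatten the terms-aggregation buckets into {region: count}."""
    buckets = response["aggregations"]["by_region"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The body would be sent to the index's `_search` endpoint; Elasticsearch runs the aggregation on each shard and merges the partial results on the coordinating node.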
# Clone and setup
git clone https://github.com/deepti-96/PhishNChips-Distributed-Phishing-Intelligence-Network.git
cd PhishNChips-Distributed-Phishing-Intelligence-Network
# Start the distributed cluster
make quick-start
# Access points
# Kibana Dashboard: http://localhost:5601
# API Docs: http://localhost:8000/docs
# Elasticsearch: http://localhost:9200

Browser Extensions (US, EU, ASIA)
↓ HTTP POST
FastAPI Service
↓ Index
Elasticsearch Cluster (3-node distributed cluster)
↓ Replication
Kibana Dashboard
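
The "HTTP POST" to "Index" hop in the flow above could batch incoming reports through Elasticsearch's `_bulk` endpoint. This is a sketch under assumptions: the source does not say whether the service indexes documents one at a time or in bulk, and the index name and report fields are hypothetical.

```python
import json
from typing import Iterable

INDEX_NAME = "phishing-reports"  # hypothetical index name


def bulk_body(reports: Iterable[dict], index: str = INDEX_NAME) -> str:
    """Serialize reports as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for report in reports:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(report))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

The resulting string would be POSTed to `http://localhost:9200/_bulk` with the `Content-Type: application/x-ndjson` header.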
Distributed Concepts Demonstrated:
- Sharding – Data partitioned across nodes
- Replication (RF=2) – Survives single-node failure
- Distributed Queries – Cross-node aggregations
- Automatic Rebalancing – Replica shard reassignment after node failure
- Horizontal Scalability – Dynamic node addition
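To make the sharding and replication points concrete: in Elasticsearch terms, RF=2 corresponds to `number_of_replicas: 1`, meaning one primary plus one replica copy of each shard. The sketch below pairs such index settings with a toy round-robin placement that shows why losing a single node leaves every shard with a surviving copy. The shard count, the node names, and the placement function are assumptions for illustration; Elasticsearch's real allocator is more sophisticated, though it likewise never places a replica on the same node as its primary.

```python
# Index settings for two copies of each shard (one primary, one replica).
# The shard count of 3 is an assumption for illustration.
INDEX_SETTINGS = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

NODES = ["es-us", "es-eu", "es-asia"]  # hypothetical node names


def allocate(num_shards: int, copies: int, nodes: list[str]) -> dict[int, list[str]]:
    """Place each shard's copies on distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("need at least one node per copy")
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(copies)]
        for shard in range(num_shards)
    }


def survives_loss_of(node: str, layout: dict[int, list[str]]) -> bool:
    """True if every shard still has a copy on some other node."""
    return all(any(n != node for n in holders) for holders in layout.values())
```

With `allocate(3, 2, NODES)`, `survives_loss_of` returns `True` for each of the three nodes. With a single copy per shard (`allocate(3, 1, NODES)`), losing any node that holds a shard loses data.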
Results:
- Fault Tolerance: Node termination triggers shard reallocation; cluster recovers in <60s without data loss
- Scalability: Validated with 100K–1M indexed records
- Performance: <200ms ingestion latency under test load
- ML Accuracy: 90%+, evaluated on a labeled phishing dataset
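A recovery-time check like the fault-tolerance result above can be sketched as a polling loop over the cluster health API. This is a minimal sketch, not the project's test harness: `fetch_health` is a hypothetical callable expected to return the parsed JSON of `GET /_cluster/health`, and treating "green" as the recovery criterion is an assumption (the source does not say whether "yellow" counts as recovered).

```python
import time
from typing import Callable


def wait_for_green(
    fetch_health: Callable[[], dict],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll cluster health until status is green; return seconds elapsed.

    Raises TimeoutError if the cluster is not green within timeout_s.
    """
    start = clock()
    while True:
        health = fetch_health()
        elapsed = clock() - start
        if health.get("status") == "green":
            return elapsed
        if elapsed >= timeout_s:
            raise TimeoutError(
                f"cluster still {health.get('status')} after {elapsed:.0f}s"
            )
        sleep(poll_s)
```

The clock and sleep functions are injectable so the loop can be unit-tested without a live cluster. Against the local setup, `fetch_health` could wrap a request to `http://localhost:9200/_cluster/health`.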
Challenges:
- Simulating geo-distribution on a single machine
- Balancing ML model accuracy vs. real-time performance
- Implementing proper cross-region data consistency
- Debugging distributed system failures
Accomplishments:
- Working distributed phishing detection network
- Real-time threat visualization across regions
- Automated fault tolerance testing
- ML-powered risk scoring with 90%+ accuracy
- Production-ready API with comprehensive docs
What We Learned:
- Distributed systems design patterns
- Elasticsearch cluster management
- ML model deployment in production
- Importance of automated testing for complex systems
- Balancing consistency vs. availability in distributed databases
What's Next:
- Deploy to the cloud (AWS/Azure multi-region)
- Add more ML models for advanced threat detection
- Integrate with existing security tools
- Build a real-time alerting system
- Build a companion mobile app
