A decentralized peer-to-peer chat system implemented in Python using TCP/UDP protocols, featuring automatic network discovery, coordinator election via Bully algorithm, and fault tolerance through heartbeat monitoring.
- Decentralized architecture with no central server
- Automatic peer discovery using UDP multicast
- Leader election using Bully algorithm
- Failure detection and automatic recovery
- Message history synchronization for new nodes
- Thread-safe concurrent operations
- Dynamic UDP port allocation managed by coordinator
- Python 3.7+
- Standard library only
- Local network supporting UDP multicast
```bash
cd distributed_chat/chat

# Terminal 1 - First node becomes coordinator
python main.py Lucas 127.0.0.1 5001

# Terminal 2 - Additional nodes join network
python main.py Julia 127.0.0.1 5002
python main.py Maria 127.0.0.1 5003
```

| Command | Description |
|---|---|
| `<text>` | Send message to all nodes |
| `/peers` | List all connected nodes (including yourself) |
| `/history` | View message history |
| `/status` | Show node information |
| `/quit` | Exit gracefully |
Note: Any command starting with `/` that doesn't match the above will show a warning message.
TCP (Transmission Control Protocol)
- Reliable point-to-point communication
- Chat messages between peers
- Control messages (election, coordinator announcements)
- Node join/leave notifications
UDP Multicast
- Network discovery
- Initial join requests
- Heartbeat monitoring
- Quick broadcast to all peers
```
Multicast Group:  224.0.0.1
Discovery Port:   5007 (fixed)
UDP Port Range:   5100-5200 (dynamic)
TCP Ports:        User-defined (unique per node)
```

The coordinator maintains a centralized UDP port pool:
- Available ports: 5100-5200 (101 ports total)
- Allocation: Coordinator assigns from available pool when node joins
- Deallocation: Port returned to pool when node leaves or fails
- Prevents port conflicts and supports up to 101 concurrent nodes
| Type | Description | Transport |
|---|---|---|
| `JOIN_REQUEST` | Node requests network entry | UDP Multicast |
| `JOIN_RESPONSE` | Coordinator assigns ID and port | TCP |
| `NEW_NODE` | Announce new node | TCP |
| `HEARTBEAT` | Health check | UDP Multicast |
| `ELECTION` | Initiate coordinator election | TCP |
| `ELECTION_OK` | Election response | TCP |
| `COORDINATOR` | New coordinator announcement | TCP |
| `NODE_LEFT` | Node departure notification | TCP |

| Type | Description | Transport |
|---|---|---|
| `CHAT_MESSAGE` | User message | TCP |
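For illustration only, a serialized `CHAT_MESSAGE` might look like the sketch below, assuming JSON encoding; the field names are assumptions, not the actual `node.py` schema.

```python
import json
import time

# Hypothetical CHAT_MESSAGE payload; field names are illustrative assumptions.
chat_message = {
    "type": "CHAT_MESSAGE",
    "sender_id": 2,
    "sender_name": "Julia",
    "text": "Hi Lucas!",
    "timestamp": time.time(),
}
wire_bytes = json.dumps(chat_message).encode("utf-8")  # delivered to each peer over TCP
```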
Node Startup:
- Node binds TCP socket and joins UDP multicast group
- Sends `JOIN_REQUEST` via the multicast discovery port (5007)
- Waits 3 seconds for a coordinator response
- If no response: becomes coordinator (assigns itself ID=1)
- If a response is received: gets its assigned ID, UDP port, peer list, and message history
- Reconfigures the UDP socket if the port changed
- Begins sending heartbeats
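A minimal sketch of the discovery step, assuming JSON-encoded messages; `open_discovery_socket` and `send_join_request` are hypothetical helpers, not the actual `node.py` functions.

```python
import json
import socket
import struct

MULTICAST_GROUP = "224.0.0.1"
MULTICAST_DISCOVERY_PORT = 5007

def open_discovery_socket():
    """Join the discovery multicast group so JOIN_REQUESTs can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MULTICAST_DISCOVERY_PORT))
    mreq = struct.pack("4sl", socket.inet_aton(MULTICAST_GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def send_join_request(name, ip, tcp_port):
    """Announce this node; the coordinator answers with JOIN_RESPONSE over TCP."""
    msg = {"type": "JOIN_REQUEST", "name": name, "ip": ip, "tcp_port": tcp_port}
    out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    out.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # keep traffic on the LAN
    out.sendto(json.dumps(msg).encode("utf-8"), (MULTICAST_GROUP, MULTICAST_DISCOVERY_PORT))
    out.close()
```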
Coordinator Processing:
- Receives `JOIN_REQUEST`
- Acquires the lock on the port pool
- Allocates an available UDP port from the pool
- Assigns a unique ID (auto-increment)
- Adds the new node to the peer list
- Sends `JOIN_RESPONSE` to the new node
- Broadcasts a `NEW_NODE` announcement to all peers
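A hedged sketch of this flow; the attribute names follow the conventions used elsewhere in this README, while `next_id` and the message fields are assumptions.

```python
# Illustrative coordinator-side join handling; not copied from node.py.
def handle_join_request(self, req):
    with self.lock:                                   # protect the port pool and peer list
        udp_port = self.available_udp_ports.pop()     # allocate a free UDP port
        self.used_multicast_ports.add(udp_port)
        self.next_id += 1
        node_id = self.next_id
        self.peers[node_id] = (req["ip"], req["tcp_port"], udp_port, req["name"])
        response = {
            "type": "JOIN_RESPONSE",
            "id": node_id,
            "udp_port": udp_port,
            "peers": self.peers,
            "history": self.message_history,
        }

    self._send_to(response, req["ip"], req["tcp_port"])                  # direct TCP reply
    self._broadcast({"type": "NEW_NODE", "id": node_id, "name": req["name"]})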
Trigger Conditions:
- Coordinator failure (no heartbeat for 10 seconds)
- Network partition recovery
Election Process:
- Node detects coordinator failure
- Sends an `ELECTION` message to all nodes with a higher ID
- Waits 3 seconds for `ELECTION_OK` responses
- If an `ELECTION_OK` is received: stands down
- If no `ELECTION_OK` arrives: becomes coordinator
- The new coordinator broadcasts a `COORDINATOR` announcement
Example:
Network: Node1(ID=1), Node2(ID=2), Node3(ID=3), Node4(ID=4)
Current Coordinator: Node4
Node4 crashes:
- Node3 sends ELECTION to Node4 (no response)
- Node3 has no higher-ID nodes, becomes coordinator
- Node2 sends ELECTION to Node3 and Node4
- Receives ELECTION_OK from Node3, stands down
- Node1 sends ELECTION to all higher nodes
- Receives ELECTION_OK, stands down
Result: Node3 becomes coordinator
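A minimal sketch of the election logic, assuming the `_send_to`/`_broadcast` helpers described later and a `got_election_ok` flag set by the message handler (both names are assumptions).

```python
import time

ELECTION_WAIT_TIME = 3  # seconds, as in start_election()

def start_election(self):
    # Bully election sketch: only nodes with a higher ID are challenged.
    with self.lock:
        if self.election_in_progress:
            return
        self.election_in_progress = True
        self.got_election_ok = False
        higher = {pid: info for pid, info in self.peers.items() if pid > self.node_id}

    for pid, (ip, tcp_port, _udp_port, _name) in higher.items():
        self._send_to({"type": "ELECTION", "from": self.node_id}, ip, tcp_port)

    time.sleep(ELECTION_WAIT_TIME)              # wait for ELECTION_OK replies

    if not self.got_election_ok:
        self.become_coordinator()               # no higher ID answered: take over
        self._broadcast({"type": "COORDINATOR", "id": self.node_id})

    with self.lock:
        self.election_in_progress = False
```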
Heartbeat System:
- Send interval: 2 seconds
- Timeout: 10 seconds (5 missed heartbeats)
- Content: Node ID, coordinator status, timestamp
Failure Handling:
- Monitor detects missing heartbeats (>10 seconds)
- Remove failed node from peer list
- Return UDP port to available pool
- If coordinator failed: initiate election
- Update UI with disconnection notice
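A hedged sketch of the monitor loop; attribute and method names such as `last_heartbeat`, `remove_peer`, and `start_election` are assumptions based on this README.

```python
import time

HEARTBEAT_TIMEOUT = 10  # seconds without a heartbeat before a peer is declared dead

def _heartbeat_monitor(self):
    # Runs in its own daemon thread, checking every 5 seconds.
    while self.running:
        time.sleep(5)
        now = time.time()
        with self.lock:
            stale = [pid for pid, ts in self.last_heartbeat.items()
                     if now - ts > HEARTBEAT_TIMEOUT]
        for pid in stale:
            was_coordinator = (pid == self.coordinator_id)
            self.remove_peer(pid)          # also returns its UDP port to the pool
            if was_coordinator:
                self.start_election()
```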
Sending:
- Create a `CHAT_MESSAGE` with sender info, text, and timestamp
- Add it to the local message history
- Broadcast it to all peers via TCP (iterating through the peers list)
- Trigger the UI callback

Receiving:
- Receive a `CHAT_MESSAGE` from a peer
- Verify it is not from self (avoid duplication)
- Add it to the local history with deduplication
- Trigger the UI callback
- Sort the history chronologically

History Synchronization:
- New nodes receive the full history in `JOIN_RESPONSE`
- Each node maintains an independent copy (and each node's peer list includes itself)
- Deduplication by `sender_name + timestamp`
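For example, deduplication keyed on sender and timestamp might look like the following sketch; the message fields are assumptions.

```python
def add_to_history(self, msg):
    # Insert a message only if the (sender_name, timestamp) pair is new,
    # then keep the history sorted chronologically.
    with self.lock:
        key = (msg["sender_name"], msg["timestamp"])
        existing = {(m["sender_name"], m["timestamp"]) for m in self.message_history}
        if key not in existing:
            self.message_history.append(msg)
            self.message_history.sort(key=lambda m: m["timestamp"])
```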
```
distributed_chat/
├── chat/
│   ├── main.py      # CLI interface
│   ├── node.py      # P2P implementation
│   └── colors.py    # Terminal colors
└── README.md
```
main.py - User Interface Layer
- Command-line argument parsing
- User input handling and command routing
- Message formatting with timestamps
- Callback registration for UI updates
node.py - Network Layer
- TCP/UDP socket management
- Peer management and tracking
- Coordinator election logic
- Message routing and history
- Failure detection via heartbeats
- Thread-safe operations
- Dynamic UDP port allocation
colors.py - Presentation Layer
- ANSI escape codes for terminal output
Edit constants in `node.py`:

```python
MULTICAST_GROUP = "224.0.0.1"
MULTICAST_DISCOVERY_PORT = 5007
MULTICAST_PORT_RANGE_START = 5100
MULTICAST_PORT_RANGE_END = 5200
```

Timing parameters (hardcoded in methods):

```python
JOIN_WAIT_TIME = 3       # join_network()
HEARTBEAT_INTERVAL = 2   # _heartbeat_sender()
HEARTBEAT_TIMEOUT = 10   # _heartbeat_monitor()
ELECTION_WAIT_TIME = 3   # start_election()
```

- Single Node: Becomes coordinator with ID=1, peers list contains itself
- Multiple Joins: Each node gets a unique ID, all visible in `/peers` (including yourself)
- Message History: New nodes receive the complete history
- Invalid Commands: Commands starting with `/` that don't exist show a warning
- Coordinator Failure: Election starts, highest remaining ID wins
- Multiple Failures: System detects and recovers
- Network Partition: Groups elect coordinators, merge on reconnection
- Rapid Joins/Leaves: System remains stable, no conflicts
- Port Exhaustion: 102nd node fails gracefully
- Concurrent Elections: Only one coordinator emerges
Each node runs 5 concurrent daemon threads:
| Thread | Purpose | Interval |
|---|---|---|
| TCP-Listener | Accept TCP connections | Blocking |
| UDP-Listener | Receive UDP multicast | Blocking |
| Discovery-Listener | Receive JOIN_REQUEST | Blocking |
| Heartbeat-Sender | Send health checks | 2 seconds |
| Heartbeat-Monitor | Detect failures | 5 seconds |
Protected resources (via `self.lock`):

- `self.peers` - Peer list
- `self.used_multicast_ports` - Allocated ports
- `self.available_udp_ports` - Port pool
- `self.last_heartbeat` - Heartbeat timestamps
- `self.message_history` - Chat history
- `self.election_in_progress` - Election state
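For instance, removing a failed peer touches several of these structures at once, so it happens under the lock. A sketch follows; `remove_peer` is a hypothetical name.

```python
def remove_peer(self, node_id):
    # Drop a peer and return its UDP port to the pool, all under one lock.
    with self.lock:
        ip, tcp_port, udp_port, name = self.peers.pop(node_id)
        self.last_heartbeat.pop(node_id, None)
        self.used_multicast_ports.discard(udp_port)
        self.available_udp_ports.add(udp_port)
    return name
```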
- UDP multicast limited to local network (no internet routing)
- Each node requires unique TCP port
- Maximum 101 concurrent nodes (UDP port range)
- UDP packet size limit (~4KB)
- No clock synchronization between nodes
- No message encryption or authentication
- Designed for local testing only
| Concept | Implementation |
|---|---|
| Decentralization | No single point of failure; transferable coordinator role |
| Service Discovery | UDP multicast for peer detection |
| Leader Election | Bully algorithm for coordinator selection |
| Failure Detection | Heartbeat-based monitoring |
| State Replication | Synchronized message history |
| Resource Allocation | Centralized UDP port pool management |
| Concurrency | Thread-safe operations with locks |
- TCP vs UDP tradeoffs (reliability vs speed)
- Multicast for one-to-many communication
- Socket programming (bind, listen, connect)
- Dynamic port allocation and management
- Multithreading for I/O operations
- Lock-based synchronization
- Daemon threads for background tasks
- Asynchronous event callbacks
Port already in use
- Choose different TCP port
- Check with: `lsof -i :<port>` (Linux/Mac) or `netstat -ano | findstr :<port>` (Windows)
No coordinator found
- Verify firewall allows UDP multicast (224.0.0.1)
- Ensure all nodes on same subnet
- Wait full 3 seconds after starting first node
Messages not appearing
- Verify nodes are visible in `/peers`
- Check that TCP and UDP ports are not blocked
- Review console for error messages
Election failures
- Check network connectivity between nodes
- Verify heartbeats are being sent
- Ensure no firewall blocking TCP connections
Each node maintains a `self.peers` dictionary that:
- Coordinator: Contains all nodes including itself (for consistent counting)
- Non-Coordinator: Receives the full peer list from the coordinator, including itself
- Format: `{node_id: (ip, tcp_port, udp_port, name)}`

This ensures:
- Consistent node count across all nodes (`len(node.peers)`)
- Easy iteration for broadcasting messages
- Simplified peer list display in the `/peers` command
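An illustrative peers dictionary with three nodes; the values are made up for the example.

```python
peers = {
    1: ("127.0.0.1", 5001, 5150, "Lucas"),
    2: ("127.0.0.1", 5002, 5101, "Julia"),
    3: ("127.0.0.1", 5003, 5102, "Maria"),
}

print(len(peers))  # consistent node count on every node
for node_id, (ip, tcp_port, udp_port, name) in peers.items():
    print(f"#{node_id} {name} @ {ip}:{tcp_port} (UDP {udp_port})")
```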
The system uses only two send methods:
- `_send_to(message, ip, port)` - Send a TCP message to a specific address
- `_broadcast(message)` - Iterate through all peers and call `_send_to` for each

The unnecessary wrapper method `_send_to_peer` was removed for cleaner code.
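A hedged sketch of the two helpers, assuming JSON-encoded messages sent over short-lived TCP connections (the framing is an assumption, not the documented wire format).

```python
import json
import socket

def _send_to(self, message, ip, port):
    """Open a short-lived TCP connection and send one JSON-encoded message."""
    try:
        with socket.create_connection((ip, port), timeout=2) as sock:
            sock.sendall(json.dumps(message).encode("utf-8"))
    except OSError:
        pass  # unreachable peers are handled by the heartbeat monitor

def _broadcast(self, message):
    """Send a message to every known peer except ourselves."""
    with self.lock:
        targets = [(ip, tcp_port)
                   for pid, (ip, tcp_port, _udp, _name) in self.peers.items()
                   if pid != self.node_id]
    for ip, tcp_port in targets:
        self._send_to(message, ip, tcp_port)
```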
The coordinator maintains a centralized port pool:
```python
self.available_udp_ports = set(range(5100, 5201))  # 101 ports
```

When a node joins:
- If the requested port is available → assign it
- If not → pop a port from the available pool
- Add it to `used_multicast_ports`
- Remove it from `available_udp_ports`

When a node leaves or fails:
- Return the port to `available_udp_ports`
- Remove it from `used_multicast_ports`
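A sketch of the corresponding pool bookkeeping under the coordinator's lock; the method names are assumptions.

```python
def allocate_udp_port(self, requested=None):
    # Honor a requested port when it is free; otherwise hand out any free port.
    with self.lock:
        if requested in self.available_udp_ports:
            port = requested
        else:
            port = self.available_udp_ports.pop()
        self.available_udp_ports.discard(port)
        self.used_multicast_ports.add(port)
        return port

def release_udp_port(self, port):
    # Return a port to the pool when its node leaves or fails.
    with self.lock:
        self.used_multicast_ports.discard(port)
        self.available_udp_ports.add(port)
```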
```
┌─── Peer List ────────────────────┐
│ Connected:
│ • Lucas
│ • Julia
│ • Maria
│
│ Total nodes: 3
└──────────────────────────────────┘
```

```
┌─── Node Status ──────────────────┐
│ Name: Lucas
│ ID: #1
│ Address: 127.0.0.1:5001
│ UDP Port: 5150
│ Coordinator: Lucas
│ Active peers: 2
└──────────────────────────────────┘
```

```
┌─── Message History ──────────────┐
│ [14:23:45] Lucas: Hello!
│ [14:23:50] Julia: Hi Lucas!
│ [14:24:01] Lucas: How are you?
└──────────────────────────────────┘
```
Note: Your own messages appear in green, others in purple.