Skip to content

oguzhangoktas/realtime-fx-rate-processor

Repository files navigation

FX Rates LED Display - Complete Solution

Real-time FX rate processing system with SQL batch and Kafka streaming solutions


📋 Problem Statement

Business Need: Display real-time FX rates on an LED screen with percentage change vs. yesterday's 5PM NY reference

Data Source: High-frequency tick data (millisecond updates)

  • 5 currency pairs initially, scaling to 300+ pairs
  • Sample format: event_id, event_time, ccy_couple, rate

Technical Challenge:

  • Part A: Hourly batch job (SQL) for 5 pairs
  • Part B: Minutely updates (SQL optimized) for 300 pairs
  • Part C: Real-time streaming solution with sub-second latency

Output Format:

"EUR/USD",1.08081,"-0.208%"

Key Constraints:

  • Only show "active" rates (last received + <30 seconds old)
  • Reference: Yesterday 5PM New York time
  • Must handle timezone conversions (EST/EDT)

📝 Assignment Requirements

Part A: Hourly SQL Batch (5 Pairs)

Write a SQL query that runs every hour to:

  • Calculate current FX rate for each of 5 currency pairs
  • Compare against yesterday's 5PM NY closing rate
  • Show percentage change
  • Only include active rates (last update <30 seconds ago)

Part B: Minutely SQL Batch (300 Pairs)

Optimize Part A to:

  • Run every 1 minute (vs hourly)
  • Handle 300 currency pairs (vs 5)
  • Execute in <5 seconds (scalability challenge)

Part C: Real-Time Streaming (Optional)

Build a streaming solution that:

  • Processes FX rates in real-time (no batch delays)
  • Updates LED display continuously
  • Handles high-frequency data (1000+ msg/sec)

📊 Sample Data Format

Input: High-frequency FX tick data

event_id,event_time,ccy_couple,rate
288230383844089590,1704146399036,EURUSD,1.080790000000000
288230383844089594,1704146399038,NZDUSD,0.616420000000000
288230383844089658,1704146400001,EURUSD,1.080780000000000

Output: LED display format

ccy_couple,rate,change
"EUR/USD",1.08081,"-0.208%"
"GBP/USD",1.26230,"+0.038%"

⚠️ Critical Business Rules

  1. Active Rate Definition:

    • Last received rate for a currency pair
    • Not older than 30 seconds
    • If no active rate → No output
  2. Reference Time:

    • Yesterday's 5PM New York Time (EST/EDT)
    • Handles daylight saving time transitions
    • Industry standard for FX settlement
  3. Timezone Handling:

    • Input: UTC epoch milliseconds
    • Calculation: America/New_York (EST/EDT aware)
    • Critical for accurate daily references

🎯 Project Overview

Three complete solutions for processing FX rates with different performance characteristics:

Solution Frequency Pairs Execution Time Technology
Part A Hourly 5 <10s SQL Batch
Part B Minutely 300 2-5s Optimized SQL (10-15x faster!)
Part C Real-time 300+ <100ms Kafka Streaming

📂 Project Structure

fx_rates_solution/
├── sql/                                    # SQL Solutions (Parts A & B)
│   ├── solution_part_a.sql                 # Hourly batch (5 pairs)
│   ├── solution_part_a_test.sql            # Test version (relaxed timing)
│   ├── solution_part_b.sql                 # Minutely optimized (300 pairs)
│   ├── solution_part_b_test.sql            # Test version (relaxed timing)
│   ├── create_tables_part_b.sql            # Part B schema setup
│   └── update_yesterday_rates.sql          # Part B reference rates (production)
│
├── streaming/                              # Real-Time Solution (Part C)
│   ├── fx_rates_streaming.py               # Main processor
│   ├── led_display_visual.py               # Visual LED display
│
├── testing/                                # Test Utilities
│   ├── generate_mock_data.py               # Mock data generator
│   ├── load_reference_rates.py             # Reference rates loader
│   └── DOCKER_TESTING_GUIDE.md             # Complete testing guide
│
├── docker-compose.yml                      # Kafka + PostgreSQL infrastructure
├── requirements.txt                        # Python dependencies
├── README.md                               # This file```
├── DESIGN_DECISIONS.md                     # Design decisions
├── rates_sample.csv                        # Sample data

---

## 🚀 Quick Start (5 Minutes)

### Prerequisites

```bash
# Required
- Docker & Docker Compose
- Python 3.8+

# Verify installations
docker --version
python --version

Install & Test

# 1. Install Python dependencies
pip install -r requirements.txt

# 2. Start services
docker-compose up -d && sleep 30

# 3. Generate test data (with YESTERDAY + CURRENT timestamps!)
cd testing
python generate_mock_data.py --mode full
cd ..

# 4. Test Part A (SQL Batch)
docker-compose exec postgres createdb -U postgres fxdb
docker-compose exec postgres psql -U postgres -d fxdb -c "CREATE TABLE fx_rates (event_id BIGINT PRIMARY KEY, event_time BIGINT, ccy_couple VARCHAR(6), rate DECIMAL(18,12));"
docker cp testing/mock_rates_current.csv postgres:/tmp/
docker-compose exec postgres psql -U postgres -d fxdb -c "\COPY fx_rates FROM '/tmp/mock_rates_current.csv' CSV HEADER;"
docker-compose exec -T postgres psql -U postgres -d fxdb < sql/solution_part_a.sql

# Expected: 5 currency pairs with percentage changes! ✅

For complete testing guide: See testing/DOCKER_TESTING_GUIDE.md


📊 Solution Details

Part A: Hourly Batch Processing

Purpose: Calculate FX rates vs yesterday 5PM NY reference (5 pairs)

Key Features:

  • ✅ SQL window functions (ROW_NUMBER)
  • ✅ Timezone-aware (EST/EDT with DST)
  • ✅ Active rate detection (<30 seconds)
  • ✅ Yesterday 5PM NY reference calculation

Files:

  • sql/solution_part_a.sql - Production version
  • sql/solution_part_a_test.sql - Test version (accepts 1-hour old data)

Output Format:

 ccy_couple |  rate   |   change   
------------+---------+------------
 "EUR/USD"  | 1.08077 | "+0.002%"
 "GBP/USD"  | 1.26230 | "-0.038%"

Part B: Optimized Minutely Processing

Purpose: High-frequency processing for 300 currency pairs

Key Optimizations:

  1. Table Partitioning - Daily partitions, automatic pruning
  2. Pre-computed Reference Rates - Daily refresh (not every query)
  3. 2-Minute Scan Window - Only recent data (vs full table)
  4. Composite Indexes - Fast lookup on (ccy_couple, event_time DESC)

Performance:

Part A: Full table scan → 30-60 seconds
Part B: Optimized scan → 2-5 seconds
Improvement: 10-15x faster! 🚀

Files:

  • sql/create_tables_part_b.sql - Schema setup (run once)
  • sql/update_yesterday_rates.sql - Reference rates (run daily 5PM)
  • sql/solution_part_b.sql - Main query (run every minute)
  • sql/solution_part_b_test.sql - Test version (accepts 1-hour old data)

Testing Helper:

  • testing/load_reference_rates.py - Loads JSON reference rates to database

Part C: Real-Time Streaming

Purpose: Sub-second FX rate processing with visual LED display

Architecture:

Test Producer → Kafka → Stream Processor → LED Display
    (Python)    (Queue)     (Python)        (Visual)

Key Features:

  • ✅ Event-Driven - Apache Kafka message queue
  • ✅ In-Memory State - No Redis required
  • ✅ Sub-Second Latency - <100ms processing
  • ✅ Visual LED Display - Colored, animated, real-time
  • ✅ Scalable - Handles 1000+ messages/second

Files:

  • streaming/fx_rates_streaming.py - Main processor
  • streaming/led_display_visual.py - Visual LED display
  • streaming/generate_mock_data.py - Generate test data

Testing:

# Terminal 1: Producer
cd testing
python generate_mock_data.py --mode streaming --duration 120

# Terminal 2: Processor
cd streaming
python fx_rates_streaming.py

# Terminal 3: LED Display
cd streaming
python led_display_visual.py

🧪 Testing Infrastructure

Mock Data Generator (UPDATED!)

Problem: Sample data has old timestamps → SQL queries return 0 rows

Solution: testing/generate_mock_data.py generates TWO time windows:

Window 1: Yesterday 5PM ±10 minutes (6,000 rows)
Window 2: Recent 1 hour (180,000 rows)
Total: 186,000 rows

Usage:

cd testing

# Generate everything (recommended)
python generate_mock_data.py --mode full

# Creates:
# - mock_rates_current.csv (186K rows with yesterday + current data)
# - yesterday_rates.json (reference rates extracted from CSV)
# - import_mock_data.sql (SQL import script)

Modes:

--mode full        # Everything (Part A, B, C)
--mode batch       # CSV only (Part A, B)
--mode reference   # JSON only
--mode streaming   # Kafka streaming (Part C)

🔧 Configuration

Database (Parts A & B)

PostgreSQL: 12+
Storage: SSD recommended
Memory: 8GB+ recommended
Timezone: UTC storage, America/New_York calculations

Streaming (Part C)

Kafka: 7.5.0+ (via Docker)
Python: 3.8+
Dependencies: kafka-python, pytz, rich, psycopg2-binary
Topics: fx-rates-input, fx-rates-led-display

📈 Performance Comparison

Metric Part A Part B Part C
Data Scanned Full table 2-min window Streaming
Execution 5-10s 2-5s <100ms
Latency 60 minutes 1 minute <1 second
Updates/Hour 1 60 Continuous
Pairs 5 300 300+

🎓 Key Design Decisions

Why Yesterday 5PM NY?

  • Industry Standard - FX market daily settlement time
  • Consistency - Used for P&L, fixings, regulatory reporting

Why 30-Second Active Window?

  • Data Freshness - FX markets update multiple times/second
  • Connectivity Detection - Identifies feed issues
  • LED Display - Showing stale rates is misleading

Why 2-Minute Scan Window (Part B)?

  • Safety Margins - Clock skew, network delay tolerance
  • Performance - +0.3s execution vs high reliability
  • Trade-off - 4x more data scanned, but still fast

Why In-Memory State (Part C)?

For Assessment:

  • ✅ No external dependencies (except Kafka)
  • ✅ Ultra-fast (microseconds)
  • ✅ Simple code

For Production:

  • Would use Redis for persistence & horizontal scaling

📚 Documentation

Main Guides

  • README.md (this file) - Project overview
  • testing/DOCKER_TESTING_GUIDE.md - Complete step-by-step testing

Code Documentation

  • All SQL files have extensive inline comments
  • Python files include docstrings and explanations

🚀 Deployment

SQL Solutions (Production)

# Setup schema
psql -d fxdb -f sql/create_tables_part_b.sql

# Schedule jobs (cron)
# Minutely query:
* * * * * psql -d fxdb -f solution_part_b.sql

# Daily reference update (5PM NY):
0 17 * * * TZ=America/New_York psql -d fxdb -f update_yesterday_rates.sql

Streaming Solution (Production)

# Start Kafka
docker-compose up -d

# Deploy processor
python streaming/fx_rates_streaming.py --reference-rates yesterday_rates.json

# Deploy LED display
python streaming/led_display_visual.py

✅ Requirements Met

Part A ✅

  • SQL query for 5 currency pairs
  • Hourly execution
  • Active rate detection (<30 seconds)
  • Yesterday 5PM NY reference
  • Percentage change calculation
  • Output: "CCY1/CCY2",rate,"±X.XXX%"

Part B ✅

  • Minutely execution
  • 300 currency pairs support
  • Performance optimizations (10-15x improvement)
  • Design changes documented
  • All optimizations explained

Part C ✅

  • Real-time processing
  • Streaming architecture
  • Sub-second latency (<100ms)
  • Visual LED display
  • Scalable design

🛠️ Technologies

Languages: SQL (PostgreSQL), Python 3.8+

Databases: PostgreSQL 12+ (partitioning, window functions)

Streaming: Apache Kafka 7.5.0, Zookeeper 7.5.0

Python Libraries: kafka-python, pytz, rich, psycopg2-binary

DevOps: Docker, Docker Compose


🎉 Summary

Complete, production-ready solution demonstrating:

  1. SQL Mastery - Batch processing, optimization, partitioning
  2. Streaming Architecture - Event-driven, real-time processing
  3. Performance Engineering - 10-15x improvement Part A → Part B
  4. Production Readiness - Testing, documentation, deployment

All three solutions working and tested! 🚀


📞 Quick Commands Reference

# Start services
docker-compose up -d && sleep 30

# Generate test data
cd testing && python generate_mock_data.py --mode full && cd ..

# Test Part A
docker-compose exec -T postgres psql -U postgres -d fxdb < sql/solution_part_a.sql

# Test Part B
python testing/load_reference_rates.py
docker-compose exec -T postgres psql -U postgres -d fxdb < sql/solution_part_b.sql

# Test Part C (3 terminals)
# T1: python testing/generate_mock_data.py --mode streaming --duration 120
# T2: python streaming/fx_rates_streaming.py
# T3: python streaming/led_display_visual.py

# Stop services
docker-compose down

Last Updated: December 2025
Author: Oğuzhan Göktaş - Data Engineer