- Birthday Paradox & Base62
- Caching: In-Memory vs Disk
- CDN with Edge Computing
- Scaling Strategies
- Uploading Large Files
- File Sync Agents
- File/Data Security
- Consistency & Transactions
- Pagination
- Latency & Caching Strategies
- Collisions are more likely than expected. Only 23 people are needed for a >50% chance two share a birthday.
- For a space of size N, the collision chance reaches about 50% after roughly \( 1.18\sqrt{N} \) random picks, i.e., on the order of \( \sqrt{N} \).
| Base62 ID length (characters) | Combinations (62^length) |
|---|---|
| 6 | ~56.8 billion |
| 7 | ~3.5 trillion |
| 8 | ~218 trillion |
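
The birthday bound and the Base62 table above can be checked with a few lines of Python. This is a minimal sketch: the alphabet ordering and the helper names (`base62_encode`, `collision_probability`, `picks_for_half`) are assumptions made for illustration, and the probability uses the standard approximation p ≈ 1 - exp(-k(k-1)/(2N)).

```python
import math

# Assumed alphabet ordering (digits, upper, lower); any fixed 62-symbol order works.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def collision_probability(picks: int, space: int) -> float:
    """Approximate chance that at least two of `picks` random values collide."""
    return 1.0 - math.exp(-picks * (picks - 1) / (2 * space))


def picks_for_half(space: int) -> int:
    """Approximate number of random picks for a ~50% collision chance."""
    return math.ceil(math.sqrt(2 * math.log(2) * space))
```

Under this approximation, `picks_for_half(365)` gives 23 (the classic birthday result), and `picks_for_half(62 ** 7)` is about 2.2 million, so random 7-character IDs need a uniqueness check long before the 3.5 trillion space fills up.
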
| Storage | Typical access latency |
|---|---|
| Memory | ~100 ns (0.0001 ms) |
| SSD | ~0.1 ms |
| HDD | ~10 ms |
| Storage | Read IOPS (approx.) |
|---|---|
| Memory | Millions |
| SSD | ~100,000 |
| HDD | 100 - 200 |
- CDN = geographically distributed PoPs (Points of Presence).
- Edge computing = running logic closer to the user, e.g., via Cloudflare Workers or AWS Lambda@Edge.
- Benefits: Reduced latency, faster TTFB, global reach.
- Cautions: Cost grows with scale; edge runtimes limit memory and execution time.
- Note: Be mindful of cache invalidation and consistency. TTLs (time-to-live) and CDC (change data capture) can help limit staleness.
- Database:
- Replication
- Sharding/Partitioning
- Backups
- Services:
- Horizontal scaling
- Auto-scaling groups
- Distributed Cache:
- Redis deployment modes: Standalone vs Sentinel vs Cluster
- Counters:
- Counter batching to reduce network overhead
- Write Optimization:
- Batch writes
- Design Patterns:
- CQRS (Command Query Responsibility Segregation)
- Sidecar Pattern
- Bulkhead Pattern
- Circuit Breaker
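
As an illustration of the last pattern in the list, here is a minimal circuit breaker sketch. The class name, thresholds, and the injectable `clock` parameter are assumptions for this example; production libraries usually add per-error filtering, metrics, and thread safety.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without running."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to a flaky dependency in `breaker.call(...)` lets callers fail fast while the dependency recovers, instead of piling up timeouts.
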
- Use Blob/Object storage with pre-signed URLs for direct upload/download.
- Use event triggers (e.g., S3 triggers Lambda) for post-upload actions.
- Prefer Chunking / Multipart upload for:
- Fault tolerance
- Parallelism
- Data integrity (via ETag + Part Number)
- Watch out for:
- Large file timeouts (e.g., 50 GB on 100 Mbps = ~1.1 hrs)
- Web server request-body limits (e.g., NGINX client_max_body_size defaults to 1 MB and must be raised for large uploads)
- Consider compression (gzip, Brotli, Zstandard); gains are largest for text files, while already-compressed media barely shrinks.
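
The chunking idea and the transfer-time estimate above can be sketched as follows. The helper names are hypothetical, the data is held in memory for brevity (a real uploader would stream from disk), and the ETag format shown (MD5 of the concatenated part digests plus `-<part count>`) is the commonly observed S3 convention for unencrypted multipart uploads, treated here as an assumption rather than a guaranteed contract.

```python
import hashlib
from typing import Iterator, Tuple


def iter_parts(data: bytes, part_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs; part numbers start at 1."""
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]


def multipart_etag(data: bytes, part_size: int) -> str:
    """Assumed S3-style multipart ETag: md5 over the part md5 digests, plus part count."""
    digests = [hashlib.md5(chunk).digest() for _, chunk in iter_parts(data, part_size)]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Ideal transfer time in hours, using decimal units (1 GB = 8000 megabits)."""
    return size_gb * 8 * 1000 / link_mbps / 3600
```

`transfer_hours(50, 100)` evaluates to about 1.11, matching the 50 GB on 100 Mbps figure in the list. Because each part is hashed and numbered independently, a failed part can be retried or uploaded in parallel without resending the whole file.
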
| Platform | Utility |
|---|---|
| Linux | fswatch, inotify |
| macOS | FSEvents |
| Windows | FileSystemWatcher |
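
The utilities in the table push change events from the OS. Where none is available, a sync agent can fall back to polling. The sketch below is that portable fallback, not how inotify or FSEvents work internally; the function names and the (mtime, size) fingerprint are assumptions for illustration.

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[float, int]]  # path -> (mtime, size)


def snapshot(root: str) -> Snapshot:
    """Record (mtime, size) for every file under root."""
    state: Snapshot = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime, st.st_size)
    return state


def diff(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Compare two snapshots and classify paths as created, deleted, or modified."""
    return {
        "created": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in new.keys() & old.keys() if new[p] != old[p]),
    }
```

An agent would call `snapshot` on a timer and feed `diff(previous, current)` into its upload queue. Polling costs a full directory walk per cycle, which is why the event-based APIs are preferred when present.
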
- Encryption at Rest
- Encryption in Transit
- Access Control:
- RBAC (Role-Based Access Control)
- IAM roles
- KMS, Vault
Write-centric considerations:
- Single Database (best)
- Saga pattern, with compensating transactions to undo completed steps
- Distributed lock + two-phase commit (2PC)
- Application-level lock (e.g., a mutex; only protects a single process)
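
The Saga option above can be sketched as a list of (action, compensation) pairs. This is a simplified in-process illustration with hypothetical names; a real saga persists its progress and runs steps across services, and compensations themselves must be retried if they fail.

```python
from typing import Callable, List, Tuple

# Each step pairs a forward action with the compensation that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, run compensations for completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

For example, if "reserve stock" and "charge card" succeed but "create shipment" fails, the saga refunds the card and then releases the stock, leaving the system consistent without a distributed lock.
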
Other notes:
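- Use Redis SET with the NX and PX options (the atomic successor to SETNX followed by EXPIRE) for distributed locks, with a unique token per holder so only the owner releases the lock
- 2PC for cross-shard transactions (may need pre-locking)
- If race conditions are rare, use optimistic concurrency control (e.g., a version column checked at write time; MVCC databases apply a similar idea internally)
- If possible, consolidate related data in a single store to avoid this complexity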
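
Optimistic concurrency, mentioned in the notes above, is often implemented with a version column. The sketch below uses SQLite from the standard library; the `account` table and function names are hypothetical, and the same conditional `UPDATE` works in most SQL databases.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE account ("
        "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, version INTEGER NOT NULL)"
    )


def read_account(conn: sqlite3.Connection, account_id: int):
    """Return (balance, version) for the account."""
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def update_balance(conn, account_id: int, new_balance: int, expected_version: int) -> bool:
    """Write only if nobody changed the row since it was read; return True on success."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

A caller that gets `False` re-reads the row and retries. No lock is held between the read and the write, which is why this approach fits workloads where conflicts are rare.
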
- Use lazy loading when the dataset is small or is rendered incrementally on demand (e.g., infinite scroll).
- Offset Pagination:
- Simple, but frequent inserts or deletes shift rows between pages (duplicate or missing rows), and deep offsets get slow because the database still scans the skipped rows.
- Cursor-Based Pagination:
- Sort and filter on (timestamp, unique ID) so rows that share a timestamp are neither duplicated nor skipped.
- Monotonic counters also work but prevent page jumps (e.g., page 50).
- GraphQL Relay uses opaque cursors, conventionally base64-encoded.
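
Cursor-based pagination on a (timestamp, ID) key can be sketched as below. The in-memory list stands in for a table, and the cursor format (base64 of a JSON pair) and function names are assumptions for illustration; in SQL the filter would typically be a keyset predicate such as `WHERE (created_at, id) > (?, ?) ORDER BY created_at, id LIMIT ?` on databases that support row-value comparison.

```python
import base64
import json
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (created_at, id); hypothetical minimal row


def encode_cursor(created_at: int, row_id: int) -> str:
    """Pack the last-seen sort key into an opaque, URL-safe string."""
    raw = json.dumps([created_at, row_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> Row:
    created_at, row_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return created_at, row_id


def fetch_page(rows: List[Row], limit: int, cursor: Optional[str] = None):
    """Return (page, next_cursor); next_cursor is None on the last page."""
    ordered = sorted(rows)
    if cursor is not None:
        last_seen = decode_cursor(cursor)
        # Tuple comparison breaks timestamp ties on id, so no row repeats or is skipped.
        ordered = [r for r in ordered if r > last_seen]
    page = ordered[:limit]
    next_cursor = encode_cursor(*page[-1]) if len(ordered) > limit else None
    return page, next_cursor
```

Because each page starts strictly after the last row the client saw, rows inserted while the client is paging do not shift earlier pages the way they do with offsets.
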
Latency mitigation techniques:
- Caching: Redis, Memcached, CDN
- TTL: Reduce staleness window
- CDC: Push updates to cache
- Geo-Sharding: Serve from nearest region
- Pooling: Connection/thread pooling
- Prefetching & lazy loading
| Strategy | Description |
|---|---|
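| Cache Aside | App checks the cache first; on a miss it reads the DB and fills the cache; on writes it updates the DB and invalidates the cache entry |
| Read Through | The cache itself loads data from the DB on a miss |
| Write Through | Each write goes to the cache and the DB synchronously, in the same operation |
| Write Back | Write to cache only; DB syncs later (risk of data loss on cache failure) |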
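
The cache-aside row, combined with a TTL, can be sketched as follows. The class, the `load_from_db` and `write_to_db` callables, and the injectable `clock` are hypothetical names for this example; in practice the dictionary would be Redis or Memcached.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    def __init__(
        self,
        load_from_db: Callable[[str], Any],
        write_to_db: Callable[[str, Any], None],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load = load_from_db
        self._write = write_to_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        entry = self._cache.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]                      # fresh cache hit
        value = self._load(key)                  # miss or expired: read the DB
        self._cache[key] = (self._clock() + self._ttl, value)
        return value

    def put(self, key: str, value: Any) -> None:
        self._write(key, value)                  # DB is the source of truth
        self._cache.pop(key, None)               # invalidate; next get() reloads
```

Invalidating on write rather than updating the cache keeps the logic simple, and the TTL bounds how long a stale entry can survive if an invalidation is missed.
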
Eviction Policies:
- LRU (Least Recently Used)
- LFU (Least Frequently Used)
- FIFO (First In First Out)
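
Of the eviction policies listed, LRU is simple to sketch with `collections.OrderedDict`. The class below is an illustrative, single-threaded example with assumed names; Python's built-in `functools.lru_cache` covers the common function-memoization case.

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._items:
            return default
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
```

With capacity 2, inserting `a`, `b`, reading `a`, then inserting `c` evicts `b`, because `a` was touched more recently.
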
More sections to be added: CAP theorem, Rate limiting, Leader election, Observability, etc.