This project implements a fraud detection system for ACH (Automated Clearing House) transactions between financial institutions, for example between San Francisco Credit Union accounts and Plaid third-party bank accounts. The system validates customer accounts by comparing names, emails, phone numbers, and nicknames across both sources and deciding whether each account is legitimate. It uses fuzzy matching (Levenshtein distance) and nickname mappings to tolerate real-world data inconsistencies.
- Compares customer records with third-party bank accounts.
- Matches names, emails, and phone numbers with flexibility for minor variations.
- Handles typos and alternative spellings using Levenshtein Distance.
- Incorporates nickname matching, e.g., "Cy" → "Cyril".
- Requires at least two matching fields (e.g., name, email, phone) to validate an account.
- Saves fraud detection results to an output file for further analysis.
- Results: src/main/resources/fraud-detection-report.txt
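The matching rules listed above can be sketched in plain Java as follows. This is a minimal illustration, not the project's actual code: the class name `FuzzyAccountMatcher`, the method names, the edit-distance threshold of 2, the digits-only phone comparison, and the sample nickname table are all assumptions. The hand-written `levenshtein` method is a stand-in so the example runs without dependencies; the project itself relies on Apache Commons Text for this.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; all names and thresholds here are hypothetical. */
final class FuzzyAccountMatcher {

    // Assumed threshold: largest edit distance still treated as a match.
    static final int MAX_EDIT_DISTANCE = 2;

    // Tiny sample nickname table (nickname -> formal name); assumed entries.
    static final Map<String, String> NICKNAMES = new HashMap<>();
    static {
        NICKNAMES.put("cy", "cyril");
        NICKNAMES.put("bill", "william");
    }

    /** Classic dynamic-programming Levenshtein distance using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static String normalize(String s) {
        return s == null ? "" : s.trim().toLowerCase(Locale.ROOT);
    }

    /** Lower-cases the name and expands a known nickname in the first token. */
    static String canonicalName(String name) {
        String[] parts = normalize(name).split("\\s+");
        parts[0] = NICKNAMES.getOrDefault(parts[0], parts[0]);
        return String.join(" ", parts);
    }

    /** True when both values are non-empty and within the edit-distance threshold. */
    static boolean fuzzyEquals(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        if (x.isEmpty() || y.isEmpty()) {
            return false;
        }
        return levenshtein(x, y) <= MAX_EDIT_DISTANCE;
    }

    /** Assumption: phone numbers match when their digits are identical. */
    static boolean phoneMatches(String a, String b) {
        String x = a == null ? "" : a.replaceAll("\\D", "");
        String y = b == null ? "" : b.replaceAll("\\D", "");
        return !x.isEmpty() && x.equals(y);
    }

    /** Applies the "at least two matching fields" rule. */
    static boolean isLikelyLegitimate(String name1, String email1, String phone1,
                                      String name2, String email2, String phone2) {
        int matches = 0;
        if (fuzzyEquals(canonicalName(name1), canonicalName(name2))) matches++;
        if (fuzzyEquals(email1, email2)) matches++;
        if (phoneMatches(phone1, phone2)) matches++;
        return matches >= 2;
    }
}
```

A fixed absolute threshold is lenient on very short strings (two edits can turn one short name into another), so a length-relative threshold may be a better fit depending on the data.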
- Java (version 8 or higher).
- Jackson for JSON parsing, writing, and object mapping.
- Apache Commons Text for Levenshtein distance and other string similarity matching.
- Maven
The following files are required for input data:
- san-francisco-credit-union-customers.json: Contains customer bank account records for San Francisco Credit Union.
- plaid-third-party-banks.json: Contains Plaid third-party bank account records.
- Adjust Matching Criteria: Modify thresholds for Levenshtein Distance or the number of required matches.
- Nickname Mapping: Add more nicknames and customizations for nickname matches.
- Data Sources: Replace input files with APIs or database queries for dynamic data handling.
- Scalability: Use parallel processing for large datasets. Integrate with distributed systems like Apache Spark for batch processing.
- Enhanced Matching: Add geolocation or account history for fraud detection. Include configurable weights for different matching criteria.
- Real-Time Detection: Extend to handle real-time ACH transactions using Kafka or RabbitMQ.
- Metadata: Use account metadata as an additional signal when validating accounts.
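Two of the ideas above, configurable weights and parallel processing, could look roughly like the sketch below. Everything in it is hypothetical: the class `WeightedMatchScorer`, the nested `FieldMatches` holder, and the integer weights and threshold are illustrative choices, not part of the current project. Integer weights are used to avoid floating-point rounding at the threshold boundary.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative sketch only; names, weights, and threshold are assumptions. */
final class WeightedMatchScorer {

    /** Per-field match results for one customer/third-party account pair. */
    static final class FieldMatches {
        final boolean name;
        final boolean email;
        final boolean phone;

        FieldMatches(boolean name, boolean email, boolean phone) {
            this.name = name;
            this.email = email;
            this.phone = phone;
        }
    }

    private final int nameWeight;
    private final int emailWeight;
    private final int phoneWeight;
    private final int threshold;

    WeightedMatchScorer(int nameWeight, int emailWeight, int phoneWeight, int threshold) {
        this.nameWeight = nameWeight;
        this.emailWeight = emailWeight;
        this.phoneWeight = phoneWeight;
        this.threshold = threshold;
    }

    /** Sums the weights of the fields that matched. */
    int score(boolean nameMatch, boolean emailMatch, boolean phoneMatch) {
        return (nameMatch ? nameWeight : 0)
                + (emailMatch ? emailWeight : 0)
                + (phoneMatch ? phoneWeight : 0);
    }

    boolean isValid(FieldMatches m) {
        return score(m.name, m.email, m.phone) >= threshold;
    }

    /** Scores a batch in parallel; results keep the input list's order. */
    List<Boolean> evaluateAll(List<FieldMatches> batch) {
        return batch.parallelStream()
                .map(this::isValid)
                .collect(Collectors.toList());
    }
}
```

With weights of 40 (name), 30 (email), and 30 (phone) and a threshold of 60, any two matching fields pass and any single field fails, which reproduces the existing two-field rule while leaving room to tune individual weights later.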
Author: Prachi Shah @ https://www.linkedin.com/in/prachisms/
P.S. The default copyright laws apply.