Fraud Detection in Online Banking Onboarding Using Advanced Graph Neural Networks

Overview

This project presents a comprehensive approach to detecting fraudulent behavior during the customer onboarding process of an online bank. We simulate a realistic onboarding workflow with 12 sequential steps, where customers are characterized by their processing times at each step. The dataset is synthetically generated to mimic both legitimate and fraudulent onboarding patterns with subtle differences and occasional anomalies.

A graph is constructed from customer similarities using a k-Nearest Neighbors (k-NN) approach. Each edge is weighted by exp(-distance), an exponentially decaying function of the Euclidean distance between two customers, so that closer customers are connected more strongly. A Graph Convolutional Network (GCN) then uses both each customer's own timing features and the features of its neighbors to classify customers as legitimate or fraudulent.

Project Details

Problem Statement

Online banks must detect and prevent fraud during the customer onboarding process. Fraudulent customers tend to manipulate processing times by speeding through the steps or introducing irregularities in timing. Our goal is to develop a graph-based model that not only considers the individual timing features across multiple steps but also incorporates the relational context among customers.

Data Simulation

Number of Customers: 500
Onboarding Steps: 12
Legitimate Customers: Modeled using step-specific normal distributions (mean values between 9 and 11 minutes; variance between 1.5 and 2.5 minutes).
Fraudulent Customers: Simulated with slightly lower processing times (offset by 0.5 to 1.5 minutes) and additional probability of extreme outliers to mimic anomalous behavior.
Normalization: Z-score normalization is applied to all features.
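
The simulation described above can be sketched roughly as follows. This is a minimal illustration, not the code from the main script: the function name `simulate_onboarding`, the fraud ratio, the outlier probability, and the outlier size are all assumptions, since the README does not state them. The "variance" range is treated here as a standard deviation in minutes, which is also an assumption.

```python
import numpy as np

def simulate_onboarding(n_customers=500, n_steps=12, fraud_ratio=0.2,
                        outlier_prob=0.05, seed=0):
    """Return z-scored step times (n_customers x n_steps) and 0/1 labels.

    fraud_ratio, outlier_prob and the outlier scale are hypothetical values
    chosen for illustration only.
    """
    rng = np.random.default_rng(seed)

    # 1 = fraudulent, 0 = legitimate, in random order.
    n_fraud = int(round(n_customers * fraud_ratio))
    labels = np.zeros(n_customers, dtype=int)
    labels[:n_fraud] = 1
    rng.shuffle(labels)

    # Step-specific parameters: means in [9, 11], spreads in [1.5, 2.5].
    means = rng.uniform(9.0, 11.0, size=n_steps)
    spreads = rng.uniform(1.5, 2.5, size=n_steps)  # assumed to be std devs
    times = rng.normal(means, spreads, size=(n_customers, n_steps))

    # Fraudulent customers are faster by 0.5 to 1.5 minutes per step.
    fraud = labels == 1
    times[fraud] -= rng.uniform(0.5, 1.5, size=(n_fraud, n_steps))

    # Occasionally perturb a fraudulent step with a large deviation.
    fraud_times = times[fraud]
    outliers = rng.random(fraud_times.shape) < outlier_prob
    fraud_times[outliers] += rng.normal(0.0, 6.0, size=int(outliers.sum()))
    times[fraud] = fraud_times

    # Processing times cannot be negative.
    times = np.clip(times, 0.1, None)

    # Z-score normalization per step (column).
    features = (times - times.mean(axis=0)) / times.std(axis=0)
    return features, labels
```

Calling `simulate_onboarding()` yields a 500 x 12 feature matrix whose columns have zero mean and unit standard deviation, together with one label per customer.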

Graph Construction

Method: k-Nearest Neighbors (k=10)
Edge Weighting: Each edge weight is computed as exp(-distance), capturing the similarity decay over Euclidean distance.
Graph Characteristics: The graph is undirected and represents the relational structure among customer behaviors.
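
One way to build such a graph is sketched below, using scikit-learn's `NearestNeighbors`. The helper name `build_knn_graph` and the output layout (a 2 x E `edge_index` array with every undirected edge listed in both directions, as PyTorch Geometric expects) are assumptions made for illustration; the main script may construct the graph differently.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_knn_graph(features, k=10):
    """Build an undirected k-NN graph with exp(-distance) edge weights."""
    # Ask for k + 1 neighbors because each point is usually its own nearest one.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dist, idx = nn.kneighbors(features)

    # Store each undirected edge once, keyed by (smaller index, larger index).
    weights = {}
    for i in range(features.shape[0]):
        for d, j in zip(dist[i], idx[i]):
            j = int(j)
            if j == i:
                continue  # skip self-matches
            key = (i, j) if i < j else (j, i)
            weights[key] = float(np.exp(-d))

    pairs = sorted(weights)
    src = np.array([a for a, _ in pairs])
    dst = np.array([b for _, b in pairs])
    w = np.array([weights[p] for p in pairs])

    # Emit both directions so the graph is symmetric.
    edge_index = np.array([np.concatenate([src, dst]),
                           np.concatenate([dst, src])])
    edge_weight = np.concatenate([w, w])
    return edge_index, edge_weight
```

Because an edge is kept whenever either endpoint lists the other among its nearest neighbors, some nodes end up with more than k neighbors. Since the features are z-scored over 12 dimensions, typical distances are several units, so most weights are small but remain strictly positive.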

GNN Model Architecture

Our advanced GCN model includes:

Three Graph Convolutional Layers:
- Layer 1: 64 hidden units, followed by Batch Normalization, ReLU activation, and dropout.
- Layer 2: 32 hidden units with similar normalization, activation, and dropout, plus a residual connection from the first layer to aid gradient flow.
- Layer 3: Outputs class scores (fraudulent vs. legitimate).
Loss Function: Cross-entropy loss.
Optimization: Adam optimizer with a learning rate of 0.01.
Evaluation Metrics: Training and test accuracy, ROC curve, and AUC.
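
To make the layer structure concrete, here is a framework-free NumPy sketch of the forward pass only. It is not the PyTorch Geometric model from the main script: batch normalization and dropout are omitted, the weights are random and untrained, and the residual connection is implemented as a linear projection from 64 to 32 units (`w_skip`), which is an assumption because the README does not say how the differing widths are reconciled. The propagation rule is the standard GCN one, with the symmetrically normalized adjacency matrix including self-loops.

```python
import numpy as np

def normalized_adjacency(edge_index, edge_weight, n_nodes):
    """Return D^(-1/2) (A + I) D^(-1/2) as a dense matrix."""
    a = np.zeros((n_nodes, n_nodes))
    a[edge_index[0], edge_index[1]] = edge_weight
    a += np.eye(n_nodes)                      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def init_params(n_features=12, seed=0):
    """Random weights for the 12 -> 64 -> 32 -> 2 stack (hypothetical init)."""
    rng = np.random.default_rng(seed)
    shapes = {"w1": (n_features, 64), "w2": (64, 32),
              "w_skip": (64, 32), "w3": (32, 2)}
    return {name: rng.normal(0.0, np.sqrt(2.0 / s[0]), size=s)
            for name, s in shapes.items()}

def gcn_forward(x, a_hat, params):
    """Three graph convolutions; returns one row of class scores per node."""
    relu = lambda z: np.maximum(z, 0.0)
    h1 = relu(a_hat @ x @ params["w1"])                     # layer 1: 64 units
    h2 = relu(a_hat @ h1 @ params["w2"]                     # layer 2: 32 units
              + h1 @ params["w_skip"])                      # projected residual
    return a_hat @ h2 @ params["w3"]                        # layer 3: 2 scores
```

Each multiplication by `a_hat` mixes a node's representation with those of its neighbors, weighted by edge similarity, which is how the relational context enters the prediction.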

Training and Evaluation

The model is trained for 200 epochs. During training, both loss and accuracy are tracked. For a comprehensive evaluation, the following visualizations are generated:

Training Loss Curve
Training Accuracy Curve
ROC Curve (with AUC) on the Test Set
t-SNE Visualization of Node Embeddings (from the first hidden layer)
Customer Similarity Graph (visualizing the constructed graph)

These plots are saved as graph1.png through graph5.png.
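
The accuracy and ROC/AUC metrics can be computed from the model's class scores along these lines. This is a sketch under stated assumptions: the helper name `evaluate_scores` is hypothetical, and column 1 of the scores is assumed to correspond to the fraudulent class.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate_scores(y_true, class_scores):
    """Compute accuracy, ROC curve points and AUC from raw class scores.

    Assumes class_scores has shape (n_samples, 2) and that column 1 is the
    fraudulent class.
    """
    y_true = np.asarray(y_true)
    class_scores = np.asarray(class_scores, dtype=float)

    # Numerically stable softmax to turn scores into probabilities.
    shifted = class_scores - class_scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    fraud_prob = probs[:, 1]

    accuracy = float((probs.argmax(axis=1) == y_true).mean())
    fpr, tpr, _ = roc_curve(y_true, fraud_prob)
    auc = float(roc_auc_score(y_true, fraud_prob))
    return accuracy, fpr, tpr, auc
```

The returned `fpr` and `tpr` arrays can be passed directly to Matplotlib's `plot` to draw the ROC curve. With imbalanced classes, AUC is usually more informative than accuracy alone, because a model that labels every customer as legitimate can still score a high accuracy.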

Visualizations

Below are the key plots generated by the project:

Graph 1 (graph1.png): Training Loss Curve
Graph 2 (graph2.png): Training Accuracy Curve
Graph 3 (graph3.png): ROC Curve for Fraud Detection
Graph 4 (graph4.png): t-SNE Visualization of Customer Embeddings
Graph 5 (graph5.png): Customer Similarity Graph

Requirements

Python 3.7+
PyTorch
PyTorch Geometric
NumPy
Matplotlib
scikit-learn
NetworkX

Running the Code

The main script, Fraud Detection GNN.py, executes the entire pipeline from data generation to model training and visualization. Because the filename contains spaces, it must be quoted on the command line. To run the project:

python "Fraud Detection GNN.py"

Conclusion

This project demonstrates how Graph Neural Networks can be applied to detecting fraudulent patterns during the online banking onboarding process, using synthetic data. By combining relational information from a k-NN graph with a GCN that uses batch normalization, dropout, and a residual connection, the model is designed to capture subtle behavioral differences between legitimate and fraudulent customers. The plots show how training progresses, how well the classes are separated on the test set, and how customers are arranged in the embedding space and the similarity graph.

Future Work

Model Enhancements: Experiment with different GNN architectures and hyperparameter tuning.
Real-world Data: Extend the approach to real banking datasets for practical deployment.
Additional Features: Incorporate additional customer features and contextual data to further improve fraud detection accuracy.
