Mhrnqaruni/Fraud-Detection-GNN

Fraud Detection in Online Banking Onboarding Using Advanced Graph Neural Networks

Overview

This project presents a comprehensive approach to detecting fraudulent behavior during the customer onboarding process of an online bank. We simulate a realistic onboarding workflow with 12 sequential steps, where customers are characterized by their processing times at each step. The dataset is synthetically generated to mimic both legitimate and fraudulent onboarding patterns with subtle differences and occasional anomalies.

A graph is constructed from customer similarities using a k-Nearest Neighbors (k-NN) approach. Each edge is weighted with an exponentially decaying kernel, exp(-distance), so that similarity falls off with Euclidean distance. A Graph Convolutional Network (GCN) then combines each customer's own feature pattern with relational information from its neighbors to distinguish legitimate from fraudulent customers.

Project Details

Problem Statement

Online banks must detect and prevent fraud during the customer onboarding process. Fraudulent customers tend to manipulate processing times by speeding through the steps or introducing irregularities in timing. Our goal is to develop a graph-based model that not only considers the individual timing features across multiple steps but also incorporates the relational context among customers.

Data Simulation

  • Number of Customers: 500
  • Onboarding Steps: 12
  • Legitimate Customers: Modeled using step-specific normal distributions (mean values between 9 and 11 minutes; variance between 1.5 and 2.5 minutes).
  • Fraudulent Customers: Simulated with slightly lower processing times (reduced by 0.5 to 1.5 minutes) and an added probability of extreme outliers to mimic anomalous behavior.
  • Normalization: Z-score normalization is applied to all features.
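
The simulation described above could be sketched roughly as follows. This is a minimal illustration using NumPy, not the project's actual generator: the function name simulate_onboarding, the fraud rate, the outlier probability, the outlier scale factors, and the choice of one offset per customer are all assumptions, and the 1.5 to 2.5 range is treated here as a standard deviation.

```python
import numpy as np

def simulate_onboarding(n_customers=500, n_steps=12, fraud_rate=0.2,
                        outlier_prob=0.05, seed=0):
    """Return z-scored step times (n_customers, n_steps) and 0/1 fraud labels."""
    rng = np.random.default_rng(seed)
    # Step-specific parameters for legitimate customers (ranges from the text).
    step_mean = rng.uniform(9.0, 11.0, size=n_steps)
    step_spread = rng.uniform(1.5, 2.5, size=n_steps)  # assumed to be a std dev

    # fraud_rate is an assumption; the text does not state the class balance.
    labels = (rng.random(n_customers) < fraud_rate).astype(int)
    times = rng.normal(step_mean, step_spread, size=(n_customers, n_steps))

    # Fraudulent customers: subtract one offset of 0.5 to 1.5 minutes per customer.
    fraud = labels == 1
    offset = rng.uniform(0.5, 1.5, size=(fraud.sum(), 1))
    times[fraud] -= offset

    # Occasional extreme outliers on fraudulent rows (scale factors are assumptions).
    outlier_mask = fraud[:, None] & (rng.random(times.shape) < outlier_prob)
    times[outlier_mask] *= rng.choice([0.2, 3.0], size=outlier_mask.sum())

    times = np.clip(times, 0.1, None)  # durations cannot be negative

    # Z-score normalization per step (column).
    features = (times - times.mean(axis=0)) / times.std(axis=0)
    return features, labels
```

Calling simulate_onboarding() with the defaults yields a 500 x 12 feature matrix whose columns each have zero mean and unit standard deviation.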

Graph Construction

  • Method: k-Nearest Neighbors (k=10)
  • Edge Weighting: Each edge weight is computed as exp(-distance), capturing the similarity decay over Euclidean distance.
  • Graph Characteristics: The graph is undirected and represents the relational structure among customer behaviors.
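
One way the graph construction above might look in NumPy is shown below. The function name knn_graph is hypothetical, and the symmetrization rule (keep an edge if either endpoint picked the other as a neighbor) is an assumption; the text only says the resulting graph is undirected.

```python
import numpy as np

def knn_graph(features, k=10):
    """Build a symmetric weighted adjacency matrix from a k-NN search."""
    n = features.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbor

    # Indices of the k closest nodes for every row.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = neighbors.ravel()

    adj = np.zeros((n, n))
    adj[rows, cols] = np.exp(-dist[rows, cols])  # edge weight exp(-distance)
    # k-NN selection is not symmetric; keep an edge if either endpoint chose it.
    return np.maximum(adj, adj.T)
```

Because of the symmetrization step, a node can end up with more than k neighbors: it keeps its own k choices plus any nodes that chose it.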

GNN Model Architecture

Our advanced GCN model includes:

  • Three Graph Convolutional Layers:
    • Layer 1: 64 hidden units, followed by Batch Normalization, ReLU activation, and dropout.
    • Layer 2: 32 hidden units with similar normalization, activation, and dropout, plus a residual connection from the first layer to aid gradient flow.
    • Layer 3: Outputs class scores (fraudulent vs. legitimate).
  • Loss Function: Cross-entropy loss.
  • Optimization: Adam optimizer with a learning rate of 0.01.
  • Evaluation Metrics: Training and test accuracy, ROC curve, and AUC.
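
To make the layer shapes concrete, here is a forward-only sketch of the three-layer architecture in plain NumPy. It is not the project's training code: the helper names are hypothetical, the weights are random, dropout is omitted (as it would be at inference time), batch normalization has no learned scale or shift, and the projection matrix w_skip is an assumed way to match the 64-unit and 32-unit layers for the residual connection. The adjacency normalization is the standard symmetric GCN form with self-loops.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def batch_norm(h, eps=1e-5):
    """Normalize each feature over all nodes (no learned scale or shift)."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

def init_params(n_features=12, hidden1=64, hidden2=32, n_classes=2, seed=0):
    """Random weights for illustration only; a real model would learn these."""
    rng = np.random.default_rng(seed)
    shapes = {
        "w1": (n_features, hidden1),
        "w2": (hidden1, hidden2),
        "w_skip": (hidden1, hidden2),  # hypothetical projection for the residual
        "w3": (hidden2, n_classes),
    }
    return {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}

def gcn_forward(x, adj, params):
    """Inference-mode forward pass; returns class scores and layer-1 embeddings."""
    a_norm = normalize_adjacency(adj)
    h1 = np.maximum(batch_norm(a_norm @ x @ params["w1"]), 0.0)   # layer 1 + BN + ReLU
    h2 = np.maximum(batch_norm(a_norm @ h1 @ params["w2"]), 0.0)  # layer 2 + BN + ReLU
    h2 = h2 + h1 @ params["w_skip"]                               # residual from layer 1
    logits = a_norm @ h2 @ params["w3"]                           # layer 3: class scores
    return logits, h1
```

The second return value, h1, corresponds to the first hidden layer's node embeddings, which is the kind of representation a t-SNE plot would take as input.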

Training and Evaluation

The model is trained for 200 epochs. During training, both loss and accuracy are tracked. For a comprehensive evaluation, the following visualizations are generated:

  1. Training Loss Curve
  2. Training Accuracy Curve
  3. ROC Curve (with AUC) on the Test Set
  4. t-SNE Visualization of Node Embeddings (from the first hidden layer)
  5. Customer Similarity Graph (visualizing the constructed graph)

These plots are saved as graph1.png through graph5.png.
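
For readers unfamiliar with AUC, the metric behind the ROC plot can be computed directly from its pairwise definition. The helper below is an illustrative sketch (the name roc_auc is hypothetical, and the project may well rely on a library routine instead); it assumes labels are 0 for legitimate and 1 for fraudulent, and that higher scores mean "more likely fraudulent".

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The double loop is quadratic in the number of samples, which is acceptable for a test split drawn from 500 customers but would be replaced by a rank-based computation on larger data.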

Visualizations

Below are the key plots generated by the project:

  • Graph 1: Training Loss Curve
  • Graph 2: Training Accuracy Curve
  • Graph 3: ROC Curve for Fraud Detection
  • Graph 4: t-SNE Visualization of Customer Embeddings
  • Graph 5: Customer Similarity Graph

Requirements

Running the Code

The main script, Fraud Detection GNN.py, executes the entire pipeline from data generation to model training and visualization. Because the file name contains spaces, quote it when running the project:

python "Fraud Detection GNN.py"

Conclusion

This project shows how Graph Neural Networks can be applied to detect fraudulent patterns during the online banking onboarding process. By combining relational data from a k-NN graph with a three-layer GCN, the model is designed to capture subtle behavioral differences between legitimate and fraudulent customers. The generated plots document model performance (loss, accuracy, ROC/AUC) and give a view of the underlying data structure through the t-SNE embeddings and the similarity graph.

Future Work

  • Model Enhancements: Experiment with different GNN architectures and hyperparameter tuning.
  • Real-world Data: Extend the approach to real banking datasets for practical deployment.
  • Additional Features: Incorporate additional customer features and contextual data to further improve fraud detection accuracy.
