Official code snippets of paper "Spectral graph convolution networks for microbialite lithology identification based on conventional well logs"


KeranLi/Microbialite-Logging



Spectral Graph Convolution Networks for Microbialite Lithology Identification Based on Conventional Well Logs

Keran Li<sup>a,1</sup>, Jinmin Song<sup>a,*</sup>, Han Wang<sup>a</sup>, Haijun Yan<sup>b</sup>, Shugen Liu<sup>a</sup>, Yang Lan<sup>c,2</sup>, Xin Jin<sup>a</sup>, Jiaxin Ren<sup>a</sup>, Lizhou Tian<sup>a</sup>, Haoshuang Deng<sup>a</sup>, Wei Chen<sup>a</sup>

<sup>a</sup>State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Chengdu University of Technology, Chengdu 610059, China

<sup>b</sup>Research Institute of Petroleum Exploration and Development, Beijing, 100083, China

<sup>c</sup>University College London, Gower Street, London, WC1E 6BT, UK

<sup>1</sup>Present address: State Key Laboratory of Critical Earth Material Cycling and Mineral Deposits, Frontiers Science Center for Critical Earth Material Cycling, School of Earth Sciences and Engineering, Nanjing University, Nanjing, 210023, China

<sup>2</sup>Present address: School of Economics and Management, Beihang University, Beijing, 100191, China

<sup>*</sup>Corresponding author



📄 Paper • 🌐 Project Page • 📊 Dataset • 🚀 Quick Start


🎯 Overview

This repository provides the official implementation of Spectral Graph Convolutional Networks (GCN) for automated microbialite lithology identification from conventional well logs. Unlike traditional methods that shuffle time-series data (destroying sedimentary sequence information), this approach treats well logs as graph-structured spectral data, preserving both vertical temporal dependencies and inter-log correlations.

🔬 Key Innovations

  • πŸ•ΈοΈ Graph Representation: Transforms well logs into latent graphs (spectra + adjacency matrix) using GRU and self-attention
  • 🎡 Spectral Processing: Utilizes Graph Fourier Transform (GFT) and Discrete Fourier Transform (DFT) to capture frequency-domain features
  • ⏱️ Sequence Preservation: Maintains depth-series (time-series) order without shuffling, modeling actual sedimentary deposition sequences
  • βš–οΈ Data Balance: Implements SMOTE to handle class imbalance in microbialite distribution
  • πŸ”„ Transfer Learning: Demonstrates fine-tuning strategies for adapting to new formations (Dengying-2, Leikoupo-4Β³) with limited samples

📊 Performance Highlights

| Model | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|
| **GCN (Ours)** | 0.90 | 0.93 | 0.94 | 0.90 | 0.95 |
| LSTM | 0.80 | 0.79 | 0.78 | 0.80 | 0.78 |
| RNN | 0.61 | 0.60 | 0.65 | 0.61 | 0.72 |
| TCN | 0.70 | 0.69 | 0.72 | 0.70 | 0.78 |
| ANN | 0.61 | 0.50 | 0.56 | 0.61 | 0.58 |

Results on Dengying Formation (Z2dn4), Moxi Gas Field, Sichuan Basin


πŸ—οΈ Architecture

Workflow: Raw Logs → GRU Encoder → Self-Attention (Adjacency) → GFT/DFT → GLU → Graph Conv → Classification

Core Components

  1. 📈 GRU Block: Processes depth-series sequences to generate latent graph representations
  2. 🔗 Self-Attention: Dynamically constructs adjacency matrices from hidden states (Q, K, V mechanism)
  3. 🌊 Spectral Transform:
    • GFT: Graph Fourier Transform using Laplacian eigen-decomposition
    • DFT: Discrete Fourier Transform for frequency-domain convolution
  4. 🎛️ GLU (Gated Linear Unit): Controls information flow in the spectral domain
  5. 🕸️ Graph Convolution: Spectral graph convolution with the symmetric normalized Laplacian

πŸ“ Dataset

Geological Setting

  • Location: Moxi Gas Field, Central Sichuan Basin, China
  • Formation: 4th Member of Ediacaran Dengying Formation (Z2dn4)
  • Wells: 44 wells (42 for training, 2 preserved for generalization testing)
  • Samples: 10,367 valid data points (after SMOTE augmentation: 12,570)
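The SMOTE step balances minority classes by interpolating between a minority sample and one of its nearest same-class neighbours. A minimal NumPy sketch of that idea (the pipeline itself uses `imbalanced-learn`'s `SMOTE`; the function name and the toy data below are invented for illustration):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating a random
    minority sample toward one of its k nearest same-class neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)   # distances within the class
        neighbours = np.argsort(d)[1:k + 1]            # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                             # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# Toy minority class: 5 samples of 8 log curves (values invented)
X_min = np.random.default_rng(1).standard_normal((5, 8))
X_synth = smote_like_oversample(X_min, n_new=10)
print(X_synth.shape)  # (10, 8)
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the observed feature range instead of duplicating rows outright.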

Well Log Features (8 Curves)

| Curve | Description | Unit | Geological Significance |
|---|---|---|---|
| AC | Acoustic (Sonic) | μs/m | Velocity, porosity indicator |
| CAL | Caliper | inch | Borehole diameter, caving |
| CNL | Compensated Neutron | % | Porosity, hydrogen index |
| DEN | Density | g/cm³ | Bulk density, lithology |
| GR | Gamma Ray | API | Shale content, clay volume |
| PE | Photoelectric Factor | b/e | Mineral composition, lithology |
| RLLD | Deep Resistivity | Ω·m | True formation resistivity |
| RLLS | Shallow Resistivity | Ω·m | Invasion zone resistivity |

Lithological Classes (5 Microbialite Types)

| Code | Full Name | Description | Characteristics |
|---|---|---|---|
| MICR | Dolomicrite | Micritic dolomite | Dark, fine-grained, rare structures |
| SSTR | Stratiform Stromatolite | Layered microbial mats | Parallel laminations, intermittent dark lines |
| WSTR | Wavy Stromatolite | Undulating microbial structures | Large curvature, semi-circular, porous |
| THRO | Thrombolite | Clotted microbial structures | Dark clots, diffusing fabric, dissolution pores |
| SILIS | Siliceous Stromatolite | Silica-rich microbialite | Curved stripes, interlayer quartz, brittle |

📦 Installation

Prerequisites

  • Python ≥ 3.8
  • CUDA ≥ 11.3 (optional, for GPU acceleration)
  • 8GB+ RAM

Setup

# Clone repository
git clone https://github.com/KeranLi/GCN-Microbialite-Lithology.git
cd GCN-Microbialite-Lithology

# Install dependencies
pip install -r requirements.txt

# Install package
pip install -e .

Key Dependencies:

torch>=1.11.0
numpy>=1.20.0
pandas>=1.3.0
scikit-learn>=0.24.0
scipy>=1.7.0
imbalanced-learn>=0.8.0  # For SMOTE

🚀 Quick Start

1. Data Preparation

Prepare CSV files with the following columns:

Depth,AC,CAL,CNL,DEN,GR,PE,RLLD,RLLS,Lithology
5000.0,55.2,8.5,2.1,2.45,15.0,3.2,100.0,95.0,SSTR
5000.125,56.1,8.6,2.2,2.46,16.2,3.1,105.0,98.0,SSTR
...
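Since the model consumes depth-ordered sequences, the CSV rows are typically cut into overlapping windows without ever shuffling along depth. A hedged NumPy sketch of that windowing (the function name, window length, and stride are illustrative assumptions, not the paper's values):

```python
import numpy as np

def depth_windows(features, window=16, stride=1):
    """Slice a depth-ordered [n_samples, n_curves] array into overlapping
    windows of shape [n_windows, window, n_curves], preserving depth order."""
    n = len(features)
    starts = range(0, n - window + 1, stride)
    idx = np.array([np.arange(s, s + window) for s in starts])
    return features[idx]

# 100 depth samples x 8 log curves (synthetic stand-in for the real logs)
logs = np.arange(100 * 8, dtype=float).reshape(100, 8)
X = depth_windows(logs, window=16, stride=4)
print(X.shape)  # (22, 16, 8)
```

Each window keeps its rows in original depth order, so the sedimentary sequence information the model depends on survives batching.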

2. Training

Train GCN Model (Full Pipeline)

python scripts/train.py --config configs/config.yaml --model gcn

Train Baseline Models (for Comparison)

# LSTM (Time-series sequential)
python scripts/train.py --model lstm

# RNN (Basic recurrent)
python scripts/train.py --model rnn

# TCN (Temporal Convolutional Network)
python scripts/train.py --model tcn

# ANN (Standard feed-forward)
python scripts/train.py --model ann

3. Evaluation

python scripts/evaluate.py --checkpoint checkpoints/best_gcn.pth --test_data data/test.csv

Expected output:

Test Metrics:
- Accuracy: 0.90
- Precision: 0.93
- Recall: 0.94
- F1-Score: 0.90
- AUC: 0.95

Confusion Matrix:
       SSTR  THRO  WSTR  SILIS  MICR
SSTR   0.97  0.01  0.02   0.00  0.00
THRO   0.00  0.99  0.00   0.00  0.00
WSTR   0.03  0.01  0.95   0.00  0.01
SILIS  0.00  0.00  0.00   0.99  0.01
MICR   0.00  0.04  0.00   0.00  0.95

4. Inference on New Wells

import torch
from models.gcn import MicrobialiteGCN
from utils.data_loader import WellLogDataset

# Load trained weights and switch to inference mode
model = MicrobialiteGCN(input_dim=8, num_classes=5)
checkpoint = torch.load('checkpoints/best_gcn.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare data (8 log curves, rows ordered by depth)
logs = [[55.2, 8.5, 2.1, 2.45, 15.0, 3.2, 100.0, 95.0], ...]  # [seq_len, 8]
input_tensor = torch.FloatTensor(logs).unsqueeze(0)  # [1, seq_len, 8]

# Predict
with torch.no_grad():
    logits, adjacency_matrix = model(input_tensor)
predicted_class = torch.argmax(logits, dim=1)  # e.g. SSTR (Stratiform Stromatolite)

🔄 Transfer Learning (Fine-tuning)

The model supports rapid adaptation to new formations with limited data, as demonstrated in the paper for:

  • Dengying-2 Member (Taihe Gas Field) - 6 lithologies
  • Leikoupo-4³ Submember (Pengzhou Gas Field) - 3 lithologies

Strategy 1: Standard Fine-tuning (Medium Dataset: >1000 samples)

python scripts/transfer_learning.py \
    --source_model checkpoints/best_gcn.pth \
    --target_data data/dengying2.csv \
    --num_classes 6 \
    --strategy fine_tune \
    --epochs 50

Strategy 2: GCN + SVM (Small Dataset: <500 samples)

When target data is extremely limited, freeze GCN layers and use SVM classifier:

from scripts.transfer_learning import TransferLearningGCN

# Initialize transfer learning
tl = TransferLearningGCN(
    pretrained_path='checkpoints/best_gcn.pth',
    num_new_classes=3,  # Leikoupo-4³ has 3 classes
    freeze_layers=True
)

# Use GCN as feature extractor + SVM
tl.fine_tune_with_svm(X_train, y_train, X_test, y_test)

Results with Fine-tuning:

| Formation | Samples | Strategy | Accuracy | Notes |
|---|---|---|---|---|
| Dengying-4 (Source) | 8,000 | Baseline | 0.90 | Original training |
| Dengying-2 | 500 | GCN+SVM | 0.86 | Limited data |
| Dengying-2 | 8,000 | Fine-tune | 0.91 | Full adaptation |
| Leikoupo-4³ | 2,000 | Fine-tune | 0.84 | Cross-formation |

📊 Experimental Results

Main Results (Test Set, Moxi Gas Field)

| Model | Architecture | Acc | Pre | Rec | F1 | AUC | Params |
|---|---|---|---|---|---|---|---|
| **GCN** | GRU+GFT+GLU+GCN | 0.90 | 0.93 | 0.94 | 0.90 | 0.95 | 2.1M |
| LSTM | 5-layer LSTM | 0.80 | 0.79 | 0.78 | 0.80 | 0.78 | 1.8M |
| RNN | 5-layer RNN | 0.61 | 0.60 | 0.65 | 0.61 | 0.72 | 1.2M |
| TCN | Dilated Conv | 0.70 | 0.69 | 0.72 | 0.70 | 0.78 | 1.5M |
| FC-ANN | Fully Connected | 0.61 | 0.50 | 0.56 | 0.61 | 0.58 | 2.8M |
| Dropout-ANN | ANN + Dropout | 0.67 | 0.70 | 0.63 | 0.67 | 0.62 | 2.8M |

Key Findings

  1. GCN Superiority: GCN reaches 90% accuracy, outperforming LSTM by 10 and RNN by 29 percentage points
  2. Overfitting in ANNs: Standard ANNs show severe overfitting (train acc 0.82 β†’ test acc 0.61), mitigated by dropout but still inferior to sequential models
  3. Temporal Information: Shuffling time-series destroys sedimentary patterns, reducing all models to ~20% accuracy
  4. Class-wise Performance:
    • THRO (Thrombolite): Best identified (Accuracy >0.95)
    • SSTR vs WSTR: Main confusion pair (3% misclassification due to similar lamination patterns)
    • SILIS: Easily distinguished by quartz signature in PE logs

Ablation Study

Components removed and performance impact:

| Configuration | Accuracy Δ | Analysis |
|---|---|---|
| Full Model | 0.90 | Baseline |
| w/o GRU | −0.18 | Graph construction is critical |
| w/o Self-Attention | −0.06 | Attention provides moderate gain |
| w/o DFT | −0.08 | Frequency domain important |
| w/o GFT | −0.10 | Graph Fourier Transform essential |
| w/o Convolution | −0.28 | Most vital component |
| Single GCN Layer | −0.08 | Two layers optimal |

πŸ” Geological Insights

Stratigraphic Sequence Analysis

The model captures Walther's Law in the vertical direction:

  • Window=2 (0.25m): Dominated by same-lithology transitions (self-transitions)
  • Window=3 (0.375m): Optimal for detecting lithology changes
  • Window>4: Self-transitions dominate again

This 0.375m scale matches the GCN's receptive field, validating that the model learns actual depositional cyclicity.
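The window analysis above boils down to counting lithology transitions between samples a fixed number of depth steps apart (at 0.125 m sampling, window = 3 spans 0.375 m). A small sketch of how such a transition tally can be computed — the label sequence below is invented, not field data:

```python
import numpy as np

def transition_counts(labels, offset=1):
    """Count lithology transitions between samples `offset` steps apart
    in a depth-ordered label sequence; returns (classes, count matrix)."""
    classes = sorted(set(labels))
    idx = {c: i for i, c in enumerate(classes)}
    T = np.zeros((len(classes), len(classes)), dtype=int)
    for a, b in zip(labels[:-offset], labels[offset:]):
        T[idx[a], idx[b]] += 1
    return classes, T

# Invented depth-ordered lithology labels
seq = ['SSTR', 'SSTR', 'WSTR', 'WSTR', 'THRO', 'SSTR', 'SSTR', 'SSTR']
classes, T = transition_counts(seq, offset=1)
self_frac = np.trace(T) / T.sum()   # share of same-lithology (self) transitions
print(classes, self_frac)
```

Sweeping `offset` over 1, 2, 3, ... and watching where `self_frac` dips is one way to reproduce the window comparison described above.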

Log Correlation Analysis

Despite high correlation between RLLD and RLLS (Pearson 0.81), removing either reduces accuracy by ~5%, indicating they provide complementary latent information through spectral graph convolution.
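The RLLD/RLLS redundancy check is a plain Pearson correlation over the paired curves, which `numpy.corrcoef` gives directly. The arrays below are synthetic stand-ins (log-normal, as resistivity curves roughly are), not the field data:

```python
import numpy as np

rng = np.random.default_rng(2)
rlld = rng.lognormal(mean=4.0, sigma=0.5, size=200)         # deep resistivity (toy)
rlls = rlld * rng.lognormal(mean=0.0, sigma=0.2, size=200)  # correlated shallow curve

r = np.corrcoef(rlld, rlls)[0, 1]   # Pearson correlation coefficient
print(round(r, 2))
```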


📂 Project Structure

gcn-lithology-identification/
├── configs/
│   └── config.yaml              # Configuration parameters
├── models/
│   ├── gcn.py                   # Main GCN architecture (GRU + Spectral GCN)
│   ├── layers.py                # Custom layers (GLU, GFT, GraphConv)
│   └── baselines.py             # LSTM, RNN, TCN, ANN comparators
├── utils/
│   ├── data_loader.py           # Data loading with SMOTE augmentation
│   ├── graph_utils.py           # Adjacency matrix construction
│   ├── metrics.py               # Evaluation metrics (Acc, F1, AUC)
│   └── visualizer.py            # Confusion matrix & log visualization
├── scripts/
│   ├── train.py                 # Main training loop
│   ├── evaluate.py              # Model evaluation
│   ├── predict.py               # Inference script
│   └── transfer_learning.py     # Fine-tuning for new formations
├── notebooks/
│   └── tutorial.ipynb           # Step-by-step tutorial
├── requirements.txt
└── README.md

📚 Citation

If you use this code or dataset in your research, please cite:

@article{li2025spectral,
  title={Spectral graph convolution networks for microbialite lithology identification based on conventional well logs},
  author={Li, Ke-Ran and Song, Jin-Min and Wang, Han and Yan, Hai-Jun and Liu, Shu-Gen and Lan, Yang and Jin, Xin and Ren, Jia-Xin and Zhao, Ling-Li and Tian, Li-Zhou and Deng, Hao-Shuang and Chen, Wei},
  journal={Petroleum Science},
  volume={22},
  pages={1513--1533},
  year={2025},
  publisher={Elsevier},
  doi={10.1016/j.petsci.2025.02.008}
}

⚠️ Usage Notes

  1. Data Quality: Ensure logs are environmentally corrected and depth-aligned. Missing values should be interpolated before training.

  2. Class Imbalance: Microbialite distributions are naturally imbalanced (SSTR: 33%, THRO: 27%, WSTR: 12%, etc.). Always use SMOTE or class-weighted loss to avoid bias toward majority classes.

  3. Sequence Preservation: Do not shuffle the training data along the depth axis. The model relies on temporal dependencies in sedimentary sequences. Shuffling reduces accuracy to ~20%.

  4. Transfer Learning: When applying to new formations:

    • If >1000 samples available: Use standard fine-tuning
    • If 500-1000 samples: Use GCN+SVM strategy
    • If <500 samples: Consider domain adaptation or data augmentation
  5. Hyperparameters: The paper trains for up to 1200 epochs with early stopping (patience = 50). A learning rate of 5e-4 works best with the Adam optimizer.
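For the class-weighted loss mentioned in note 2, weights are commonly taken as inverse class frequencies (the scikit-learn `class_weight='balanced'` formula). A sketch of the weight computation — the per-class counts below are invented stand-ins, not the dataset's actual counts:

```python
import numpy as np

# Invented per-class sample counts: SSTR, THRO, WSTR, SILIS, MICR (stand-ins)
counts = np.array([3400, 2800, 1250, 1600, 1300], dtype=float)

# 'Balanced' inverse-frequency weights: n_samples / (n_classes * n_c).
# The frequency-weighted mean of these weights is exactly 1.
weights = counts.sum() / (len(counts) * counts)
print(weights.round(2))

# The weights can then be handed to a weighted loss, e.g.
# torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))
```

Rarer classes (here WSTR, with the smallest count) get the largest weight, so the loss stops favoring the majority lithologies.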


πŸ“ License

This project is licensed under the MIT License.
