Keran Lia,1, Jinmin Songa,*, Han Wanga, Haijun Yanb, Shugen Liua, Yang Lanc,2, Xin Jina, Jiaxin Rena, Lizhou Tiana, Haoshuang Denga, Wei Chena
aState Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Chengdu University of Technology, Chengdu 610059, China
bResearch Institute of Petroleum Exploration and Development, Beijing, 100083, China
cUniversity College London, Gower Street, London, WC1E 6BT, UK
1Present address: State Key Laboratory of Critical Earth Material Cycling and Mineral Deposits, Frontiers Science Center for Critical Earth Material Cycling, School of Earth Sciences and Engineering, Nanjing University, Nanjing, 210023, China
2Present address: School of Economics and Management, Beihang University, Beijing, 100191, China
*Corresponding authors
π Paper β’ π Project Page β’ π Dataset β’ π Quick Start
This repository provides the official implementation of Spectral Graph Convolutional Networks (GCN) for automated microbialite lithology identification from conventional well logs. Unlike traditional methods that shuffle time-series data (destroying sedimentary sequence information), this approach treats well logs as graph-structured spectral data, preserving both vertical temporal dependencies and inter-log correlations.
- πΈοΈ Graph Representation: Transforms well logs into latent graphs (spectra + adjacency matrix) using GRU and self-attention
- π΅ Spectral Processing: Utilizes Graph Fourier Transform (GFT) and Discrete Fourier Transform (DFT) to capture frequency-domain features
- β±οΈ Sequence Preservation: Maintains depth-series (time-series) order without shuffling, modeling actual sedimentary deposition sequences
- βοΈ Data Balance: Implements SMOTE to handle class imbalance in microbialite distribution
- π Transfer Learning: Demonstrates fine-tuning strategies for adapting to new formations (Dengying-2, Leikoupo-4Β³) with limited samples
| Model | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|
| GCN (Ours) | 0.90 | 0.93 | 0.94 | 0.90 | 0.95 |
| LSTM | 0.80 | 0.79 | 0.78 | 0.80 | 0.78 |
| RNN | 0.61 | 0.60 | 0.65 | 0.61 | 0.72 |
| TCN | 0.70 | 0.69 | 0.72 | 0.70 | 0.78 |
| ANN | 0.61 | 0.50 | 0.56 | 0.61 | 0.58 |
Results on Dengying Formation (Z2dn4), Moxi Gas Field, Sichuan Basin
Workflow: Raw Logs β GRU Encoder β Self-Attention (Adjacency) β GFT/DFT β GLU β Graph Conv β Classification
- π GRU Block: Processes depth-series sequences to generate latent graph representations
- π Self-Attention: Dynamically constructs adjacency matrices from hidden states (Q, K, V mechanism)
- π Spectral Transform:
- GFT: Graph Fourier Transform using Laplacian eigen-decomposition
- DFT: Discrete Fourier Transform for frequency-domain convolution
- ποΈ GLU (Gated Linear Unit): Controls information flow in spectral domain
- πΈοΈ Graph Convolution: Spectral graph convolution with symmetric normalized Laplacian
- Location: Moxi Gas Field, Central Sichuan Basin, China
- Formation: 4th Member of Ediacaran Dengying Formation (Z2dn4)
- Wells: 44 wells (42 for training, 2 preserved for generalization testing)
- Samples: 10,367 valid data points (after SMOTE augmentation: 12,570)
| Curve | Description | Unit | Geological Significance |
|---|---|---|---|
| AC | Acoustic (Sonic) | ΞΌs/m | Velocity, porosity indicator |
| CAL | Caliper | inch | Borehole diameter, caving |
| CNL | Compensated Neutron | % | Porosity, hydrogen index |
| DEN | Density | g/cmΒ³ | Bulk density, lithology |
| GR | Gamma Ray | API | Shale content, clay volume |
| PE | Photoelectric Factor | b/e | Mineral composition, lithology |
| RLLD | Deep Resistivity | Ω·m | True formation resistivity |
| RLLS | Shallow Resistivity | Ω·m | Invasion zone resistivity |
| Code | Full Name | Description | Characteristics |
|---|---|---|---|
| MICR | Dolomicrite | Micritic dolomite | Dark, fine-grained, rare structures |
| SSTR | Stratiform Stromatolite | Layered microbial mats | Parallel laminations, intermittent dark lines |
| WSTR | Wavy Stromatolite | Undulating microbial structures | Large curvature, semi-circular, porous |
| THRO | Thrombolite | Clotted microbial structure | Dark clots, diffusing fabric, dissolving pores |
| SILIS | Siliceous Stromatolite | Silica-rich microbialite | Curved stripes, interlayer quartz, brittle |
- Python β₯ 3.8
- CUDA β₯ 11.3 (optional, for GPU acceleration)
- 8GB+ RAM
# Clone repository
git clone https://github.com/KeranLi/GCN-Microbialite-Lithology.git
cd GCN-Microbialite-Lithology
# Install dependencies
pip install -r requirements.txt
# Install package
pip install -e .Key Dependencies:
torch>=1.11.0
numpy>=1.20.0
pandas>=1.3.0
scikit-learn>=0.24.0
scipy>=1.7.0
imbalanced-learn>=0.8.0 # For SMOTE
Prepare CSV files with the following columns:
Depth,AC,CAL,CNL,DEN,GR,PE,RLLD,RLLS,Lithology
5000.0,55.2,8.5,2.1,2.45,15.0,3.2,100.0,95.0,SSTR
5000.125,56.1,8.6,2.2,2.46,16.2,3.1,105.0,98.0,SSTR
...python scripts/train.py --config configs/config.yaml --model gcn# LSTM (Time-series sequential)
python scripts/train.py --model lstm
# RNN (Basic recurrent)
python scripts/train.py --model rnn
# TCN (Temporal Convolutional Network)
python scripts/train.py --model tcn
# ANN (Standard feed-forward)
python scripts/train.py --model annpython scripts/evaluate.py --checkpoint checkpoints/best_gcn.pth --test_data data/test.csvExpected output:
Test Metrics:
- Accuracy: 0.90
- Precision: 0.93
- Recall: 0.94
- F1-Score: 0.90
- AUC: 0.95
Confusion Matrix:
SSTR THRO WSTR SILIS MICR
SSTR 0.97 0.01 0.02 0.00 0.00
THRO 0.00 0.99 0.00 0.00 0.00
WSTR 0.03 0.01 0.95 0.00 0.01
SILIS 0.00 0.00 0.00 0.99 0.01
MICR 0.00 0.04 0.00 0.00 0.95
import torch
from models.gcn import MicrobialiteGCN
from utils.data_loader import WellLogDataset
# Load model
model = MicrobialiteGCN(input_dim=8, num_classes=5)
checkpoint = torch.load('checkpoints/best_gcn.pth')
model.load_state_dict(checkpoint['model_state_dict'])
# Prepare data (8 log curves)
logs = [[55.2, 8.5, 2.1, 2.45, 15.0, 3.2, 100.0, 95.0], ...] # [seq_len, 8]
input_tensor = torch.FloatTensor(logs).unsqueeze(0) # [1, seq_len, 8]
# Predict
logits, adjacency_matrix = model(input_tensor)
predicted_class = torch.argmax(logits, dim=1)
# Returns: SSTR (Stratiform Stromatolite)The model supports rapid adaptation to new formations with limited data, as demonstrated in the paper for:
- Dengying-2 Member (Taihe Gas Field) - 6 lithologies
- Leikoupo-4Β³ Submember (Pengzhou Gas Field) - 3 lithologies
python scripts/transfer_learning.py --source_model checkpoints/best_gcn.pth --target_data data/dengying2.csv --num_classes 6 --strategy fine_tune --epochs 50When target data is extremely limited, freeze GCN layers and use SVM classifier:
from scripts.transfer_learning import TransferLearningGCN
# Initialize transfer learning
tl = TransferLearningGCN(
pretrained_path='checkpoints/best_gcn.pth',
num_new_classes=3, # Leikoupo-43 has 3 classes
freeze_layers=True
)
# Use GCN as feature extractor + SVM
tl.fine_tune_with_svm(X_train, y_train, X_test, y_test)Results with Fine-tuning:
| Formation | Samples | Strategy | Accuracy | Notes |
|---|---|---|---|---|
| Dengying-4 (Source) | 8,000 | Baseline | 0.90 | Original training |
| Dengying-2 | 500 | GCN+SVM | 0.86 | Limited data |
| Dengying-2 | 8,000 | Fine-tune | 0.91 | Full adaptation |
| Leikoupo-4Β³ | 2,000 | Fine-tune | 0.84 | Cross-formation |
| Model | Architecture | Acc | Pre | Rec | F1 | AUC | Params |
|---|---|---|---|---|---|---|---|
| GCN | GRU+GFT+GLU+GCN | 0.90 | 0.93 | 0.94 | 0.90 | 0.95 | 2.1M |
| LSTM | 5-layer LSTM | 0.80 | 0.79 | 0.78 | 0.80 | 0.78 | 1.8M |
| RNN | 5-layer RNN | 0.61 | 0.60 | 0.65 | 0.61 | 0.72 | 1.2M |
| TCN | Dilated Conv | 0.70 | 0.69 | 0.72 | 0.70 | 0.78 | 1.5M |
| FC-ANN | Fully Connected | 0.61 | 0.50 | 0.56 | 0.61 | 0.58 | 2.8M |
| Dropout-ANN | ANN + Dropout | 0.67 | 0.70 | 0.63 | 0.67 | 0.62 | 2.8M |
- GCN Superiority: GCN achieves 90% accuracy, outperforming LSTM by 10% and RNN by 29%
- Overfitting in ANNs: Standard ANNs show severe overfitting (train acc 0.82 β test acc 0.61), mitigated by dropout but still inferior to sequential models
- Temporal Information: Shuffling time-series destroys sedimentary patterns, reducing all models to ~20% accuracy
- Class-wise Performance:
- THRO (Thrombolite): Best identified (Accuracy >0.95)
- SSTR vs WSTR: Main confusion pair (3% misclassification due to similar lamination patterns)
- SILIS: Easily distinguished by quartz signature in PE logs
Components removed and performance impact:
| Configuration | Accuracy Drop | Analysis |
|---|---|---|
| Full Model | 0.90 | Baseline |
| w/o GRU | -0.18 | Graph construction is critical |
| w/o Self-Attention | -0.06 | Attention provides moderate gain |
| w/o DFT | -0.08 | Frequency domain important |
| w/o GFT | -0.10 | Graph Fourier Transform essential |
| w/o Convolution | -0.28 | Most vital component |
| Single GCN Layer | -0.08 | Two layers optimal |
The model captures Walther's Law in the vertical direction:
- Window=2 (0.25m): Dominated by same-lithology transitions (self-transitions)
- Window=3 (0.375m): Optimal for detecting lithology changes
- Window>4: Self-transitions dominate again
This 0.375m scale matches the GCN's receptive field, validating that the model learns actual depositional cyclicity.
Despite high correlation between RLLD and RLLS (Pearson 0.81), removing either reduces accuracy by ~5%, indicating they provide complementary latent information through spectral graph convolution.
gcn-lithology-identification/
βββ configs/
β βββ config.yaml # Configuration parameters
βββ models/
β βββ gcn.py # Main GCN architecture (GRU + Spectral GCN)
β βββ layers.py # Custom layers (GLU, GFT, GraphConv)
β βββ baselines.py # LSTM, RNN, TCN, ANN comparators
βββ utils/
β βββ data_loader.py # Data loading with SMOTE augmentation
β βββ graph_utils.py # Adjacency matrix construction
β βββ metrics.py # Evaluation metrics (Acc, F1, AUC)
β βββ visualizer.py # Confusion matrix & log visualization
βββ scripts/
β βββ train.py # Main training loop
β βββ evaluate.py # Model evaluation
β βββ predict.py # Inference script
β βββ transfer_learning.py # Fine-tuning for new formations
βββ notebooks/
β βββ tutorial.ipynb # Step-by-step tutorial
βββ requirements.txt
βββ README.md
If you use this code or dataset in your research, please cite:
@article{li2025spectral,
title={Spectral graph convolution networks for microbialite lithology identification based on conventional well logs},
author={Li, Ke-Ran and Song, Jin-Min and Wang, Han and Yan, Hai-Jun and Liu, Shu-Gen and Lan, Yang and Jin, Xin and Ren, Jia-Xin and Zhao, Ling-Li and Tian, Li-Zhou and Deng, Hao-Shuang and Chen, Wei},
journal={Petroleum Science},
volume={22},
pages={1513--1533},
year={2025},
publisher={Elsevier},
doi={10.1016/j.petsci.2025.02.008}
}-
Data Quality: Ensure logs are environmentally corrected and depth-aligned. Missing values should be interpolated before training.
-
Class Imbalance: Microbialite distributions are naturally imbalanced (SSTR: 33%, THRO: 27%, WSTR: 12%, etc.). Always use SMOTE or class-weighted loss to avoid bias toward majority classes.
-
Sequence Preservation: Do not shuffle the training data along the depth axis. The model relies on temporal dependencies in sedimentary sequences. Shuffling reduces accuracy to ~20%.
-
Transfer Learning: When applying to new formations:
- If >1000 samples available: Use standard fine-tuning
- If 500-1000 samples: Use GCN+SVM strategy
- If <500 samples: Consider domain adaptation or data augmentation
-
Hyperparameters: The paper uses 1200 epochs with early stopping (patience=50). Learning rate 5e-4 works best for Adam optimizer.
This project is licensed under MIT License.