RefNet (Reflective Evaluation Network) is a transformer-based neural architecture designed to model and predict introspective metrics in Self-Reflective AI (SRAI) systems. The model processes sequences of thought tokens and outputs predictions for valence, semantic metric distance (SMD), quality, and next cognitive action.
- Input Format: Sequences of 256-dimensional embeddings representing thought tokens
- Sequence Length: Configurable maximum length (default: 64 tokens)
- Padding: Zero-padding for variable-length sequences
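Batching variable-length thought sequences with zero-padding can be sketched as follows. This is an illustrative PyTorch snippet, not code from the RefNet repository; the function name and the key-padding-mask convention (True at padded positions, as expected by `nn.TransformerEncoder`) are assumptions.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_batch(sequences, max_len=64):
    """Zero-pad a list of (L_i, 256) tensors to a (B, max_len, 256) batch.

    Returns the padded batch and a boolean mask that is True at padded
    positions (the src_key_padding_mask format of nn.TransformerEncoder).
    """
    # Truncate anything longer than the configured maximum length.
    sequences = [s[:max_len] for s in sequences]
    lengths = torch.tensor([s.shape[0] for s in sequences])

    batch = pad_sequence(sequences, batch_first=True)  # (B, L_longest, 256)
    # Pad out to max_len if the longest sequence is shorter than it.
    if batch.shape[1] < max_len:
        pad = torch.zeros(batch.shape[0], max_len - batch.shape[1], batch.shape[2])
        batch = torch.cat([batch, pad], dim=1)

    positions = torch.arange(max_len).unsqueeze(0)     # (1, max_len)
    pad_mask = positions >= lengths.unsqueeze(1)       # (B, max_len)
    return batch, pad_mask
```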
- Type: Sinusoidal positional encoding
- Purpose: Provides sequence position information to the transformer
- Implementation: Standard transformer sinusoidal positional encoding (fixed, non-learnable)
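The fixed sinusoidal encoding referred to above is the one from "Attention Is All You Need" and can be computed as in this NumPy sketch (illustrative code, not RefNet's implementation):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Standard fixed (non-learnable) sinusoidal positional encoding:

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(max_len)[:, None]  # (max_len, 1)
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions * div_term)
    pe[:, 1::2] = np.cos(positions * div_term)
    return pe
```

The resulting `(max_len, d_model)` matrix is added to the projected input embeddings position-by-position.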
- Architecture: Multi-layer transformer encoder
- Default Configuration:
- Model dimension: 256
- Attention heads: 8
- Layers: 6
- Feed-forward dimension: 1024 (4 × d_model)
- Dropout: 0.1
- Valence Head: Linear layer → 1 output (regression)
- SMD Head: Linear layer → 1 output (regression)
- Quality Head: Linear layer → 1 output (binary classification)
- Action Head: Linear layer → 4 outputs (multi-class classification)
```
Input Embeddings (B, L, 256)
             ↓
Linear Projection (256 → 256)
             ↓
Positional Encoding Addition
             ↓
Transformer Encoder (6 layers)
             ↓
Global Average Pooling (L → 1)
             ↓
Multi-Task Heads
             ↓
Outputs: {valence, smd, quality, next_action}
```
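The pipeline above can be sketched end-to-end in PyTorch. This is a minimal illustration of the described architecture under the default configuration; the class and attribute names are hypothetical, not taken from the RefNet codebase, and the quality head is assumed to emit a raw logit.

```python
import math
import torch
import torch.nn as nn

class RefNetSketch(nn.Module):
    """Illustrative sketch: projection -> sinusoidal PE -> 6-layer
    transformer encoder -> mean pooling -> four task heads."""

    def __init__(self, d_model=256, nhead=8, num_layers=6,
                 dim_feedforward=1024, dropout=0.1, max_len=64):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)  # input projection 256 -> 256

        # Fixed sinusoidal positional encoding, stored as a buffer.
        pos = torch.arange(max_len).float().unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward, dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

        self.valence_head = nn.Linear(d_model, 1)  # regression
        self.smd_head = nn.Linear(d_model, 1)      # regression
        self.quality_head = nn.Linear(d_model, 1)  # binary logit
        self.action_head = nn.Linear(d_model, 4)   # 4 action logits

    def forward(self, x, pad_mask=None):
        x = self.proj(x) + self.pe[: x.shape[1]]
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        pooled = x.mean(dim=1)                     # global average pooling
        return {
            "valence": self.valence_head(pooled).squeeze(-1),
            "smd": self.smd_head(pooled).squeeze(-1),
            "quality": self.quality_head(pooled).squeeze(-1),
            "next_action": self.action_head(pooled),
        }
```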
The model uses a weighted combination of multiple loss functions:
```
L_total = λ_valence × MSE(valence_pred, valence_true)
        + λ_smd     × MSE(smd_pred,     smd_true)
        + λ_quality × BCE(quality_pred, quality_true)
        + λ_action  × CE(action_pred,   action_true)
```
- λ_valence = 1.0
- λ_smd = 1.0
- λ_quality = 0.5
- λ_action = 0.5
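The weighted multi-task loss with these default weights might be computed as follows (a sketch with illustrative names; it assumes quality predictions are raw logits and next-action targets are integer class indices):

```python
import torch
import torch.nn.functional as F

def refnet_loss(preds, targets,
                lam_valence=1.0, lam_smd=1.0, lam_quality=0.5, lam_action=0.5):
    """Weighted sum of the four task losses described above."""
    l_val = F.mse_loss(preds["valence"], targets["valence"])
    l_smd = F.mse_loss(preds["smd"], targets["smd"])
    l_qual = F.binary_cross_entropy_with_logits(
        preds["quality"], targets["quality"])
    l_act = F.cross_entropy(preds["next_action"], targets["next_action"])
    return (lam_valence * l_val + lam_smd * l_smd
            + lam_quality * l_qual + lam_action * l_act)
```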
- Type: AdamW
- Learning Rate: 0.0003
- Weight Decay: 0.01
- Epochs: 20
- Batch Size: 32
- Max Sequence Length: 64
- Shuffle: Training data shuffled, validation data not shuffled
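A training loop matching these hyperparameters could look like the sketch below. The function signature is hypothetical; `model` and `loss_fn` stand in for the actual RefNet model and multi-task loss, and shuffling is delegated to the `DataLoader`s passed in.

```python
import torch

def train(model, loss_fn, train_loader, val_loader, epochs=20):
    """Sketch of a training loop with the listed AdamW settings."""
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=3e-4, weight_decay=0.01)
    for epoch in range(epochs):
        model.train()
        for x, targets in train_loader:  # train loader shuffles each epoch
            optimizer.zero_grad()
            loss = loss_fn(model(x), targets)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():  # validation loader is not shuffled
            val_loss = sum(loss_fn(model(x), t).item()
                           for x, t in val_loader)
    return model
```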
```json
{
  "tokens": [
    {
      "token_name": "reflect",
      "embed": [256-dimensional vector],
      "metrics": {"valence": float, "smd": float},
      "edges": []
    }
  ]
}
```

```json
{
  "valence": float,      // Emotional tone prediction
  "smd": float,          // Semantic metric distance prediction
  "quality": float,      // Thought quality prediction (0-1)
  "next_action": tensor  // Action logits [consolidate, recall, reframe, evaluate_alignment]
}
```

- Valence: Predicts the emotional tone of a thought sequence
- SMD: Estimates semantic metric distance between thoughts
- Quality: Assesses thought quality (binary classification)
- Next Action: Predicts the most likely next cognitive action
- consolidate: Integrate information
- recall: Retrieve stored information
- reframe: Restructure understanding
- evaluate_alignment: Check consistency
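Decoding the action head's logits into one of these names is a simple argmax over the four classes; a sketch (the index order follows the list above, and the function name is illustrative):

```python
import torch

ACTIONS = ["consolidate", "recall", "reframe", "evaluate_alignment"]

def decode_action(action_logits: torch.Tensor) -> list:
    """Map (B, 4) action logits to the most likely action name per item."""
    return [ACTIONS[i] for i in action_logits.argmax(dim=-1).tolist()]
```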
- Sequence Modeling: Naturally handles variable-length thought sequences
- Attention Mechanism: Captures long-range dependencies in thought processes
- Parallelization: Efficient training on modern hardware
- Shared Representations: Common features benefit all prediction tasks
- Regularization: Training against multiple objectives acts as an implicit regularizer, reducing overfitting on any single task
- Efficiency: Single model for multiple predictions
- Sequence Length Independence: Works with variable-length inputs
- Computational Efficiency: Collapsing the sequence to a single vector keeps the prediction heads cheap
- Interpretability: Provides single representation per sequence
- Memory: ~2GB GPU memory for default configuration
- Training Time: ~5-10 minutes on modern GPU for 20 epochs
- Inference: Real-time prediction capability
- Model Size: ~1M parameters
- Sequence Length: Limited by memory, tested up to 512 tokens
- Batch Size: Memory usage scales linearly with batch size
- Attention Visualization: Add attention weight analysis
- Hierarchical Modeling: Multi-scale sequence processing
- Dynamic Weights: Adaptive loss weight adjustment
- Uncertainty Quantification: Prediction confidence estimation
- Few-Shot Learning: Adaptation to new domains
- Interpretability: Understanding model decision-making
- Causal Modeling: Capturing causal relationships in thoughts
- Multi-Modal: Incorporating additional modalities
- Real-Time: Optimizing for streaming applications