A high-performance, adaptive sequence prediction engine based on Context Mixing and Variable Order Markov Models (VOMM). Implemented in pure Python with zero external dependencies.
TreeMemoryPredictor learns patterns "on the fly." Unlike deep learning models, it requires no training epochs, has zero cold-start latency, and adapts instantly to new data streams while smoothly forgetting outdated information.
- Instant Learning: Calling `model.update(token)` updates probabilities in $O(N)$ time. The next prediction is immediately smarter.
- Privacy-First: Runs entirely locally. Perfect for on-device personalization (e.g., a custom keyboard predicting a user's unique slang).
- Explainable Math: No black box. Every probability is mathematically derived from exact historical occurrences and explicit decay formulas.
- Reverse Suffix Trie ($O(N)$ Traversal): Context is processed backwards (from the most recent token to the oldest). This design reduces sequence lookup complexity from $O(N^2)$ to $O(N)$.
- Lazy Exponential Decay: Implements $O(1)$ weight updates relative to history length. Old patterns smoothly fade away ($Count \times Decay^{\Delta t}$), allowing the model to adapt to changing trends.
- True-Decay Garbage Collection: An automated, memory-bounding pruning system that applies exact time-decay formulas before wiping dead branches. Prevents unbounded memory growth over infinite data streams.
- Context Masking (`masked_mode`): Optional wildcard evaluation via BFS. Allows the model to find probabilistic correlations even if random "noise" or typos are inserted between significant patterns.
- Micro-Optimized Pure Python: Uses `__slots__`, lazily populated math caches, and the Log-Sum-Exp trick to prevent numerical underflow, keeping per-step overhead low for pure Python.
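
The reverse traversal and lazy decay described in the list above can be sketched together in a few dozen lines. This is a minimal illustration under stated assumptions, not TMP's actual code: `Node`, `TinyReverseTrie`, and `longest_match` are hypothetical names, and the sketch returns only the deepest matching node instead of mixing several orders. Each update touches one node per context position, and each count is decayed only when it is read or written.

```python
class Node:
    __slots__ = ("children", "counts")

    def __init__(self):
        self.children = {}  # previous token -> deeper Node (older history)
        self.counts = {}    # next token -> [decayed count, step of last update]


class TinyReverseTrie:
    """Hypothetical toy model: reverse suffix trie with lazily decayed counts."""

    def __init__(self, n_max=3, decay=0.9):
        self.n_max, self.decay = n_max, decay
        self.root = Node()
        self.context = []
        self.step = 0

    def _decayed(self, entry):
        # Apply all decay accumulated since the entry was last touched.
        count, last = entry
        return count * self.decay ** (self.step - last)

    def update(self, token):
        self.step += 1
        node = self.root
        # Walk from the most recent context token to the oldest.
        for prev in reversed(self.context):
            node = node.children.setdefault(prev, Node())
            entry = node.counts.setdefault(token, [0.0, self.step])
            entry[0] = self._decayed(entry) + 1.0
            entry[1] = self.step
        self.context = (self.context + [token])[-self.n_max:]

    def longest_match(self):
        # Return decayed counts from the deepest node matching the context.
        node, best = self.root, {}
        for prev in reversed(self.context):
            node = node.children.get(prev)
            if node is None:
                break
            best = {t: self._decayed(e) for t, e in node.counts.items()}
        return best


trie = TinyReverseTrie(n_max=3, decay=0.9)
for tok in ["click", "buy", "click", "buy", "click"]:
    trie.update(tok)
print(trie.longest_match())  # decayed counts for tokens seen after this context
```

In this toy run the only candidate after the context `click, buy, click` is `buy`, and its stored count has been discounted by one step of decay at read time, without any global pass over the trie.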
Simply copy the `tmp.py` class code into your project.

```python
from tmp import TreeMemoryPredictor

# Initialize: max context of 5; each count keeps 90% of its weight per step
model = TreeMemoryPredictor(n_max=5, decay=0.9)
sequence = ['click', 'buy', 'click', 'buy', 'click']

# Online learning loop
for token in sequence:
    # 1. Predict the next token (greedy / argmax)
    prediction = model.predict()
    print(f"Predicted next step: {prediction}")
    # 2. Ingest the actual token and instantly learn it
    model.update(token)
```

You can control the randomness of the prediction using standard LLM sampling parameters.
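
To make the effect of these parameters concrete, the sketch below reshapes a plain `{token: probability}` dictionary. It is an illustration only: the `reshape` helper is a hypothetical name, and the order of operations (temperature, then top-k, then top-p) is an assumption, not a statement about how TMP applies them internally.

```python
import math
import random


def reshape(probs, temperature=1.0, top_k=0, top_p=1.0):
    """Apply temperature, then top-k, then top-p to a {token: probability} dict."""
    # Temperature: raise each probability to the power 1/T
    # (equivalent to dividing log-probabilities by T before renormalizing).
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items() if p > 0}
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # hard cutoff on the number of candidates
    total = sum(w for _, w in ranked)
    kept, mass = [], 0.0
    for token, weight in ranked:
        kept.append((token, weight))
        mass += weight / total
        if mass >= top_p:  # nucleus reached: drop the remaining tail
            break
    norm = sum(w for _, w in kept)
    return {t: w / norm for t, w in kept}


probs = {"buy": 0.6, "click": 0.3, "scroll": 0.1}
flat = reshape(probs, temperature=2.0)   # probabilities move closer together
sharp = reshape(probs, temperature=0.5)  # the leader pulls further ahead
nucleus = reshape(probs, top_p=0.8)      # keeps only "buy" and "click"

# Draw one token from a reshaped distribution.
tokens, weights = zip(*sharp.items())
sampled = random.choices(tokens, weights=weights, k=1)[0]
```

Greedy prediction corresponds to taking the highest-probability token instead of sampling.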
```python
# Temperature > 1.0 makes it creative, < 1.0 makes it strict and confident
next_token = model.predict(temperature=1.2, top_k=5)

# Nucleus sampling (dynamically limits candidates to the top 90% of probability mass)
next_token = model.predict(top_p=0.9)
```

Real-world data can contain noise or typos (e.g., `["open", "red", "door"]` instead of `["open", "wooden", "door"]`). TMP can fall back to wildcard matching to bridge the gap:
- `none`: Strict exact sequence match (default and fastest).
- `linear`: Ignores recent noisy tokens (treats them as wildcards), but strictly matches older history.
- `squared`: Full combinatorial search. Any token in the context can independently be matched or ignored.
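
The difference between the two wildcard modes can be pictured by listing the context patterns each one would consider. The sketch below is an interpretation of the descriptions above, not TMP's traversal code: `linear_masks`, `squared_masks`, and the `WILDCARD` marker are hypothetical names, and TMP's BFS may explore these patterns in a different order or prune them.

```python
from itertools import product

WILDCARD = "*"  # assumed marker for an ignored position


def linear_masks(context):
    """Mask the k most recent tokens (k = 0 .. len-1); older tokens stay exact."""
    n = len(context)
    return [tuple(context[: n - k]) + (WILDCARD,) * k for k in range(n)]


def squared_masks(context):
    """Every token is independently kept or replaced by a wildcard."""
    patterns = []
    for keep in product([True, False], repeat=len(context)):
        if any(keep):  # require at least one real token in the pattern
            patterns.append(
                tuple(tok if k else WILDCARD for tok, k in zip(context, keep))
            )
    return patterns


context = ["open", "red"]
for pattern in linear_masks(context):
    print(pattern)
```

With the context `["open", "red"]`, the linear patterns include `("open", "*")`, which is what lets a history of `["open", "wooden", "door"]` still vote for `door`. The combinatorial variant also covers patterns such as `("*", "red")`, at the cost of a search space that grows quickly with context length.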
```python
# Enable wildcard fallback to find correlations across noisy sequences
model.predict(masked_mode='linear')
```

Handle independent sequences (e.g., separate user sessions or completely different documents) by clearing the context between them.
```python
dataset = [
    ['user', 'login', 'view_item', 'logout'],
    ['admin', 'login', 'delete_user', 'logout'],
]

# model.fit() automatically flushes context between sequences
model.fit(dataset)

# Inference: manually set context to simulate an active session
model.fill_context(['admin', 'login'])
probs = model.predict_proba()
# -> {'delete_user': 0.9, 'logout': 0.1}
```

TMP handles dynamic internal caches safely during serialization.
```python
# Save the model state to disk
model.save("my_predictor.pkl")

# Load it back later
restored_model = TreeMemoryPredictor.load("my_predictor.pkl")
```

The algorithm uses a weighted voting system across multiple Markov orders (Prediction by Partial Matching).
Each valid node in the Trie calculates its "voting power" with an exact formula over the following terms:

- $N_{obs}$: Frequency count of the token.
- $Decay$: Forgetting factor (e.g., 0.99).
- $\Delta t$: Time steps since this node was last updated.
- $L_{eff}$: Effective matched context length (ignores wildcard `*` positions).
- $S$: Current vocabulary size (dynamically scales the reward for matching long patterns based on alphabet complexity).
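
The sketch below shows how votes built from these terms could be combined into probabilities without overflow or underflow. The formula itself is not reproduced in this text, so the weight used here, $N_{obs} \times Decay^{\Delta t} \times S^{L_{eff}}$, is an assumed stand-in for illustration and should not be read as TMP's exact expression. The helper names `log_vote` and `mix` are hypothetical.

```python
import math


def log_vote(n_obs, decay, dt, l_eff, vocab_size):
    """Assumed weight N_obs * Decay**dt * S**L_eff, computed in log space."""
    return math.log(n_obs) + dt * math.log(decay) + l_eff * math.log(vocab_size)


def mix(votes):
    """Combine (token, log_weight) votes into probabilities via log-sum-exp."""
    per_token = {}
    for token, log_w in votes:
        per_token.setdefault(token, []).append(log_w)
    peak = max(log_w for _, log_w in votes)
    # Subtracting the peak keeps every exponent <= 0, so exp() cannot overflow
    # and the largest term is exactly 1.0.
    scores = {
        token: sum(math.exp(log_w - peak) for log_w in log_ws)
        for token, log_ws in per_token.items()
    }
    total = sum(scores.values())
    return {token: score / total for token, score in scores.items()}


votes = [
    ("buy", log_vote(n_obs=3, decay=0.99, dt=2, l_eff=3, vocab_size=50)),
    ("click", log_vote(n_obs=5, decay=0.99, dt=40, l_eff=1, vocab_size=50)),
]
print(mix(votes))
```

Under this assumed weighting, a long, recent match outweighs a more frequent but short and stale one, which is the behavior the term descriptions suggest.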

`TreeMemoryPredictor` accepts the following initialization parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_max` | `int` | `10` | Maximum sequence length to remember. 3-5 for words, 10-20 for characters. |
| `n_min` | `int` | `1` | Minimum effective context length required to accept a match. |
| `decay` | `float` | `0.99` | Memory retention rate. 0.9 adapts fast to new trends; 0.999 keeps long-term memory. |
| `alphabet_autoscale` | `bool` | `True` | Balances pattern-matching entropy. Keep this `True` in almost all cases. |
| `pruning_step` | `int` | `1000` | How often (in steps) to execute the Garbage Collector. |
| `cache_size` | `int` | `4096` | Size of internal lazy math caches. Increase if running on massive datasets with slow decay. |


`predict` accepts the following sampling and search parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `temperature` | `float` | `1.0` | Flattens (>1.0) or sharpens (<1.0) the distribution. |
| `top_k` | `int` | `0` | Hard cutoff. Keeps only the `top_k` most probable tokens; `0` disables it. |
| `top_p` | `float` | `1.0` | Nucleus sampling cutoff. Drops the long tail of low-probability tokens. |
| `masked_mode` | `str` | `'none'` | Search strategy: `'none'`, `'linear'`, or `'squared'`. |

Contributions, issue reports, and pull requests are warmly welcomed! If you have ideas for adding new traversal mechanics, pruning optimizations, or Kneser-Ney smoothing integrations, feel free to open a PR.
This project is licensed under the MIT License.