PYTINK - Stock Price Prediction with Transformers

Welcome! 👋

This project trains a transformer model to predict stock price movements from sequences of price changes across a list of stocks. Each price change is encoded as a letter from a small alphabet (a-g). The default model uses the configuration in config_template.yaml.


📋 START HERE

Choose your learning path:

New to the project?

  1. Read PROJECT_SUMMARY.md (5 min overview)
  2. Follow QUICKSTART.md (setup & first run)
  3. Try EXAMPLES.md (11 practical examples)

Want to dive in immediately?

pip install -r requirements.txt
python test_installation.py          # Verify setup
python train_model.py --db-password YOUR_PASSWORD   # Run with defaults

Looking for specific information?


🎯 What This Project Does

Trains a transformer model to predict the next "word" in a sequence of stock price changes:

  • Input: List of stocks (e.g., ["AAPL", "MSFT", "GOOGL"])
  • Process: Encode price changes as letters (a-g for 7 delta levels)
  • Example Word: "acgaeb" = 6 stocks with specific price movements
  • Model Task: Given 4-16 words, predict the next word
  • Output: Predictions of next period's price movements
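
The model task above amounts to a sliding window over the sequence of words. A minimal sketch in plain Python, assuming a fixed context length; the helper name `make_training_pairs` and the sample words are hypothetical and not part of the project's API:

```python
from typing import List, Tuple


def make_training_pairs(words: List[str], context_size: int = 4) -> List[Tuple[List[str], str]]:
    """Pair each run of `context_size` consecutive words with the word that follows it."""
    pairs = []
    for start in range(len(words) - context_size):
        context = words[start:start + context_size]
        target = words[start + context_size]
        pairs.append((context, target))
    return pairs


# Made-up 3-stock words, one per time period (illustration only).
words = ["acg", "ddd", "eba", "gfc", "dde", "abc"]
pairs = make_training_pairs(words, context_size=4)
for context, target in pairs:
    print(context, "->", target)
```

Six words with a context of four give two pairs; a sequence no longer than the context yields none.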

📁 Project Structure

pytink/
├── README.md                 ← Full technical documentation
├── QUICKSTART.md             ← Setup & usage guide
├── PROJECT_SUMMARY.md        ← Overview & architecture
├── EXAMPLES.md               ← 11 usage examples
├── STRUCTURE.txt             ← File descriptions
├── INDEX.md                  ← You are here
│
├── requirements.txt          ← Install with: pip install -r requirements.txt
├── config_template.py        ← Configuration templates
├── test_installation.py      ← Verify installation
├── train_model.py            ← Training CLI interface
├── inference.py              ← Evaluate trained models
│
├── src/                      ← Core Python modules
│   ├── database.py           ← MySQL interface
│   ├── processor.py          ← Data processing & encoding
│   ├── model.py              ← PyTorch models
│   ├── analysis.py           ← Visualization tools
│   └── __init__.py           ← Package init
│
└── tests/                    ← Unit & integration tests
    ├── test_database.py
    ├── test_processor.py
    ├── test_model.py
    ├── test_integration.py
    └── test_inference.py

🚀 Quick Start (2 minutes)

Command Line (Recommended)

cd ~/pytink
pip install -r requirements.txt
python train_model.py --db-password YOUR_PASSWORD

Evaluate a Trained Model

python inference.py --db-password YOUR_PASSWORD --model-dir models/TICKER-LIST/TIMESTAMP/

Verify Installation First

cd ~/pytink
python test_installation.py

Run Tests

pytest tests/ -v

📚 Documentation Files

| File | Purpose | Read Time |
|------|---------|-----------|
| PROJECT_SUMMARY.md | Overview, architecture, features | 5 min |
| QUICKSTART.md | Setup, basic usage, troubleshooting | 10 min |
| README.md | Complete technical reference | 20 min |
| EXAMPLES.md | 11 practical code examples | 15 min |
| STRUCTURE.txt | File descriptions & statistics | 5 min |

🔧 Core Modules

database.py - MySQL Interface

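from database import StockDatabase

db = StockDatabase()
db.connect()
stocks = db.get_random_stocks(count=10)        # pick 10 stocks at random
# stock_ids must hold the IDs of the stocks selected above
quotes = db.get_quotes_for_stocks(stock_ids)   # historical quotes for those stocks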

processor.py - Data Processing

from processor import PriceProcessor
processor = PriceProcessor(interval_minutes=15)
words = processor.extract_words(quotes, stock_ids)
unique_count, unique_words = processor.count_unique_words(words)
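
To make the "align timestamps, calculate deltas" step concrete, here is a rough, self-contained sketch. It is not the actual `PriceProcessor` implementation: the function name `aligned_changes`, the nested-dict input layout, and the sample prices are all assumptions made for illustration.

```python
from typing import Dict, List


def aligned_changes(prices: Dict[str, Dict[str, float]], tickers: List[str]) -> List[List[float]]:
    """Return one row of fractional price changes per consecutive pair of shared timestamps.

    `prices[ticker][timestamp]` is a price. Timestamps missing for any ticker are dropped,
    so every row covers all tickers in the given order.
    """
    shared = set(prices[tickers[0]])
    for ticker in tickers[1:]:
        shared &= set(prices[ticker])
    timeline = sorted(shared)  # "HH:MM" strings sort chronologically within one day

    rows = []
    for prev, curr in zip(timeline, timeline[1:]):
        rows.append([
            (prices[t][curr] - prices[t][prev]) / prices[t][prev]
            for t in tickers
        ])
    return rows


# Made-up 15-minute prices; MSFT has an extra quote that AAPL lacks.
prices = {
    "AAPL": {"09:30": 100.0, "09:45": 101.0, "10:00": 101.0},
    "MSFT": {"09:30": 200.0, "09:45": 199.0, "10:00": 199.0, "10:15": 198.0},
}
rows = aligned_changes(prices, ["AAPL", "MSFT"])
for row in rows:
    print(row)
```

Each row holds one fractional change per stock for a single interval, which is the input a letter encoding would need.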

model.py - PyTorch Models

from model import StockWordDataset, StockTransformerModel
dataset = StockWordDataset(words, vocab, context_window_size=4)
model = StockTransformerModel(vocab_size=len(vocab))
predictions = model.predict(input_ids)

analysis.py - Visualization

from analysis import plot_training_loss, analyze_prediction_quality
plot_training_loss(history)
analyze_prediction_quality(predictions)

📊 Delta Encoding (Key Concept)

Price changes are encoded as letters a-g:

| Letter | Delta | Change |
|--------|-------|--------|
| a | -0.01 | ↓ 1% or more |
| b | -0.005 | ↓ 0.5% |
| c | -0.001 | ↓ 0.1% |
| d | 0.00 | → 0% |
| e | +0.001 | ↑ 0.1% |
| f | +0.005 | ↑ 0.5% |
| g | +0.01 | ↑ 1% or more |

Example: "acgaeb" with 6 stocks = AAPL↓1%, AAL↓0.1%, MSFT↑1%, GOOGL↓1%, AMZN↑0.1%, TSLA↓0.5%
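
The encoding can be sketched in a few lines of Python. The seven levels come from the table above; how values between levels are binned is not stated here, so this sketch assumes nearest-level rounding. The function names are hypothetical and are not taken from processor.py.

```python
# Delta levels from the table above, in letter order a..g.
LEVELS = [-0.01, -0.005, -0.001, 0.0, 0.001, 0.005, 0.01]
LETTERS = "abcdefg"


def encode_delta(change: float) -> str:
    """Map a fractional price change to the letter of the nearest level.

    Assumption: nearest-level rounding; changes beyond +/-1% land on 'a' or 'g'.
    """
    nearest = min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - change))
    return LETTERS[nearest]


def encode_word(changes) -> str:
    """Encode one time period: one letter per stock, in a fixed stock order."""
    return "".join(encode_delta(c) for c in changes)


def decode_word(word: str, tickers) -> dict:
    """Pair each ticker with the nominal delta of its letter."""
    return {t: LEVELS[LETTERS.index(ch)] for t, ch in zip(tickers, word)}


tickers = ["AAPL", "AAL", "MSFT", "GOOGL", "AMZN", "TSLA"]
word = encode_word([-0.012, -0.001, 0.02, -0.01, 0.001, -0.005])
print(word)  # acgaeb
print(decode_word(word, tickers))
```

Decoding only recovers the nominal level of each letter, not the original change, because the encoding is lossy.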


📈 Workflow

MySQL Database
    ↓ (fetch historical prices)
Raw Quote Data (timestamps + prices)
    ↓ (align timestamps, calculate deltas)
Encoded "Words" (sequences like "acgaeb")
    ↓ (create training pairs)
PyTorch Dataset
    ↓ (train transformer)
Trained Model
    ↓ (evaluate performance)
Loss / Accuracy / Perplexity metrics
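
For the metrics at the end of the workflow, perplexity is the exponential of the mean cross-entropy loss. The sketch below assumes the loss is a mean cross-entropy in nats, which is what `torch.nn.CrossEntropyLoss` reports by default; the `evaluate` helper and its inputs are hypothetical, not the project's inference code.

```python
import math
from typing import Sequence, Tuple


def evaluate(predicted: Sequence[str], actual: Sequence[str], mean_cross_entropy: float) -> Tuple[float, float]:
    """Return (accuracy, perplexity) for a batch of next-word predictions.

    accuracy   = fraction of predicted words that exactly match the actual word
    perplexity = exp(mean cross-entropy), with the loss measured in nats
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    accuracy = correct / len(actual)
    perplexity = math.exp(mean_cross_entropy)
    return accuracy, perplexity


# Made-up predictions; a loss of ln(7) corresponds to guessing uniformly among 7 options.
accuracy, perplexity = evaluate(
    predicted=["acg", "ddd", "eba", "gfc"],
    actual=["acg", "ddd", "eba", "abc"],
    mean_cross_entropy=math.log(7),
)
print(f"accuracy={accuracy:.2f} perplexity={perplexity:.2f}")
```

A perplexity close to the vocabulary size means the model is doing little better than uniform guessing over the words it knows.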

💻 System Requirements

  • Python: 3.8+
  • Database: MySQL 5.7+ (local, port 3306)
  • RAM: 4GB minimum (8GB+ recommended)
  • GPU: Optional (training is faster with CUDA)
  • Disk: ~500MB for dependencies

🎓 Learning Path

Beginner

  1. Read PROJECT_SUMMARY.md
  2. Run python test_installation.py
  3. Run python train_model.py --db-password YOUR_PASSWORD
  4. Review EXAMPLES.md (Example 1-3)

Intermediate

  1. Review README.md
  2. Run pytest tests/
  3. Adjust the parameters described in QUICKSTART.md
  4. Try EXAMPLES.md (Example 4-7)

Advanced

  1. Study source code in src/
  2. Implement custom modifications
  3. Review EXAMPLES.md (Example 8-11)
  4. Experiment with config_template.py

🤔 FAQ

Q: What's a "word" in this context? A: A sequence of letters representing price changes for all stocks in one time period. Example: "acgaeb" is a 6-stock word.

Q: How many "words" will there be? A: It depends on the data. Typically there are 100-10,000 unique words. With 10 stocks and 7 letters, the theoretical maximum is 7^10 (about 282 million), but real data contains far fewer.

Q: Can I use different stocks? A: Yes! Use db.get_random_stocks(count=X) or query specific tickers in database.py.

Q: How long does training take? A: ~1-5 minutes per epoch (10 epochs default), depending on data size and hardware.

Q: How do I save the trained model? A: See EXAMPLES.md (Example 10) for model saving/loading code.

Q: Can I use this for real trading? A: This is a research/educational project. Don't use it for real trading without extensive testing.

Q: What if I don't have MySQL? A: MySQL is required. The database module handles the connection, but it expects a MySQL server running on localhost:3306 with the "tinker" database.
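
As a worked check on the vocabulary question: with the seven letters a-g, a word covering n stocks can take at most 7^n distinct values, and the number actually observed is simply the count of distinct words in the data. The helper names and the sample words below are illustrative only.

```python
from collections import Counter

ALPHABET_SIZE = 7  # letters a-g


def max_vocabulary(num_stocks: int) -> int:
    """Upper bound on distinct words: one of 7 letters for each stock."""
    return ALPHABET_SIZE ** num_stocks


def unique_word_counts(words) -> Counter:
    """Count how often each distinct word appears in the data."""
    return Counter(words)


print(max_vocabulary(6))   # 117649
print(max_vocabulary(10))  # 282475249

# Made-up 3-stock words (illustration only).
counts = unique_word_counts(["acg", "ddd", "acg", "eba", "ddd", "acg"])
print(len(counts), counts.most_common(1))
```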


🔗 Quick Links


🧪 Testing

Verify installation before starting:

python test_installation.py

This checks:

  • ✓ Python packages
  • ✓ Project modules
  • ✓ Database connection
  • ✓ PyTorch setup
  • ✓ Model creation
  • ✓ Data processing

📝 Files at a Glance

| File | Type | Purpose |
|------|------|---------|
| train_model.py | Script | CLI interface |
| test_installation.py | Script | Installation verification |
| config_template.py | Config | Configuration templates |
| requirements.txt | Config | Dependencies |
| src/database.py | Code | Database access |
| src/processor.py | Code | Data processing |
| src/model.py | Code | PyTorch models |
| src/analysis.py | Code | Visualization |
| tests/ | Tests | Unit & integration tests (pytest) |
| README.md | Docs | Full technical reference |
| QUICKSTART.md | Docs | Setup & usage guide |
| EXAMPLES.md | Docs | 11 code examples |
| PROJECT_SUMMARY.md | Docs | Project overview |
| STRUCTURE.txt | Docs | File descriptions |

🎯 Next Steps

  1. Now: Read PROJECT_SUMMARY.md (5 min)
  2. Then: Follow QUICKSTART.md (10 min)
  3. Test: Run python test_installation.py (1 min)
  4. Try: Run python train_model.py --db-password YOUR_PASSWORD (5-10 min)
  5. Explore: Check EXAMPLES.md for more use cases

📞 Support


📄 License

Educational project for stock price prediction research.


Version: 0.1.0
Created: December 29, 2025
Status: Complete & Ready to Use ✓