
V4 dev #1

Open
Iosif2321 wants to merge 35 commits into main from v4-dev

Conversation

@Iosif2321
Owner

No description provided.

Iosif2321 and others added 30 commits March 13, 2026 22:20
Fixed 3 critical bugs found during the first retrain of v3.20.0:

1. TCN SWA (tcn_model.py): SWA loaded untrained weights when early stopping
   fired before swa_start_epoch. AveragedModel was initialized immediately, but
   update_parameters() was called only after swa_start_epoch. The condition
   `if use_swa and swa_model is not None` was always True → for H1/H12/H24
   (early stop) random initial weights were loaded instead of best_state.
   Fix: the swa_was_active flag is set to True only after a real update_parameters() call.

2. Stacking OOF inversion (horizon_model.py): temporal weighting
   exp(-0.00015×age) gave a weight of ≈0 to old samples in the OOF folds.
   The LogisticRegression meta-learner was trained on distorted OOF predictions →
   stacking accuracy of 28-51% (worse than random) instead of 72-78%.
   Fix: n_total=None in _make_sample_weights() for OOF (disables
   temporal weighting), while keeping it for the final refit.

3. TCN NaN losses (tcn_model.py): FP16 mixed precision produced NaN
   in the loss for H1. Gradient clipping did not help because the loss itself was NaN.
   Fix: `if not torch.isfinite(loss): continue` skips NaN batches.
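The effect described in item 2 can be checked with a few lines of arithmetic. This is a minimal sketch, not the project's code: `make_sample_weights` is a hypothetical stand-in for `_make_sample_weights()`, the age definition (distance in samples from the newest row) is an assumption, and the 210,000-row history is an illustrative size. Only the decay constant and the `n_total=None` switch come from the commit message.

```python
import math

DECAY = 0.00015  # decay rate quoted in the commit message


def make_sample_weights(n_samples, n_total=None, decay=DECAY):
    """Hypothetical stand-in for _make_sample_weights().

    n_total=None disables temporal weighting and returns uniform weights,
    which is what the fix does for OOF folds. Otherwise sample i gets
    exp(-decay * age), where age = n_total - 1 - i is an ASSUMED
    definition: the distance in samples from the newest row.
    """
    if n_total is None:
        return [1.0] * n_samples
    return [math.exp(-decay * (n_total - 1 - i)) for i in range(n_samples)]


# An early fold cut from a long history: every sample in it is "old".
old_fold = make_sample_weights(1000, n_total=210_000)
# The OOF setting after the fix: temporal weighting switched off.
uniform = make_sample_weights(1000)

print(f"largest weight in the old fold: {max(old_fold):.3e}")
print(f"weight after the fix: {uniform[0]}")
```

With weights this small, an early fold contributes almost nothing to the fit, which is consistent with the distorted OOF predictions the commit reports.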

Threshold retuning (backtest_v3.py, live_run.py):
- REGIME_THRESHOLDS: Flat 0.37→0.32, Trend 0.52→0.38, Volatile 0.36→0.32
- P_CORRECT_BY_REGIME: Flat 0.42, Trend 0.55→0.48, Volatile 0.52→0.42
- Result: avg acc 82.8% (+coverage), avg cov 65.9% (+12.9 p.p.)

Adaptive thresholds (NEW: src/adaptive_threshold.py):
- Rolling window of 288 candles (1 day), updated every hour
- acc>78% AND cov<55% → lower the threshold; acc<72% → raise it
- Integrated into live_run.py (with JSON persistence) and backtest_v3.py (--adaptive)
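The adaptive rule can be sketched as a small rolling-window class. This is an illustration under stated assumptions, not the contents of src/adaptive_threshold.py: the class name, the step size of 0.01, the clamp bounds, and the way outcomes are recorded are all hypothetical. Only the window of 288 candles and the 78% / 55% / 72% cut-offs come from the commit message.

```python
from collections import deque


class AdaptiveThreshold:
    """Hypothetical sketch of the rule described in the commit message."""

    def __init__(self, threshold, window=288, step=0.01, lo=0.0, hi=1.0):
        self.threshold = threshold
        self.step = step            # ASSUMED adjustment size
        self.lo, self.hi = lo, hi   # ASSUMED clamp bounds
        # One entry per candle: (prediction_made, prediction_correct).
        self.history = deque(maxlen=window)

    def record(self, predicted, correct):
        """Store the outcome of one candle; skipped candles count as not predicted."""
        self.history.append((bool(predicted), bool(predicted and correct)))

    def update(self):
        """Apply the rule once (the commit message runs it every hour)."""
        n = len(self.history)
        made = sum(1 for p, _ in self.history if p)
        if n == 0 or made == 0:
            return self.threshold
        cov = made / n                                        # coverage
        acc = sum(1 for _, c in self.history if c) / made     # accuracy
        if acc > 0.78 and cov < 0.55:
            self.threshold -= self.step   # accurate but too selective
        elif acc < 0.72:
            self.threshold += self.step   # not accurate enough
        self.threshold = min(self.hi, max(self.lo, self.threshold))
        return self.threshold


# Usage: 100 correct predictions out of 288 candles (acc=100%, cov≈35%).
at = AdaptiveThreshold(0.38)
for i in range(288):
    at.record(predicted=(i < 100), correct=True)
print(round(at.update(), 2))
```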

Documentation updated: training_journal.md, roadmap_v3.20.md, CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem: MultiTimeframeFeatures.merge_to_base() used reindex(ffill)
without a shift. On historical data, resample('4h') produced complete TF bars
that included future 5m candles. The batch backtest accuracy of 82.8% (v3.17-v3.20)
was an artifact of data leakage; the real accuracy is ~50%.

Scope: 36 TF features + 80 TCN embeddings (116/234 features affected).

Fix: df_features.shift(1) before reindex, so each 5m candle receives features
only from the previous completed TF bar. Data loss: 950/210k (0.45%).

Verification: per-candle and batch values match for 35/36 TF features (1 diff = EWM <0.01%).
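The leak and the fix can be reproduced on toy data without pandas. The sketch below is a dependency-free analogue of `shift(1)` followed by a forward-filled reindex, not the code in merge_to_base(): the function name and the toy values are hypothetical. Each base candle is mapped either to the higher-timeframe bar it sits in (leaky) or to the previous completed bar (safe).

```python
def assign_tf_close(closes, bar_size, shift):
    """Map each base candle to the close of a higher-timeframe bar.

    Bar k covers base candles [k*bar_size, (k+1)*bar_size). With
    shift=False a candle sees the bar it belongs to, whose close lies in
    the future for all but the last candle of the bar. With shift=True it
    sees only the previous, already completed bar.
    """
    n_bars = (len(closes) + bar_size - 1) // bar_size
    bar_close = [closes[min((k + 1) * bar_size, len(closes)) - 1]
                 for k in range(n_bars)]
    out = []
    for i in range(len(closes)):
        k = i // bar_size - (1 if shift else 0)
        out.append(bar_close[k] if k >= 0 else None)
    return out


closes = [1, 2, 3, 4, 5, 6, 7, 8]  # eight base-candle closes, toy values
leaky = assign_tf_close(closes, bar_size=4, shift=False)
safe = assign_tf_close(closes, bar_size=4, shift=True)
print(leaky)  # [4, 4, 4, 4, 8, 8, 8, 8]
print(safe)   # [None, None, None, None, 4, 4, 4, 4]
```

In the leaky variant the first candle already "knows" the value 4, which is only observed three candles later. The `None` entries in the safe variant play the role of the rows that the shift sacrifices at the start of the history.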

Also: backtest_simulate.py adds a per-candle simulation (ground-truth backtest);
predict_v3.py accepts a df= parameter for the simulation.

A full retrain is required (python train_v3.py, ~430 min).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…acking

v3 is exhausted: the batch accuracy of 82.8% was an artifact of Multi-TF lookahead bias;
the real accuracy is ~51% (random). BTC direction on 5m is not predictable from OHLCV.

v4 is built around the principle of OBSERVABILITY:

New:
- src/model_inspector.py: model introspection on every tick
  (feature contributions, attention weights, embeddings, activations, gradients)
- version_tracker.py: automatic change tracking with significance classification
- docs/VERSIONING.md: the 4.XX.YY.TAG standard
- docs/roadmap_v4.md: the plan: baseline → new data (L2, Funding, LS) → model
- docs/CHANGELOG.md: automatic changelog
- VERSION: current version file

Legacy cleanup (9 modules → src/legacy/):
  alt_pipeline, teacher_student_ensemble, stacking_ensemble,
  models, predict, train, preprocessing, feature_selection, online_platt

Version: 4.00.00.DEV

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced the − (U+2212) character with ASCII - for compatibility with the cp1251 console.

Version: 4.00.00.DEV

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hypothesis validation: OHLCV features contain no predictive signal for
BTC/USDT 5m direction. Validation accuracy 50.63% (baseline 50.49%).

Result: accuracy is at the baseline level (+0.14 p.p.) → OHLCV = noise.
Next steps: Phase 2, L2 orderbook + alternative data.

Version: 4.02.00.DEV

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- 15/18 features have MI > 0.001
- Best selective-prediction condition: regime=0 (accuracy=0.628, n=3930)
- Composite feature spread_ofi has MI=0.0176 (strong signal)
- Recommended minimal feature set: 16 features (avg_spread, ticks_volume_ratio, volume, turnover, spread_ofi, regime, oi_change, ofi_velocity, spread_rel, book_imbalance, ofi_raw, cumulative_delta, imbalance_mean, price_return, imbalance_x_wall, imbalance_x_ofi)
- Next steps: train a model on the reduced set of 16 features (Strong+Weak) and validate on regime=0
Comprehensive CLI for feature quality analysis:
- lag: lookahead detection
- audit: permutation importance + bootstrap CI
- signal: MI + Granger + dCor
- wf: walk-forward metrics (Stage 1/2 AUC, F1)
- wf_grid: hyperparameter grid search
- regime: regime-specific performance
- quick: rapid feature check
- health: data drift monitoring

Usage example:
  python analysis_suite.py --mode audit --subset move
  python analysis_suite.py --mode wf_grid
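A command-line interface with this shape could be wired up with `argparse` roughly as follows. This is a hypothetical sketch, not the contents of analysis_suite.py: only the mode names and the `--mode` / `--subset` flags come from the commit message; the help strings, defaults, and the `build_parser` name are assumptions.

```python
import argparse

# Mode names taken from the commit message.
MODES = ["lag", "audit", "signal", "wf", "wf_grid", "regime", "quick", "health"]


def build_parser():
    """Build a parser that accepts the two flags shown in the usage example."""
    parser = argparse.ArgumentParser(
        prog="analysis_suite.py",
        description="Feature quality analysis (illustrative sketch).",
    )
    parser.add_argument("--mode", required=True, choices=MODES,
                        help="which analysis to run")
    # ASSUMPTION: --subset is optional and free-form (e.g. 'move').
    parser.add_argument("--subset", default=None,
                        help="optional data or target subset")
    return parser


# Parse the first usage example explicitly instead of reading sys.argv.
args = build_parser().parse_args(["--mode", "audit", "--subset", "move"])
print(args.mode, args.subset)  # audit move
```

Restricting `--mode` with `choices` makes a mistyped mode fail at parse time rather than deep inside an analysis run.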
- Added _load_and_merge_orderbook_features() method
- Loads orderbook_ofi_features.csv or orderbook_features_combined.csv
- Merges L2 features (ofi_raw, spread, book_imbalance, etc.) if model expects them
- Backward compatible: skips if model has no L2 features
- Logs warnings if L2 expected but file missing
Rationale:
- spread_ofi = spread_rel * sign(ofi_raw) has strong MI (0.0176) and significant permutation importance
- Physically: wide spread + positive order flow imbalance = stronger directional signal
- This feature is now part of the recommended minimal set (18 features)
- Not removed yet: spread_rel is still present but will be dropped in the next cleanup because its permutation importance is 0
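The composite feature is a one-line formula, sketched here in plain Python. The function names are illustrative and the treatment of `ofi_raw == 0` (sign is 0, so the feature is 0) is an assumption that mirrors the usual `sign` convention; the commit message does not say how ties are handled.

```python
def sign(x):
    """Return -1, 0 or 1 (ASSUMED convention: sign(0) == 0)."""
    return (x > 0) - (x < 0)


def spread_ofi(spread_rel, ofi_raw):
    """spread_ofi = spread_rel * sign(ofi_raw), as stated in the rationale."""
    return spread_rel * sign(ofi_raw)


# Same relative spread, opposite order-flow direction.
print(spread_ofi(0.0004, 12.5))   # 0.0004
print(spread_ofi(0.0004, -3.0))   # -0.0004
```

The magnitude carries the spread and the sign carries the order-flow direction, so a wide spread with buy-side pressure and a wide spread with sell-side pressure land at opposite ends of the feature range.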

Dataset regenerated: 7,771 rows, 21 features (including spread_ofi)
- Added src/features/feature_store.py: a single feature registry
  Replaces: feature_registry.py, FEATURE_REGISTRY.yaml, feature_audit_runner.py
  S1 features (6): volume, turnover, avg_spread, ticks_volume_ratio, spread_rel, regime
  S2 features (7): avg_spread, ticks_volume_ratio, spread_rel, regime, cumulative_delta, wall_ratio, book_imbalance
  Blacklist lookahead: price_return, imbalance_delta

- train_v4_2stage.py: removed hardcoded features, wired in FeatureStore
  Stage1 and Stage2 use different feature sets from fs.features_for_s1/s2

- Added data/feature_verdicts.json: audit verdicts (KEEP/WEAK/DROP)

- Added docs/DATA_PIPELINE.md: the official data pipeline (6 steps)
- Added docs/FEATURES.md: feature documentation with statistics

- Updated .gitignore: excluded .data (36GB), .jsonl, logs, models
  Git stores only: src/, *.py, docs/, training_dataset.csv

- Removed orphan scripts: feature_registry.py, feature_audit_runner.py,
  refresh_catalog.py, sync_registry_with_dataset.py, FEATURE_REGISTRY.yaml
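A registry of this kind can be sketched as a small frozen dataclass. This is an illustration, not the contents of src/features/feature_store.py: the class layout is hypothetical, and modelling `features_for_s1` / `features_for_s2` as properties is an assumption (the commit message only shows the names `fs.features_for_s1/s2`). The feature lists and the blacklist are copied from the commit message.

```python
from dataclasses import dataclass

S1_FEATURES = ("volume", "turnover", "avg_spread",
               "ticks_volume_ratio", "spread_rel", "regime")
S2_FEATURES = ("avg_spread", "ticks_volume_ratio", "spread_rel", "regime",
               "cumulative_delta", "wall_ratio", "book_imbalance")
LOOKAHEAD_BLACKLIST = frozenset({"price_return", "imbalance_delta"})


@dataclass(frozen=True)
class FeatureStore:
    """Hypothetical single source of truth for per-stage feature lists."""
    s1: tuple = S1_FEATURES
    s2: tuple = S2_FEATURES
    blacklist: frozenset = LOOKAHEAD_BLACKLIST

    def _allowed(self, names):
        # Blacklisted (lookahead) features never reach a model,
        # even if someone adds them to a stage list by mistake.
        return [n for n in names if n not in self.blacklist]

    @property
    def features_for_s1(self):
        return self._allowed(self.s1)

    @property
    def features_for_s2(self):
        return self._allowed(self.s2)


fs = FeatureStore()
print(len(fs.features_for_s1), len(fs.features_for_s2))  # 6 7
```

Filtering through the blacklist on every read, rather than once at construction, keeps the guard in place if a stage list is overridden when the store is instantiated.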

Current model state:
  Stage1 AUC: 0.536  Stage2 AUC: 0.534  F1: 0.283
  Gap(all): 0.209 OVERFIT, needs regularization (next PR)
- Added src/features/ls_oi_features.py for Long/Short ratio and Open Interest features
- Added src/data_collectors/fetch_trades.py to fetch historical trades from Bybit
- Added src/features/trade_features.py to aggregate trades data
- Conducted lag profile and Walk-Forward analysis on new features
- Conclusion: No leading indicators found. Stage2 AUC did not improve (target: >0.56). More historical data needed.
…S + Orderbook features

- Add Binance CSV download via API (no data.binance.vision needed)
- Add Bybit market data: OI, Funding, LS Ratio
- Add orderbook feature extraction from parquet (sampled)
- Create unified_data_loader.py and data_sources.py
- Create feature_registry.py with 40 features
- Create prepare_training_data.py for merged dataset
- Update AGENTS.md with project guidelines
- Bump version to 4.03.00.DEV
- Extract orderbook features from parquet (book_imbalance, spread)
- Sample 1/100 rows for speed (9 min processing time)
- Create unified training dataset with 93k rows, 28 features
- Dataset includes: OHLCV, OI, Funding, LS, Orderbook
- Audit passed: balanced target, no lookahead bias
- Test for lookahead bias
- Target balance check
- Feature correlation analysis
- Baseline RF model: 46% accuracy (+13% over random)
- Added scripts/ directory structure (data/, eval/, analysis/, train/)
- Added tests/ directory with data integrity and feature tests
- New: lob_microstructure.py, transforms.py
- Updated: feature_store.py with v2.0 registry
- Added: phase8_report.md, walkforward logs
- Refactored legacy code into scripts/archive/ and src/legacy/
- Removed deprecated v1-v3 train scripts from root
Deleted old CSVs (BNBUSDT, ETHUSDT, SOLUSDT, funding, OI, LS ratio)
Deleted deprecated train scripts (v2, v3, baseline)
Deleted deprecated analysis scripts
Archived to scripts/archive/ and data/backup patterns
- Removed 'bucket_ticks % 10' condition - now computes deep metrics on EVERY tick
- Added micro-price (Cont & Stoikov 2010): vwap + q*(imb-0.5)*spread, q=0.3
- Added 3 micro-price output features: deviation_mean, deviation_std, trend
- Expected impact: +0.5-1.0% accuracy from full-resolution LOB features
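The micro-price formula quoted above translates directly into code. This is a minimal sketch with a hypothetical function name; it assumes `imbalance` is a bid-side share in [0, 1], so that 0.5 means a balanced book, which the `(imb-0.5)` term suggests but the commit message does not state. The three aggregated output features are not reproduced because their definitions are not given.

```python
def micro_price(vwap, imbalance, spread, q=0.3):
    """vwap + q * (imbalance - 0.5) * spread, with q=0.3 as in the commit.

    ASSUMPTION: imbalance lies in [0, 1] and 0.5 means a balanced book.
    """
    return vwap + q * (imbalance - 0.5) * spread


# Bid-heavy book: the micro-price moves above the VWAP.
print(f"{micro_price(100.0, 0.7, 0.5):.4f}")  # 100.0300
# Balanced book: no adjustment.
print(f"{micro_price(100.0, 0.5, 0.5):.4f}")  # 100.0000
```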
New time features:
- hour_sin, hour_cos (cyclical hour encoding)
- dow_sin, dow_cos, is_weekend (day of week)
- is_us_session (14-21 UTC peak volume)
- funding_cycle_sin/cos (8h funding cycle)
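The time features listed above could be computed along these lines. This is a sketch under assumptions, not the project's implementation: the function name is hypothetical, the US session is treated as the half-open interval 14:00-21:00 UTC, the weekend is Saturday and Sunday, and the funding cycle is assumed to start at 00:00 UTC and repeat every 8 hours.

```python
import math
from datetime import datetime, timezone


def time_features(ts):
    """Cyclical and calendar features for one timezone-aware timestamp."""
    ts = ts.astimezone(timezone.utc)
    hour = ts.hour + ts.minute / 60.0
    dow = ts.weekday()  # Monday=0 ... Sunday=6
    two_pi = 2.0 * math.pi
    return {
        # Sine/cosine pairs keep 23:55 and 00:00 close to each other.
        "hour_sin": math.sin(two_pi * hour / 24.0),
        "hour_cos": math.cos(two_pi * hour / 24.0),
        "dow_sin": math.sin(two_pi * dow / 7.0),
        "dow_cos": math.cos(two_pi * dow / 7.0),
        "is_weekend": int(dow >= 5),
        # ASSUMPTION: session is 14:00 <= t < 21:00 UTC.
        "is_us_session": int(14 <= ts.hour < 21),
        # ASSUMPTION: 8h funding cycle anchored at 00:00 UTC.
        "funding_cycle_sin": math.sin(two_pi * (hour % 8.0) / 8.0),
        "funding_cycle_cos": math.cos(two_pi * (hour % 8.0) / 8.0),
    }


feats = time_features(datetime(2026, 3, 14, 15, 0, tzinfo=timezone.utc))
print(feats["is_weekend"], feats["is_us_session"])  # 1 1
```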

Also added micro-price columns to LOB median-fill list for proper handling
of missing LOB data.
New compute_trend() function:
- Computes linear slope of accuracy across folds
- Returns direction: improving / degrading / stable
- Also tracks total_change over all folds

Added to aggregate_results:
- accuracy.trend: overall accuracy trend
- per_regime[rname].accuracy_trend: per-regime trends

This helps detect if model is improving or degrading over time.
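A fold-wise trend of this kind can be computed with an ordinary least-squares slope. The sketch below is an illustration, not the project's compute_trend(): the return format, the `tol` parameter, and its default of 0.001 per fold (the slope magnitude below which the trend counts as stable) are assumptions.

```python
def compute_trend(accuracies, tol=0.001):
    """Least-squares slope of accuracy over the fold index 0..n-1.

    ASSUMPTION: |slope| <= tol counts as 'stable'.
    """
    n = len(accuracies)
    if n < 2:
        return {"slope": 0.0, "direction": "stable", "total_change": 0.0}
    x_mean = (n - 1) / 2.0
    y_mean = sum(accuracies) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(accuracies))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    if slope > tol:
        direction = "improving"
    elif slope < -tol:
        direction = "degrading"
    else:
        direction = "stable"
    return {
        "slope": slope,
        "direction": direction,
        "total_change": accuracies[-1] - accuracies[0],
    }


print(compute_trend([0.50, 0.52, 0.54, 0.56])["direction"])  # improving
```

Using the slope rather than only `total_change` makes the verdict less sensitive to a single unusually good or bad first or last fold.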
…nd tick

- Micro-price computed on EVERY tick (full resolution)
- Deep metrics (sorting) every 2nd tick (5x reduction instead of previous 10x)
- Expected extraction time: ~2 hours for 326 files

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa517c8792


@@ -13,7 +13,7 @@
 from sklearn.metrics import accuracy_score
 import joblib

-from src.feature_selection import FeatureSelector
+from src.legacy.feature_selection import FeatureSelector

P0: Restore valid FeatureSelector module import

HorizonModel now imports src.legacy.feature_selection, but this commit does not add a src/legacy package and also removes the previous src/feature_selection.py. Any code path that imports predict_v3 (or directly imports src.horizon_model) will raise ModuleNotFoundError at import time, so training/prediction cannot start. src/two_stage_model.py has the same broken import and should be fixed in tandem.


 import config_v3
 from predict_v3 import PredictorV3
-from src.online_platt import OnlinePlattScaler
+from src.legacy.online_platt import OnlinePlattScaler

P1: Import OnlinePlattScaler from an existing module

live_run.py now imports OnlinePlattScaler from src.legacy.online_platt, but no such module exists in the repository after this change (and the prior src/online_platt.py was deleted). As a result, starting live mode fails immediately with ModuleNotFoundError before the trading loop begins.


@@ -167,7 +171,7 @@ def step_header(step: str, title: str, start_time: float = None):
 from src.cross_horizon_meta import CrossHorizonMeta
 from src.tcn_model import TCNTrainer
 from src.error_predictor import ErrorPredictor
-from src.alt_pipeline import AltPipeline
+from src.legacy.alt_pipeline import AltPipeline

P1: Point AltPipeline import at a real module

The legacy trainer imports AltPipeline from src.legacy.alt_pipeline, but the commit does not provide that module (and removes the old src/alt_pipeline.py). Running scripts/train/v3_legacy.py will fail on import, so the legacy training entrypoint is unusable unless this import is restored or guarded.
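One way to "guard" such an import, as the comment suggests, is to resolve it at runtime and degrade gracefully when the module is absent. This is a hypothetical sketch, not a patch from the PR: `optional_import` is an illustrative helper name, and whether the legacy trainer can run meaningfully without AltPipeline is an open question that the review does not answer.

```python
import importlib


def optional_import(module_name, attr):
    """Return module_name.attr, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    return getattr(module, attr, None)


# Module path taken from the diff; it resolves to None when src/legacy is missing.
AltPipeline = optional_import("src.legacy.alt_pipeline", "AltPipeline")
if AltPipeline is None:
    print("AltPipeline unavailable; the legacy ALT stage would have to be skipped")
```

Restoring the missing src/legacy package remains the direct fix; a guard only turns an import-time crash into an explicit, checkable condition.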

