Fixed 3 critical bugs found during the first v3.20.0 retrain:
1. TCN SWA (tcn_model.py): SWA loaded untrained weights when early stopping fired before swa_start_epoch. AveragedModel was initialized immediately, but update_parameters() was only called after swa_start_epoch. The condition `if use_swa and swa_model is not None` was therefore always True → for H1/H12/H24 (early stop) the random initial weights were loaded instead of best_state. Fix: the swa_was_active flag is set to True only when update_parameters() is actually called.
2. Stacking OOF inversion (horizon_model.py): temporal weighting exp(-0.00015×age) gave old samples in the OOF folds a weight of ≈0. The LogisticRegression meta-learner was trained on distorted OOF predictions → stacking accuracy of 28-51% (no better than random) instead of 72-78%. Fix: n_total=None in _make_sample_weights() for OOF (disables temporal weighting), while keeping it for the final refit.
3. TCN NaN losses (tcn_model.py): FP16 mixed precision produced NaN losses for H1. Gradient clipping did not help because the loss itself was NaN. Fix: `if not torch.isfinite(loss): continue` skips NaN batches.
Threshold retuning (backtest_v3.py, live_run.py):
- REGIME_THRESHOLDS: Flat 0.37→0.32, Trend 0.52→0.38, Volatile 0.36→0.32
- P_CORRECT_BY_REGIME: Flat 0.42, Trend 0.55→0.48, Volatile 0.52→0.42
- Result: avg acc 82.8% (+coverage), avg cov 65.9% (+12.9 pp)
Adaptive thresholds (NEW: src/adaptive_threshold.py):
- Rolling window of 288 candles (1 day), updated every hour
- acc>78% AND cov<55% → lower the threshold; acc<72% → raise it
- Integrated into live_run.py (with JSON persistence) and backtest_v3.py (--adaptive)
Documentation updated: training_journal.md, roadmap_v3.20.md, CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
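Two of the mechanisms in this commit can be illustrated without PyTorch. The code below is a sketch, not the project's code: `make_sample_weights` and `AdaptiveThreshold` are hypothetical stand-ins for `_make_sample_weights()` and `src/adaptive_threshold.py`. The age convention (candles counted back from the last row of the full dataset), the threshold step of 0.01 and the clamp range [0.25, 0.60] are assumptions. It shows why an OOF training fold that ends long before the dataset does can receive near-zero weights, and how the acc/cov rule might move a regime threshold.

```python
from collections import deque

import numpy as np

DECAY = 0.00015  # per-candle decay rate quoted in the commit message


def make_sample_weights(n, n_total=None, decay=DECAY):
    """Hypothetical stand-in for _make_sample_weights().

    Assumption: age is counted in candles back from the last row of the
    full dataset (n_total rows). n_total=None disables temporal weighting,
    which is what the fix does for OOF folds.
    """
    if n_total is None:
        return np.ones(n)
    age = n_total - 1 - np.arange(n)  # first row of the fold is the oldest
    return np.exp(-decay * age)


class AdaptiveThreshold:
    """Hypothetical sketch of the acc/cov rule; step and bounds are assumed."""

    def __init__(self, threshold, window=288, update_every=12,
                 step=0.01, lo=0.25, hi=0.60):
        self.threshold = threshold
        self.history = deque(maxlen=window)  # (covered, correct) per 5m candle
        self.update_every = update_every     # 12 x 5m candles = 1 hour
        self.step, self.lo, self.hi = step, lo, hi
        self._since_update = 0

    def observe(self, covered, correct=False):
        """Record one candle; re-evaluate the threshold once per hour."""
        self.history.append((bool(covered), bool(covered and correct)))
        self._since_update += 1
        if self._since_update >= self.update_every:
            self._since_update = 0
            self._update()

    def _update(self):
        n_cov = sum(c for c, _ in self.history)
        if n_cov == 0:
            return  # no predictions in the window: leave the threshold alone
        cov = n_cov / len(self.history)
        acc = sum(ok for _, ok in self.history) / n_cov
        if acc > 0.78 and cov < 0.55:
            self.threshold = max(self.lo, self.threshold - self.step)
        elif acc < 0.72:
            self.threshold = min(self.hi, self.threshold + self.step)
```

Under this age convention, a fold of the first 20,000 rows of a 210,000-row dataset has a maximum weight of exp(-0.00015 × 190,000), about 4e-13, so the meta-learner effectively trains on nothing.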
Problem: MultiTimeframeFeatures.merge_to_base() used reindex(ffill)
without a shift. On historical data, resample('4h') produced complete TF
bars that already included future 5m candles. The 82.8% batch backtest
accuracy (v3.17-v3.20) was an artifact of data leakage; the real accuracy
is ~50%.
Scope: 36 TF features + 80 TCN embeddings (116/234 features affected).
Fix: df_features.shift(1) before reindex, so each 5m candle only gets
features from the previous completed TF bar. Data loss: 950/210k rows (0.45%).
Verification: per-candle and batch results match on 35/36 TF features
(the 1 mismatch is an EWM feature, <0.01%).
Also: backtest_simulate.py adds a per-candle simulation (ground-truth
backtest); predict_v3.py now accepts a df= parameter for simulate.
A full retrain is required (python train_v3.py, ~430 min).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
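The leak and the fix can be reproduced on synthetic data. A minimal pandas sketch, assuming TF bars are labelled by their open time (the `resample` default for `'4h'`, `label='left'`). `merge_to_base` here is a simplified, hypothetical stand-in for `MultiTimeframeFeatures.merge_to_base()`, and the synthetic close equals each candle's position, so every value identifies the candle it came from.

```python
import numpy as np
import pandas as pd


def merge_to_base(base_index, tf_features, shift=True):
    """Simplified stand-in for MultiTimeframeFeatures.merge_to_base().

    tf_features is indexed by bar open time, so a bar's values are only
    known after the bar closes. shift(1) makes every 5m candle see the
    previous, already completed TF bar instead of the one still forming.
    """
    if shift:
        tf_features = tf_features.shift(1)
    return tf_features.reindex(base_index, method="ffill")


# 8 hours of 5m candles; the close equals the candle's position, 0..95
idx = pd.date_range("2024-01-01", periods=96, freq="5min")
close = pd.Series(np.arange(96, dtype=float), index=idx)
tf = close.resample("4h").last().to_frame("close_4h")  # bars at 00:00, 04:00

leaky = merge_to_base(idx, tf, shift=False)
fixed = merge_to_base(idx, tf, shift=True)
t = pd.Timestamp("2024-01-01 00:05")
# At 00:05 the leaky merge already knows the 03:55 close (47.0);
# the fixed merge has no completed 4h bar yet (NaN).
print(leaky.loc[t, "close_4h"], fixed.loc[t, "close_4h"])
```

Shifting the TF frame rather than the 5m frame leaves the base candles untouched; only the higher-timeframe values are delayed by one bar.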
…acking v3 is exhausted: the 82.8% batch accuracy was an artifact of Multi-TF lookahead bias; the real accuracy is ~51% (random). BTC direction on 5m is not predictable from OHLCV. v4 is built around the principle of OBSERVABILITY.
New:
- src/model_inspector.py: per-tick model introspection (feature contributions, attention weights, embeddings, activations, gradients)
- version_tracker.py: automatic change tracking with significance classification
- docs/VERSIONING.md: the 4.XX.YY.TAG standard
- docs/roadmap_v4.md: plan: baseline → new data (L2, Funding, LS) → model
- docs/CHANGELOG.md: automatic changelog
- VERSION: current version file
Legacy cleanup (9 modules → src/legacy/): alt_pipeline, teacher_student_ensemble, stacking_ensemble, models, predict, train, preprocessing, feature_selection, online_platt
Version: 4.00.00.DEV
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced − (U+2212) with ASCII - for compatibility with the cp1251 console.
Version: 4.00.00.DEV
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
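U+2212 MINUS SIGN has no code point in Windows-1251, so printing it to a cp1251 console raises `UnicodeEncodeError`. A minimal sketch of the substitution; `console_safe` is a hypothetical helper name, not necessarily how the commit applies the change.

```python
MINUS_SIGN = "\u2212"  # "−", not representable in cp1251


def console_safe(text: str) -> str:
    """Hypothetical helper: swap U+2212 MINUS SIGN for ASCII '-' before printing."""
    return text.replace(MINUS_SIGN, "-")


raw = f"{MINUS_SIGN}0.5%"
try:
    raw.encode("cp1251")
except UnicodeEncodeError:
    print("U+2212 cannot be encoded in cp1251")
print(console_safe(raw).encode("cp1251"))  # b'-0.5%'
```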
…rget (H1), feature validation preparation
Hypothesis validation: OHLCV features contain no predictive signal for BTC/USDT 5m direction.
Validation accuracy: 50.63% (baseline 50.49%).
Result: accuracy is effectively at the baseline level (+0.14 pp) → OHLCV = noise.
Next steps: Phase 2, L2 orderbook + alternative data.
Version: 4.02.00.DEV
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- 15/18 features have MI > 0.001
- Best selective-prediction condition: regime=0 (accuracy=0.628, n=3930)
- The composite feature spread_ofi has MI=0.0176 (strong signal)
- Recommended minimal feature set: 16 features (avg_spread, ticks_volume_ratio, volume, turnover, spread_ofi, regime, oi_change, ofi_velocity, spread_rel, book_imbalance, ofi_raw, cumulative_delta, imbalance_mean, price_return, imbalance_x_wall, imbalance_x_ofi)
- Next steps: train a model on the reduced 16-feature set (Strong+Weak) and validate it on regime=0
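One simple way to score features by MI is a plug-in estimate over quantile-binned values. This is a sketch, not necessarily the estimator behind the figures above (scikit-learn's `mutual_info_classif`, for instance, uses a nearest-neighbour estimator instead); the function name and the default of 10 bins are assumptions, and results are in nats.

```python
import numpy as np


def mutual_information(x, y, bins=10):
    """Plug-in MI estimate (nats) between a feature x and a discrete target y.

    Features with at most `bins` distinct values are used as-is; continuous
    features are discretised into quantile bins (tied edges are merged).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    if np.unique(x).size <= bins:
        xb = x  # already discrete: use the values themselves as bins
    else:
        edges = np.quantile(x, np.linspace(0.0, 1.0, bins + 1))[1:-1]
        xb = np.digitize(x, np.unique(edges))
    mi = 0.0
    for xv in np.unique(xb):
        px = np.mean(xb == xv)
        for yv in np.unique(y):
            pxy = np.mean((xb == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * np.mean(y == yv)))
    return float(mi)
```

A feature identical to a balanced binary target scores ln 2 ≈ 0.693 nats, the maximum for that target; a threshold such as MI > 0.001 is therefore a very low bar.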
Comprehensive CLI for feature quality analysis:
- lag: lookahead detection
- audit: permutation importance + bootstrap CI
- signal: MI + Granger + dCor
- wf: walk-forward metrics (Stage 1/2 AUC, F1)
- wf_grid: hyperparameter grid search
- regime: regime-specific performance
- quick: rapid feature check
- health: data drift monitoring
Usage examples:
- `python analysis_suite.py --mode audit --subset move`
- `python analysis_suite.py --mode wf_grid`
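A skeleton of how such a multi-mode CLI could be wired with `argparse`. Only `--mode`, `--subset` and the mode names come from the description above; the handler table and its placeholder functions are hypothetical.

```python
import argparse


def _placeholder(mode):
    # Hypothetical stand-in for the real analysis routine behind each mode.
    def run(args):
        return f"{mode}:{args.subset or 'all'}"
    return run


HANDLERS = {mode: _placeholder(mode) for mode in (
    "lag", "audit", "signal", "wf", "wf_grid", "regime", "quick", "health")}


def build_parser():
    parser = argparse.ArgumentParser(description="Feature quality analysis suite")
    parser.add_argument("--mode", required=True, choices=sorted(HANDLERS))
    parser.add_argument("--subset", default=None,
                        help="optional feature subset, e.g. 'move'")
    return parser


def main(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and dispatch to the chosen mode."""
    args = build_parser().parse_args(argv)
    return HANDLERS[args.mode](args)
```

Using `choices` makes argparse reject unknown modes with a usage message, so each handler can assume a valid mode.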
- Added the _load_and_merge_orderbook_features() method
- Loads orderbook_ofi_features.csv or orderbook_features_combined.csv
- Merges L2 features (ofi_raw, spread, book_imbalance, etc.) if the model expects them
- Backward compatible: skipped if the model has no L2 features
- Logs a warning if L2 features are expected but the file is missing
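A minimal sketch of that loading logic. The two file names come from the commit; the function signature, the caller-supplied `l2_features` set and the join on a shared `timestamp` column are assumptions, not the project's actual method.

```python
import logging
from pathlib import Path

import pandas as pd

log = logging.getLogger(__name__)

# File names from the commit message, tried in this order.
L2_FILES = ("orderbook_ofi_features.csv", "orderbook_features_combined.csv")


def load_and_merge_orderbook_features(df, model_features, l2_features,
                                      data_dir=".", on="timestamp"):
    """Hypothetical sketch of _load_and_merge_orderbook_features().

    Assumptions: L2 columns are identified by the caller-supplied
    l2_features set, and the CSV joins to df on a shared `on` column.
    """
    needed = [c for c in model_features if c in l2_features]
    if not needed:
        return df  # backward compatible: the model has no L2 features
    for name in L2_FILES:
        path = Path(data_dir) / name
        if path.exists():
            ob = pd.read_csv(path)
            cols = [c for c in needed if c in ob.columns]
            return df.merge(ob[[on] + cols], on=on, how="left")
    log.warning("Model expects L2 features %s but no orderbook file was found in %s",
                needed, data_dir)
    return df
```

A left merge keeps every candle; candles without orderbook data get NaN and are left to the downstream fill logic.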
Rationale:
- spread_ofi = spread_rel * sign(ofi_raw) has strong MI (0.0176) and significant permutation importance
- Physically: a wide spread combined with positive order-flow imbalance gives a stronger directional signal
- This feature is now part of the recommended minimal set (18 features)
- To be removed: spread_rel is still present but will be dropped in the next cleanup, since its permutation importance is 0
Dataset regenerated: 7,771 rows, 21 features (including spread_ofi)
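The composite feature is a one-liner; a NumPy sketch with a hypothetical function name. Note that `np.sign(0) == 0`, so bars with exactly zero OFI map to 0 regardless of the spread.

```python
import numpy as np


def spread_ofi(spread_rel, ofi_raw):
    """spread_ofi = spread_rel * sign(ofi_raw), as defined in the commit.

    The relative spread supplies the magnitude; the order-flow imbalance
    supplies only the direction.
    """
    spread_rel = np.asarray(spread_rel, dtype=float)
    return spread_rel * np.sign(np.asarray(ofi_raw, dtype=float))


print(spread_ofi([0.001, 0.002, 0.003], [5.0, -2.0, 0.0]))
```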
- Added src/features/feature_store.py: a single feature registry
  Replaces: feature_registry.py, FEATURE_REGISTRY.yaml, feature_audit_runner.py
  S1 features (6): volume, turnover, avg_spread, ticks_volume_ratio, spread_rel, regime
  S2 features (7): avg_spread, ticks_volume_ratio, spread_rel, regime, cumulative_delta, wall_ratio, book_imbalance
  Lookahead blacklist: price_return, imbalance_delta
- train_v4_2stage.py: removed hardcoded feature lists and wired in FeatureStore
  Stage1 and Stage2 use different feature sets from fs.features_for_s1/s2
- Added data/feature_verdicts.json: audit verdicts (KEEP/WEAK/DROP)
- Added docs/DATA_PIPELINE.md: the official data pipeline (6 steps)
- Added docs/FEATURES.md: feature documentation with statistics
- Updated .gitignore: excluded .data (36GB), .jsonl, logs and models
  Only src/, *.py, docs/ and training_dataset.csv are kept in git
- Removed orphan scripts: feature_registry.py, feature_audit_runner.py, refresh_catalog.py, sync_registry_with_dataset.py, FEATURE_REGISTRY.yaml
Current model state:
  Stage1 AUC: 0.536
  Stage2 AUC: 0.534
  F1: 0.283
  Gap(all): 0.209
  OVERFIT: needs regularization (next PR)
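A minimal sketch of a single registry with per-stage feature lists and a lookahead blacklist. The feature lists and the blacklist are copied from the commit; the dataclass shape, the `features_for_s1`/`features_for_s2` properties and the fail-fast check are assumptions, not the actual `feature_store.py` API.

```python
from dataclasses import dataclass, field

LOOKAHEAD_BLACKLIST = frozenset({"price_return", "imbalance_delta"})


@dataclass
class FeatureStore:
    """Hypothetical minimal feature registry; the real feature_store.py may differ."""

    s1: list = field(default_factory=lambda: [
        "volume", "turnover", "avg_spread", "ticks_volume_ratio",
        "spread_rel", "regime"])
    s2: list = field(default_factory=lambda: [
        "avg_spread", "ticks_volume_ratio", "spread_rel", "regime",
        "cumulative_delta", "wall_ratio", "book_imbalance"])
    blacklist: frozenset = LOOKAHEAD_BLACKLIST

    def __post_init__(self):
        # Fail fast if a blacklisted (lookahead) feature sneaks into a stage.
        leaked = (set(self.s1) | set(self.s2)) & self.blacklist
        if leaked:
            raise ValueError(f"lookahead features in registry: {sorted(leaked)}")

    @property
    def features_for_s1(self):
        return list(self.s1)

    @property
    def features_for_s2(self):
        return list(self.s2)
```

Returning copies keeps training scripts from mutating the registry's lists in place.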
- Added src/features/ls_oi_features.py for Long/Short ratio and Open Interest features
- Added src/data_collectors/fetch_trades.py to fetch historical trades from Bybit
- Added src/features/trade_features.py to aggregate trade data
- Ran lag-profile and walk-forward analysis on the new features
- Conclusion: no leading indicators found. Stage2 AUC did not improve (it did not exceed 0.56). More historical data is needed.
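A lag profile in its simplest form is the correlation of a feature with the target over a range of leads and lags. The sketch below uses Pearson correlation and a symmetric window of ±5 steps (both assumptions; `lag_profile` is a hypothetical name). A genuine leading indicator would show a clear peak at some k > 0. A near-perfect peak at k > 0 is instead a lookahead symptom, as in the usage example, where the "feature" is just the target shifted back by 3 steps.

```python
import numpy as np


def lag_profile(x, y, max_lag=5):
    """Return {k: corr(x[t], y[t + k])} for k in [-max_lag, max_lag].

    k > 0: feature now vs target later (leading);
    k < 0: feature now vs target earlier (lagging).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            a, b = x[: len(x) - k], y[k:]
        else:
            a, b = x[-k:], y[: len(y) + k]
        out[k] = float(np.corrcoef(a, b)[0, 1])
    return out


rng = np.random.default_rng(0)
y = rng.normal(size=2_000)
x = np.roll(y, -3)  # x[t] == y[t + 3]: a deliberately leaky feature
profile = lag_profile(x, y)
print(max(profile, key=lambda k: abs(profile[k])))  # 3
```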
…S + Orderbook features
- Add Binance CSV download via API (no data.binance.vision needed)
- Add Bybit market data: OI, Funding, LS Ratio
- Add orderbook feature extraction from parquet (sampled)
- Create unified_data_loader.py and data_sources.py
- Create feature_registry.py with 40 features
- Create prepare_training_data.py for the merged dataset
- Update AGENTS.md with project guidelines
- Bump version to 4.03.00.DEV
- Extract orderbook features from parquet (book_imbalance, spread)
- Sample 1/100 rows for speed (9 min processing time)
- Create a unified training dataset with 93k rows and 28 features
- Dataset includes: OHLCV, OI, Funding, LS, Orderbook
- Audit passed: balanced target, no lookahead bias
- Test for lookahead bias
- Target balance check
- Feature correlation analysis
- Baseline RF model: 46% accuracy (+13 pp over random)
- Added scripts/ directory structure (data/, eval/, analysis/, train/)
- Added tests/ directory with data integrity and feature tests
- New: lob_microstructure.py, transforms.py
- Updated: feature_store.py with the v2.0 registry
- Added: phase8_report.md, walk-forward logs
- Refactored legacy code into scripts/archive/ and src/legacy/
- Removed deprecated v1-v3 train scripts from the root
- Deleted old CSVs (BNBUSDT, ETHUSDT, SOLUSDT, funding, OI, LS ratio)
- Deleted deprecated train scripts (v2, v3, baseline)
- Deleted deprecated analysis scripts
- Archived to scripts/archive/ and data/backup patterns
- Removed the 'bucket_ticks % 10' condition: deep metrics are now computed on EVERY tick
- Added micro-price (Cont & Stoikov 2010): vwap + q*(imb-0.5)*spread, q=0.3
- Added 3 micro-price output features: deviation_mean, deviation_std, trend
- Expected impact: +0.5-1.0% accuracy from full-resolution LOB features
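The micro-price formula from this commit, plus one possible reading of the three bucket features. The function names are hypothetical, and the imbalance convention (bid share of top-of-book volume, in [0, 1]), deviation = micro-price minus mid-price, and trend = least-squares slope over tick index are assumptions rather than confirmed definitions.

```python
import numpy as np

Q = 0.3  # weight from the commit message


def micro_price(vwap, imb, spread, q=Q):
    """vwap + q*(imb - 0.5)*spread, as written in the commit.

    With imb as the bid share of volume, a bid-heavy book (imb > 0.5)
    pushes the estimate up toward the ask.
    """
    return vwap + q * (np.asarray(imb, dtype=float) - 0.5) * spread


def micro_price_features(micro, mid):
    """Per-bucket aggregates under the assumed definitions above."""
    micro = np.asarray(micro, dtype=float)
    dev = micro - np.asarray(mid, dtype=float)
    trend = np.polyfit(np.arange(len(micro)), micro, 1)[0] if len(micro) > 1 else 0.0
    return {"deviation_mean": float(dev.mean()),
            "deviation_std": float(dev.std()),
            "trend": float(trend)}
```

For example, `micro_price(100.0, 0.75, 2.0)` shifts the price by 0.3 × 0.25 × 2 = 0.15 toward the ask.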
New time features:
- hour_sin, hour_cos (cyclical hour encoding)
- dow_sin, dow_cos, is_weekend (day of week)
- is_us_session (14-21 UTC, peak volume)
- funding_cycle_sin/cos (8h funding cycle)
Also added the micro-price columns to the LOB median-fill list so that missing LOB data is handled properly.
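A pandas/NumPy sketch of these calendar features. The column names come from the commit; the function name, the half-open US session window [14, 21) UTC and the funding schedule (settlements every 8 hours at 00:00/08:00/16:00 UTC) are assumptions.

```python
import numpy as np
import pandas as pd


def time_features(ts):
    """Calendar features for a tz-aware UTC DatetimeIndex (sketch).

    Assumptions: the US session is [14, 21) UTC and funding settles every
    8 hours at 00:00/08:00/16:00 UTC.
    """
    hour = np.asarray(ts.hour + ts.minute / 60.0, dtype=float)
    dow = np.asarray(ts.dayofweek)  # Monday=0 ... Sunday=6
    funding_phase = (hour % 8.0) / 8.0  # position within the 8h funding cycle
    two_pi = 2.0 * np.pi
    return pd.DataFrame({
        "hour_sin": np.sin(two_pi * hour / 24.0),
        "hour_cos": np.cos(two_pi * hour / 24.0),
        "dow_sin": np.sin(two_pi * dow / 7.0),
        "dow_cos": np.cos(two_pi * dow / 7.0),
        "is_weekend": (dow >= 5).astype(int),
        "is_us_session": ((hour >= 14) & (hour < 21)).astype(int),
        "funding_cycle_sin": np.sin(two_pi * funding_phase),
        "funding_cycle_cos": np.cos(two_pi * funding_phase),
    }, index=ts)
```

The sin/cos pairs keep 23:00 and 00:00 (or Sunday and Monday) close together in feature space, which a raw hour or weekday number does not.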
New compute_trend() function:
- Computes the linear slope of accuracy across folds
- Returns a direction: improving / degrading / stable
- Also tracks total_change over all folds
Added to aggregate_results:
- accuracy.trend: overall accuracy trend
- per_regime[rname].accuracy_trend: per-regime trends
This helps detect whether the model is improving or degrading over time.
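A sketch of what `compute_trend()` could look like: an ordinary least-squares slope over fold index. The `tol` dead band (slope per fold treated as "stable") and total_change = last fold minus first fold are assumptions, and the direction labels assume a higher-is-better metric such as accuracy.

```python
import numpy as np


def compute_trend(values, tol=0.001):
    """Sketch of compute_trend(): OLS slope of a metric across folds.

    tol is an assumed dead band: |slope| <= tol per fold counts as stable.
    """
    y = np.asarray(values, dtype=float)
    if len(y) < 2:
        return {"slope": 0.0, "direction": "stable", "total_change": 0.0}
    slope = float(np.polyfit(np.arange(len(y)), y, 1)[0])
    if slope > tol:
        direction = "improving"
    elif slope < -tol:
        direction = "degrading"
    else:
        direction = "stable"
    return {"slope": slope, "direction": direction,
            "total_change": float(y[-1] - y[0])}
```

A slope is less sensitive to a single noisy fold than the first-to-last difference, which is why both are reported.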
…nd tick
- Micro-price computed on EVERY tick (full resolution)
- Deep metrics (sorting) every 2nd tick (a 2x reduction instead of the previous 10x, i.e. computed 5x more often than before)
- Expected extraction time: ~2 hours for 326 files
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa517c8792

    @@ -13,7 +13,7 @@
     from sklearn.metrics import accuracy_score
     import joblib
    
    
    -from src.feature_selection import FeatureSelector
    +from src.legacy.feature_selection import FeatureSelector

Restore valid FeatureSelector module import
HorizonModel now imports src.legacy.feature_selection, but this commit does not add a src/legacy package and also removes the previous src/feature_selection.py. Any code path that imports predict_v3 (or directly imports src.horizon_model) will raise ModuleNotFoundError at import time, so training/prediction cannot start. src/two_stage_model.py has the same broken import and should be fixed in tandem.

     import config_v3
     from predict_v3 import PredictorV3
    -from src.online_platt import OnlinePlattScaler
    +from src.legacy.online_platt import OnlinePlattScaler

Import OnlinePlattScaler from an existing module
live_run.py now imports OnlinePlattScaler from src.legacy.online_platt, but no such module exists in the repository after this change (and the prior src/online_platt.py was deleted). As a result, starting live mode fails immediately with ModuleNotFoundError before the trading loop begins.

    @@ -167,7 +171,7 @@ def step_header(step: str, title: str, start_time: float = None):
     from src.cross_horizon_meta import CrossHorizonMeta
     from src.tcn_model import TCNTrainer
     from src.error_predictor import ErrorPredictor
    -from src.alt_pipeline import AltPipeline
    +from src.legacy.alt_pipeline import AltPipeline

Point AltPipeline import at a real module
The legacy trainer imports AltPipeline from src.legacy.alt_pipeline, but the commit does not provide that module (and removes the old src/alt_pipeline.py). Running scripts/train/v3_legacy.py will fail on import, so the legacy training entrypoint is unusable unless this import is restored or guarded.
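One way to "guard" such imports while the modules move between `src.X` and `src.legacy.X` is to try several locations in order. This is a sketch with a hypothetical helper, `import_first`; the real fix the review asks for is still to ship `src/legacy/` (with its `__init__.py`) containing the moved modules. One caveat: a `ModuleNotFoundError` raised *inside* a candidate module (a missing dependency) is also treated as "try the next location".

```python
import importlib


def import_first(attr, *module_names):
    """Return `attr` from the first importable module in module_names.

    Hypothetical helper: raises ImportError listing every failed candidate.
    """
    errors = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ModuleNotFoundError as exc:
            errors.append(f"{name}: {exc}")
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        errors.append(f"{name}: no attribute {attr!r}")
    raise ImportError(f"could not import {attr}: " + "; ".join(errors))


# e.g. AltPipeline = import_first("AltPipeline", "src.legacy.alt_pipeline", "src.alt_pipeline")
```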
No description provided.