V4 dev by Iosif2321 · Pull Request #1 · Iosif2321/CP

Iosif2321 · 2026-03-30T19:48:20Z

No description provided.

Fixed 3 critical bugs found during the first v3.20.0 retrain:
1. TCN SWA (tcn_model.py): SWA loaded untrained weights when early stopping fired before swa_start_epoch. AveragedModel was initialized immediately, but update_parameters() was only called after swa_start_epoch. The condition `if use_swa and swa_model is not None` was always True, so for H1/H12/H24 (early stop) random initial weights were loaded instead of best_state. Fix: a swa_was_active flag that becomes True only after a real update_parameters() call.
2. Stacking OOF inversion (horizon_model.py): temporal weighting exp(-0.00015×age) gave old samples a weight of ≈0 in the OOF folds. The LogisticRegression meta-learner was trained on distorted OOF predictions, so stacking accuracy was 28-51% (worse than random) instead of 72-78%. Fix: pass n_total=None to _make_sample_weights() for OOF (this disables temporal weighting) while keeping it for the final refit.
3. TCN NaN losses (tcn_model.py): FP16 mixed precision produced NaN losses for H1. Gradient clipping did not help because the loss itself was NaN. Fix: `if not torch.isfinite(loss): continue`, which skips NaN batches.
Threshold retuning (backtest_v3.py, live_run.py):
- REGIME_THRESHOLDS: Flat 0.37→0.32, Trend 0.52→0.38, Volatile 0.36→0.32
- P_CORRECT_BY_REGIME: Flat 0.42, Trend 0.55→0.48, Volatile 0.52→0.42
- Result: avg acc 82.8% (+coverage), avg cov 65.9% (+12.9 p.p.)
Adaptive thresholds (NEW: src/adaptive_threshold.py):
- Rolling window of 288 candles (1 day), updated every hour
- acc>78% AND cov<55% → lower the threshold; acc<72% → raise it
- Integrated into live_run.py (with JSON persistence) and backtest_v3.py (--adaptive)
Documentation updated: training_journal.md, roadmap_v3.20.md, CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
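The adaptive-threshold rule in this commit (288-candle rolling window, hourly update, lower when acc>78% and cov<55%, raise when acc<72%) could look roughly like the sketch below. It is not the contents of src/adaptive_threshold.py: the class name, the step size, the floor/ceiling bounds and the handling of an empty window are all assumptions made for illustration.

```python
from collections import deque


class AdaptiveThresholdSketch:
    """Rolling accuracy/coverage tracker that nudges a confidence threshold."""

    def __init__(self, threshold, window=288, update_every=12,
                 step=0.01, floor=0.30, ceiling=0.60):
        self.threshold = threshold
        self.history = deque(maxlen=window)  # one (traded, correct) pair per candle
        self.update_every = update_every     # 12 five-minute candles = 1 hour
        self.step = step                     # assumed step size
        self.floor, self.ceiling = floor, ceiling  # assumed bounds
        self._since_update = 0

    def record(self, confidence, correct):
        """Log one candle; returns True if the signal passed the threshold."""
        traded = confidence >= self.threshold
        self.history.append((traded, traded and bool(correct)))
        self._since_update += 1
        if self._since_update >= self.update_every:
            self._since_update = 0
            self._update()
        return traded

    def _update(self):
        n_traded = sum(1 for traded, _ in self.history if traded)
        if n_traded == 0:
            return  # accuracy undefined; leave the threshold alone (assumption)
        coverage = n_traded / len(self.history)
        accuracy = sum(1 for _, ok in self.history if ok) / n_traded
        if accuracy > 0.78 and coverage < 0.55:
            self.threshold = max(self.floor, self.threshold - self.step)
        elif accuracy < 0.72:
            self.threshold = min(self.ceiling, self.threshold + self.step)
```

Clamping to a floor and ceiling is one way to stop the threshold from drifting without bound during long stretches of unusual accuracy; the commit message does not say whether the real module does this.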

Problem: MultiTimeframeFeatures.merge_to_base() used reindex(ffill) without a shift. On historical data, resample('4h') produced complete TF bars that included future 5m candles. The batch backtest accuracy of 82.8% (v3.17-v3.20) was an artifact of data leakage; real accuracy is ~50%.
Scope: 36 TF features + 80 TCN embeddings (116 of 234 features affected).
Fix: df_features.shift(1) before reindex, so each 5m candle only receives features from the previous completed TF bar. Data loss: 950/210k (0.45%).
Verification: per-candle and batch values match for 35/36 TF features (1 diff = EWM, <0.01%).
Also: backtest_simulate.py adds a per-candle simulation (ground-truth backtest); predict_v3.py accepts a df= parameter for the simulation.
A full retrain is required (python train_v3.py, ~430 min).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
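To make the leak concrete, here is a dependency-free sketch of the same idea. It does not use pandas or the project's merge_to_base(); the helper names and the toy "last close" feature are assumptions. A bar labelled by its start time already contains candles that arrive later, so reading the current bar leaks the future, while reading the previous bar (the effect of shift(1) before the forward fill) does not.

```python
from datetime import datetime, timedelta

BAR = timedelta(hours=4)


def bar_start(ts):
    """Floor a timestamp to the start of its 4h bar."""
    return ts.replace(hour=ts.hour - ts.hour % 4, minute=0, second=0, microsecond=0)


def build_bars(candles):
    """Aggregate (timestamp, close) 5m candles into {bar_start: last_close}."""
    bars = {}
    for ts, close in candles:
        bars[bar_start(ts)] = close  # later candles overwrite earlier ones
    return bars


def leaky_feature(ts, bars):
    """Old behaviour: read the bar that contains ts (includes future candles)."""
    return bars.get(bar_start(ts))


def safe_feature(ts, bars):
    """Fixed behaviour: read the previous, already completed bar."""
    return bars.get(bar_start(ts) - BAR)


# 8 hours of 5m candles whose close equals the candle index.
start = datetime(2026, 1, 1)
candles = [(start + timedelta(minutes=5 * i), float(i)) for i in range(96)]
bars = build_bars(candles)

ts_0005 = candles[1][0]   # 00:05, candle index 1
ts_0400 = candles[48][0]  # 04:00, candle index 48
print(leaky_feature(ts_0005, bars))  # close of the 03:55 candle, i.e. the future
print(safe_feature(ts_0005, bars))   # None: no completed bar exists yet
print(safe_feature(ts_0400, bars))   # close of the 03:55 candle, now in the past
```

The `None` values at the start of the series correspond to the small data loss reported in the commit: the first candles have no completed higher-timeframe bar to read from.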

…acking v3 is exhausted: the batch accuracy of 82.8% was an artifact of Multi-TF lookahead bias; real accuracy is ~51% (random). BTC direction on 5m is not predictable from OHLCV. v4 is built around the principle of OBSERVABILITY:
New:
- src/model_inspector.py: model introspection on every tick (feature contributions, attention weights, embeddings, activations, gradients)
- version_tracker.py: automatic change tracking with significance classification
- docs/VERSIONING.md: the 4.XX.YY.TAG standard
- docs/roadmap_v4.md: plan: baseline → new data (L2, Funding, LS) → model
- docs/CHANGELOG.md: automatic changelog
- VERSION: current version file
Legacy cleanup (9 modules → src/legacy/): alt_pipeline, teacher_student_ensemble, stacking_ensemble, models, predict, train, preprocessing, feature_selection, online_platt
Version: 4.00.00.DEV
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replaced the − character (U+2212) with ASCII - for compatibility with the cp1251 console. Version: 4.00.00.DEV Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rget (H1), feature validation preparation

…re validation

…p files

Hypothesis validation: OHLCV features carry no predictive signal for BTC/USDT 5m direction. Validation accuracy 50.63% (baseline 50.49%). Result: accuracy below the baseline level → OHLCV = noise. Next steps: Phase 2, L2 orderbook + alternative data. Version: 4.02.00.DEV Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- 15/18 features have MI > 0.001
- Best selective-prediction condition: regime=0 (accuracy=0.628, n=3930)
- Composite feature spread_ofi has MI=0.0176 (strong signal)
- Recommended minimal feature set: 16 features (avg_spread, ticks_volume_ratio, volume, turnover, spread_ofi, regime, oi_change, ofi_velocity, spread_rel, book_imbalance, ofi_raw, cumulative_delta, imbalance_mean, price_return, imbalance_x_wall, imbalance_x_ofi)
- Next steps: train a model on the reduced 16-feature set (Strong+Weak) and validate on regime=0

Comprehensive CLI for feature quality analysis:
- lag: lookahead detection
- audit: permutation importance + bootstrap CI
- signal: MI + Granger + dCor
- wf: walk-forward metrics (Stage 1/2 AUC, F1)
- wf_grid: hyperparameter grid search
- regime: regime-specific performance
- quick: rapid feature check
- health: data drift monitoring
Usage example:
    python analysis_suite.py --mode audit --subset move
    python analysis_suite.py --mode wf_grid

- Added _load_and_merge_orderbook_features() method - Loads orderbook_ofi_features.csv or orderbook_features_combined.csv - Merges L2 features (ofi_raw, spread, book_imbalance, etc.) if model expects them - Backward compatible: skips if model has no L2 features - Logs warnings if L2 expected but file missing

Rationale:
- spread_ofi = spread_rel * sign(ofi_raw) has strong MI (0.0176) and significant permutation importance
- Physically: wide spread + positive order flow imbalance = stronger directional signal
- This feature is now part of the recommended minimal set (18 features)
- Not yet removed: spread_rel is still present but will be dropped in the next cleanup because its permutation importance is 0
Dataset regenerated: 7,771 rows, 21 features (including spread_ofi)
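The composite feature from this commit is a one-liner. Here is a minimal sketch of it, assuming scalar inputs and a sign convention that maps zero order flow to 0 (the commit does not say how ofi_raw == 0 is handled):

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def spread_ofi(spread_rel, ofi_raw):
    """Relative spread signed by the direction of order flow imbalance."""
    return spread_rel * sign(ofi_raw)


# Wide spread with buy-side flow gives a positive value, sell-side a negative one.
print(spread_ofi(0.002, 150.0))
print(spread_ofi(0.002, -3.0))
```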

- Added src/features/feature_store.py: a single feature registry. Replaces: feature_registry.py, FEATURE_REGISTRY.yaml, feature_audit_runner.py
  S1 features (6): volume, turnover, avg_spread, ticks_volume_ratio, spread_rel, regime
  S2 features (7): avg_spread, ticks_volume_ratio, spread_rel, regime, cumulative_delta, wall_ratio, book_imbalance
  Lookahead blacklist: price_return, imbalance_delta
- train_v4_2stage.py: removed hardcoded features and wired in FeatureStore. Stage1 and Stage2 use different feature sets from fs.features_for_s1/s2
- Added data/feature_verdicts.json: audit verdicts (KEEP/WEAK/DROP)
- Added docs/DATA_PIPELINE.md: the official data chain (6 steps)
- Added docs/FEATURES.md: feature documentation with statistics
- Updated .gitignore: excluded .data (36GB), .jsonl, logs, models. Git stores only: src/, *.py, docs/, training_dataset.csv
- Removed orphan scripts: feature_registry.py, feature_audit_runner.py, refresh_catalog.py, sync_registry_with_dataset.py, FEATURE_REGISTRY.yaml
Current model state: Stage1 AUC: 0.536 Stage2 AUC: 0.534 F1: 0.283 Gap(all): 0.209 OVERFIT: needs regularization (next PR)
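As a rough illustration of the registry idea, the sketch below hard-codes the S1/S2 lists and the lookahead blacklist from this commit message and refuses to serve a blacklisted feature. It is not the real src/features/feature_store.py: the class name, constructor, and the choice to expose features_for_s1/s2 as properties are assumptions.

```python
S1_FEATURES = ["volume", "turnover", "avg_spread", "ticks_volume_ratio",
               "spread_rel", "regime"]
S2_FEATURES = ["avg_spread", "ticks_volume_ratio", "spread_rel", "regime",
               "cumulative_delta", "wall_ratio", "book_imbalance"]
LOOKAHEAD_BLACKLIST = {"price_return", "imbalance_delta"}


class FeatureStoreSketch:
    """Single place that decides which columns each stage may see."""

    def __init__(self, s1=S1_FEATURES, s2=S2_FEATURES,
                 blacklist=LOOKAHEAD_BLACKLIST):
        leaked = (set(s1) | set(s2)) & set(blacklist)
        if leaked:
            raise ValueError(f"lookahead features requested: {sorted(leaked)}")
        self._s1, self._s2 = list(s1), list(s2)

    @property
    def features_for_s1(self):
        return list(self._s1)  # copy, so callers cannot mutate the registry

    @property
    def features_for_s2(self):
        return list(self._s2)

    def select(self, stage_features, row):
        """Pick the stage's columns from a dict-like row, failing loudly on gaps."""
        missing = [name for name in stage_features if name not in row]
        if missing:
            raise KeyError(f"row is missing features: {missing}")
        return [row[name] for name in stage_features]


fs = FeatureStoreSketch()
print(len(fs.features_for_s1), len(fs.features_for_s2))
```

Checking the blacklist at construction time means a leaked feature fails before any training starts rather than showing up later as an inflated metric.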

- Added src/features/ls_oi_features.py for Long/Short ratio and Open Interest features - Added src/data_collectors/fetch_trades.py to fetch historical trades from Bybit - Added src/features/trade_features.py to aggregate trades data - Conducted lag profile and Walk-Forward analysis on new features - Conclusion: No leading indicators found. Stage2 AUC did not improve (>0.56). More historical data needed.

…S + Orderbook features - Add Binance CSV download via API (no data.binance.vision needed) - Add Bybit market data: OI, Funding, LS Ratio - Add orderbook feature extraction from parquet (sampled) - Create unified_data_loader.py and data_sources.py - Create feature_registry.py with 40 features - Create prepare_training_data.py for merged dataset - Update AGENTS.md with project guidelines - Bump version to 4.03.00.DEV

- Extract orderbook features from parquet (book_imbalance, spread) - Sample 1/100 rows for speed (9 min processing time) - Create unified training dataset with 93k rows, 28 features - Dataset includes: OHLCV, OI, Funding, LS, Orderbook - Audit passed: balanced target, no lookahead bias

- Test for lookahead bias - Target balance check - Feature correlation analysis - Baseline RF model: 46% accuracy (+13% over random)

- Added scripts/ directory structure (data/, eval/, analysis/, train/) - Added tests/ directory with data integrity and feature tests - New: lob_microstructure.py, transforms.py - Updated: feature_store.py with v2.0 registry - Added: phase8_report.md, walkforward logs - Refactored legacy code into scripts/archive/ and src/legacy/ - Removed deprecated v1-v3 train scripts from root

Deleted old CSVs (BNBUSDT, ETHUSDT, SOLUSDT, funding, OI, LS ratio) Deleted deprecated train scripts (v2, v3, baseline) Deleted deprecated analysis scripts Archived to scripts/archive/ and data/backup patterns

- Removed 'bucket_ticks % 10' condition - now computes deep metrics on EVERY tick - Added micro-price (Cont & Stoikov 2010): vwap + q*(imb-0.5)*spread, q=0.3 - Added 3 micro-price output features: deviation_mean, deviation_std, trend - Expected impact: +0.5-1.0% accuracy from full-resolution LOB features
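The micro-price formula quoted in this commit, vwap + q*(imb-0.5)*spread with q=0.3, can be written out as below. This is a sketch rather than the code in lob_microstructure.py: it assumes imb is the bid share of top-of-book quantity in [0, 1] and treats vwap and spread as given inputs, since the commit does not define them.

```python
def book_imbalance(bid_qty, ask_qty):
    """Bid share of resting quantity; 0.5 means a balanced book (assumed definition)."""
    total = bid_qty + ask_qty
    return 0.5 if total == 0 else bid_qty / total


def micro_price(vwap, imb, spread, q=0.3):
    """Shift the reference price toward the heavier side of the book."""
    return vwap + q * (imb - 0.5) * spread


# A balanced book leaves the price unchanged; a bid-heavy book pushes it up.
print(micro_price(100.0, 0.5, 0.1))
print(micro_price(100.0, book_imbalance(3.0, 1.0), 0.2))
```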

New time features: - hour_sin, hour_cos (cyclical hour encoding) - dow_sin, dow_cos, is_weekend (day of week) - is_us_session (14-21 UTC peak volume) - funding_cycle_sin/cos (8h funding cycle) Also added micro-price columns to LOB median-fill list for proper handling of missing LOB data.
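A possible shape for these time features is sketched below. The function name, the use of a fractional hour, the half-open 14:00-21:00 UTC session window, and aligning the 8h funding cycle to 00:00 UTC are assumptions; only the feature names and the 14-21 UTC / 8h figures come from the commit.

```python
import math
from datetime import datetime, timezone


def time_features(ts):
    """Cyclical and calendar features for one candle timestamp (UTC)."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    ts = ts.astimezone(timezone.utc)
    hour = ts.hour + ts.minute / 60.0
    dow = ts.weekday()                 # Monday = 0 ... Sunday = 6
    funding_phase = (hour % 8.0) / 8.0  # position inside the 8h funding cycle
    two_pi = 2.0 * math.pi
    return {
        "hour_sin": math.sin(two_pi * hour / 24.0),
        "hour_cos": math.cos(two_pi * hour / 24.0),
        "dow_sin": math.sin(two_pi * dow / 7.0),
        "dow_cos": math.cos(two_pi * dow / 7.0),
        "is_weekend": int(dow >= 5),
        "is_us_session": int(14 <= ts.hour < 21),
        "funding_cycle_sin": math.sin(two_pi * funding_phase),
        "funding_cycle_cos": math.cos(two_pi * funding_phase),
    }


features = time_features(datetime(2026, 3, 28, 15, 0, tzinfo=timezone.utc))
print(features["is_weekend"], features["is_us_session"])
```

The sin/cos pairs keep 23:55 and 00:00 close together in feature space, which a raw hour integer would not.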

New compute_trend() function: - Computes linear slope of accuracy across folds - Returns direction: improving / degrading / stable - Also tracks total_change over all folds Added to aggregate_results: - accuracy.trend: overall accuracy trend - per_regime[rname].accuracy_trend: per-regime trends This helps detect if model is improving or degrading over time.
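A least-squares slope over fold indices is one straightforward way to implement the behaviour described here. The sketch below is not the project's compute_trend(): the return keys mirror the commit message, but the tolerance that separates "stable" from the other two directions is an assumption.

```python
def compute_trend(values, tol=1e-3):
    """Fit accuracy = a + slope * fold_index and classify the slope."""
    n = len(values)
    if n < 2:
        return {"slope": 0.0, "direction": "stable", "total_change": 0.0}
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    slope = sxy / sxx
    if slope > tol:
        direction = "improving"
    elif slope < -tol:
        direction = "degrading"
    else:
        direction = "stable"
    return {
        "slope": slope,
        "direction": direction,
        "total_change": values[-1] - values[0],  # last fold minus first fold
    }


print(compute_trend([0.50, 0.52, 0.54, 0.56])["direction"])
```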

…nd tick - Micro-price computed on EVERY tick (full resolution) - Deep metrics (sorting) every 2nd tick (5x reduction instead of previous 10x) - Expected extraction time: ~2 hours for 326 files

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa517c8792


chatgpt-codex-connector · 2026-03-30T19:53:52Z

@@ -13,7 +13,7 @@
 from sklearn.metrics import accuracy_score
 import joblib

-from src.feature_selection import FeatureSelector
+from src.legacy.feature_selection import FeatureSelector


Restore valid FeatureSelector module import

HorizonModel now imports src.legacy.feature_selection, but this commit does not add a src/legacy package and also removes the previous src/feature_selection.py. Any code path that imports predict_v3 (or directly imports src.horizon_model) will raise ModuleNotFoundError at import time, so training/prediction cannot start. src/two_stage_model.py has the same broken import and should be fixed in tandem.


chatgpt-codex-connector · 2026-03-30T19:53:52Z

 import config_v3
 from predict_v3 import PredictorV3
-from src.online_platt import OnlinePlattScaler
+from src.legacy.online_platt import OnlinePlattScaler


Import OnlinePlattScaler from an existing module

live_run.py now imports OnlinePlattScaler from src.legacy.online_platt, but no such module exists in the repository after this change (and the prior src/online_platt.py was deleted). As a result, starting live mode fails immediately with ModuleNotFoundError before the trading loop begins.


chatgpt-codex-connector · 2026-03-30T19:53:52Z

@@ -167,7 +171,7 @@ def step_header(step: str, title: str, start_time: float = None):
 from src.cross_horizon_meta import CrossHorizonMeta
 from src.tcn_model import TCNTrainer
 from src.error_predictor import ErrorPredictor
-from src.alt_pipeline import AltPipeline
+from src.legacy.alt_pipeline import AltPipeline


Point AltPipeline import at a real module

The legacy trainer imports AltPipeline from src.legacy.alt_pipeline, but the commit does not provide that module (and removes the old src/alt_pipeline.py). Running scripts/train/v3_legacy.py will fail on import, so the legacy training entrypoint is unusable unless this import is restored or guarded.
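All three review comments describe the same failure mode, and the last one mentions guarding the import as an option. Here is a minimal sketch of such a guard, using only the standard library; the helper name import_first is hypothetical, and the module paths in the usage comment are the ones named in the review, not verified to exist.

```python
import importlib


def import_first(candidates):
    """Return the first importable module from candidates, else raise."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Note: this also hides a missing dependency inside the module itself.
            errors.append(f"{name}: {exc}")
    raise ModuleNotFoundError("no candidate could be imported: " + "; ".join(errors))


# Hypothetical usage for the legacy trainer:
#   alt = import_first(["src.legacy.alt_pipeline", "src.alt_pipeline"])
#   AltPipeline = alt.AltPipeline
```

A guard like this only papers over the problem; committing the src/legacy package, as the review asks, is the actual fix.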


Iosif2321 and others added 30 commits March 13, 2026 22:20

fix: version_tracker.py Unicode for Windows cp1251

86bdc54


4.00.01.DEV

92d6c7e

fix: integrate ModelInspector into PredictorV3 and fix legacy imports

c325416

feat: phase 1 baseline evaluation and feature importance report

a98f1e4

feat: phase 2 L2 orderbook collector implementation

32e5b5a

feat: phase 2 L2 feature engineering logic

ca5b4f7

feat: phase 2 Bybit market data long/short ratio integration

322d113

feat: phase 2 integrate Long/Short ratio into feature pipeline

8a62657

feat: v4.01.00 - Order Flow Imbalance (OFI) extraction, multiclass ta…

a0506c2

…rget (H1), feature validation preparation

docs: Update docs and scripts after OFI features extraction and featu…

460b7d6

…re validation

chore: Update .gitignore rules to properly ignore data, logs, and tem…

bba2c22

…p files

V4.01.01.DEV Feature Analysis

ff64b96

V4.01.02.DEV

fc46f3b

Data

f0e0140

fix: UTF-8 encoding for feature_verdicts.json and FeatureStore

3db3118

test: add dataset validation tests and baseline metrics

3beb251


docs: update AGENTS.md with v4.03.01 metrics and commands

72d5196

Iosif2321 added 5 commits March 28, 2026 10:10

Remove deprecated data files and legacy scripts

7ca4bdc


lob_microstructure.py: microprice on every tick, deep_metrics every 2…

aa517c8

…nd tick - Micro-price computed on EVERY tick (full resolution) - Deep metrics (sorting) every 2nd tick (5x reduction instead of previous 10x) - Expected extraction time: ~2 hours for 326 files

chatgpt-codex-connector Bot reviewed Mar 30, 2026

Iosif2321 wants to merge 35 commits into main from v4-dev
