Status: In Progress (2/10 Complete)
Goal: Fix critical bugs, remove technical debt, improve testability
Duration: 10 PRs, estimated 2-3 days of focused work
Last Updated: 2026-01-11
Completed: PR #1 ✅, PR #2 ✅
Each PR is self-contained and can be executed by a fresh agent with full context.
PRs are ordered by dependency and risk (critical bugs first).
Agent Context Required for Each PR:
- Full project structure (from AI_CONTEXT.md)
- Specific files to edit (provided in each PR)
- Acceptance criteria (must pass before PR is complete)
## PR #1: Fix Plotting Crash ✅

Completed: 2026-01-11
Status: ✅ All tests passing (46/46)
Testing Guide: See TESTING_PR1.md

Problem: demo_latest.py passes a pd.Series to plot_equity_curve(), which expects a CSV path string.
Result: Every demo crashes at the plotting step.
Option A — Updated plot_equity_curve() to accept both Series and str:

```python
def plot_equity_curve(
    equity: pd.Series | str,
    output_file: str,
    title: str,
) -> None:
    """Plot equity curve from Series or CSV file."""
    if isinstance(equity, str):
        equity_df = pd.read_csv(equity, index_col=0, parse_dates=True)
        equity_series = equity_df["equity"]
    else:
        equity_series = equity
    # ... plotting code
```

Files changed:
- `src/plotting.py` — Added dual input support
- `tests/test_plotting.py` — Created 7 comprehensive tests
- `scripts/test_plotting.py` — Enhanced verification script
- `scripts/pr1_synthetic_demo.py` — New end-to-end demo
Acceptance criteria:

```bash
# All 46 tests pass
pytest tests/ -v

# Quick verification (3 commands)
pytest tests/test_plotting.py -v
python scripts/test_plotting.py
python scripts/pr1_synthetic_demo.py
```

Risk: LOW — Isolated change, comprehensive tests, backward compatible
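The str branch assumes an equity CSV with a datetime index column and an `equity` column. A quick round-trip sketch of that assumed format (in-memory via `io.StringIO`, so it runs without the project's result files):

```python
import io

import pandas as pd

dates = pd.date_range("2024-01-01", periods=3, freq="D")
equity = pd.Series([0.0, 0.01, 0.02], index=dates, name="equity")

# Write the way the demo scripts are assumed to write results,
# then read back exactly the way plot_equity_curve() does for str input
buf = io.StringIO()
equity.to_frame().to_csv(buf)
buf.seek(0)
equity_df = pd.read_csv(buf, index_col=0, parse_dates=True)
equity_series = equity_df["equity"]
```

If the round trip preserves index and values, the two call styles stay interchangeable.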
## PR #2: Add Yahoo Integration Test ✅

Completed: 2026-01-11
Status: ✅ 3 tests added, all passing (49 total)
Testing Guide: See TESTING_PR2.md

Problem: No test verifies that fetch_yahoo_data() works end-to-end with a Yahoo API response.
Tests only cover helper functions, not the integrated flow.
Added TestYahooIntegration class with 3 comprehensive tests:

1. test_fetch_yahoo_data_full_flow
   - Mocks crumb request and chart data response
   - Verifies full pipeline: fetch → parse → cache → return
   - Validates cache file creation
   - Confirms data format and values
2. test_fetch_yahoo_data_incremental_update
   - Pre-populates cache with old data
   - Fetches overlapping date range
   - Verifies old + new data merge correctly
   - Tests incremental cache strategy
3. test_fetch_yahoo_data_with_cache_disabled
   - Pre-populates cache with dummy data
   - Fetches with use_cache=False
   - Confirms fresh data is fetched (cache bypassed)

Files changed:
- `tests/test_yahoo.py` — Added `TestYahooIntegration` class (3 new tests)
Acceptance criteria:

```bash
# Run new integration tests
pytest tests/test_yahoo.py::TestYahooIntegration -v

# Run all Yahoo tests (16 total)
pytest tests/test_yahoo.py -v

# Run all project tests (49 total)
pytest tests/ -v
```

All tests are 100% offline — no real network calls, all responses mocked.
This validates the code works correctly even when the Yahoo API returns 429 errors.

Risk: LOW — Pure test addition, no production code changes
Code added:

```python
class TestYahooIntegration:
    """End-to-end tests with mocked network."""

    @pytest.fixture
    def sample_response(self):
        fixture_path = Path(__file__).parent / "fixtures" / "yahoo_sample.json"
        with open(fixture_path) as f:
            return json.load(f)

    def test_fetch_yahoo_data_full_flow(self, sample_response):
        """Test full flow: fetch → parse → cache → return."""
        with tempfile.TemporaryDirectory() as tmpdir:
            with patch("src.yahoo._get_data_dir") as mock_dir:
                mock_dir.return_value = Path(tmpdir)
                # Mock urlopen to return the sample responses
                with patch("src.yahoo.urlopen") as mock_urlopen:
                    # First call: crumb
                    crumb_response = Mock()
                    crumb_response.read.return_value = b"test_crumb"
                    # Second call: chart data
                    chart_response = Mock()
                    chart_response.read.return_value = json.dumps(sample_response).encode()
                    mock_urlopen.return_value.__enter__.side_effect = [
                        crumb_response,
                        chart_response,
                    ]
                    # Fetch data
                    prices = fetch_yahoo_data(
                        "SPY",
                        start_date=datetime(2024, 1, 1),
                        end_date=datetime(2024, 1, 5),
                        use_cache=True,
                    )
                    # Verify results
                    assert len(prices) == 5
                    assert prices.iloc[0] == 385.5
                    assert prices.iloc[-1] == 390.5
                    # Verify cache was created
                    cache_file = Path(tmpdir) / "SPY_1d_persistent.csv"
                    assert cache_file.exists()
```

Acceptance criteria:

```bash
pytest tests/test_yahoo.py::TestYahooIntegration::test_fetch_yahoo_data_full_flow -v
# Must pass offline, no network calls
```
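The `yahoo_sample.json` fixture needs to mirror Yahoo's v8 chart payload. A minimal sketch is below — the field layout (`chart.result[0].timestamp`, `indicators.quote[0].close`) is an assumption about the v8 chart endpoint and should be checked against a real captured response, and `parse_chart` is a hypothetical stand-in for the project's actual parser:

```python
import pandas as pd

# Minimal fixture mirroring the assumed v8 chart payload shape
sample_response = {
    "chart": {
        "result": [{
            "timestamp": [1704153600, 1704240000, 1704326400],  # epoch seconds
            "indicators": {"quote": [{"close": [385.5, 387.2, 390.5]}]},
        }],
        "error": None,
    }
}

def parse_chart(payload: dict) -> pd.Series:
    """Turn a chart payload into a datetime-indexed close-price Series."""
    result = payload["chart"]["result"][0]
    index = pd.to_datetime(result["timestamp"], unit="s")
    closes = result["indicators"]["quote"][0]["close"]
    return pd.Series(closes, index=index, name="close")

prices = parse_chart(sample_response)
```

Keeping the fixture this small makes the expected assertion values (385.5, 390.5) easy to trace back to the JSON.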
## PR #3: Remove Dead Cache Code

Problem: Legacy cache functions (_load_from_cache, _save_to_cache, _get_cache_path) exist but are never called.
They confuse developers about which cache system to use.

Files to edit:
- `src/yahoo.py` (remove 3 functions)
- `tests/test_yahoo.py` (remove legacy cache tests)

Delete from src/yahoo.py:
- `_get_cache_path()` (lines 48-55)
- `_load_from_cache()` (lines 166-177)
- `_save_to_cache()` (lines 180-189)

Delete from tests/test_yahoo.py:
- `test_save_and_load_cache()` in `TestYahooDataFetching`
Verify no references remain:

```bash
grep -r "_get_cache_path\|_load_from_cache\|_save_to_cache" src/ tests/
# Should return empty
```

Acceptance criteria:

```bash
# All tests still pass
pytest tests/test_yahoo.py -v

# Persistent cache tests still work
pytest tests/test_yahoo.py::TestPersistentCache -v

# Demo still works
python scripts/demo_latest.py --lookback 30
```

Risk: LOW — Dead code removal, verified by grep
## PR #4: Add Cache Corruption Handling

Problem: _load_persistent_cache() silently returns None on any exception.
Corrupted cache files cause full re-downloads with no warning.

Files to edit:
- `src/yahoo.py` (add logging and backup)
```python
import logging

logger = logging.getLogger(__name__)

def _load_persistent_cache(ticker: str, interval: str = "1d") -> pd.Series | None:
    """Load all cached data for a ticker from persistent cache."""
    cache_path = _get_persistent_cache_path(ticker, interval)
    if cache_path.exists():
        try:
            df = pd.read_csv(cache_path, parse_dates=["datetime"])
            df = df.sort_values("datetime")
            return pd.Series(df["close"].values, index=df["datetime"], name=ticker)
        except Exception as e:
            # Rename corrupted cache and log warning
            backup_path = cache_path.with_suffix(".csv.bak")
            cache_path.rename(backup_path)
            logger.warning(
                f"Corrupted cache for {ticker}: {e}. "
                f"Renamed to {backup_path}. Re-fetching..."
            )
            return None
    return None
```

Acceptance criteria:

```bash
# Corrupt cache file
echo "garbage" > data/SPY_1d_persistent.csv

# Run fetch (should log warning and recover)
python -c "from src.yahoo import fetch_yahoo_data; fetch_yahoo_data('SPY', lookback_days=5)"

# Verify backup was created
ls data/SPY_1d_persistent.csv.bak

# Verify new cache was created
head data/SPY_1d_persistent.csv
```

Risk: LOW — Defensive code addition, improves reliability
## PR #5: Improve Crumb Mechanism

Problem: The crumb is fetched from /v1/test/getcrumb without session cookies.
Requests may fail intermittently with 401/403 errors.

Files to edit:
- `src/yahoo.py` (test and potentially remove crumb)
Test whether the Yahoo API works without a crumb:

```python
# Remove crumb code temporarily and test
params = {
    "period1": start_ts,
    "period2": end_ts,
    "interval": interval,
    "events": "history",
}
# No crumb parameter
url = f"https://query1.finance.yahoo.com/v8/finance/chart/{ticker}?{urlencode(params)}"
```

- If it works without the crumb: remove the crumb-fetching code entirely.
- If it doesn't: add proper session cookie handling.

Acceptance criteria:

```bash
# Run 10 times, must succeed 10/10 times
for i in {1..10}; do
  python -c "from src.yahoo import fetch_yahoo_data; print(fetch_yahoo_data('SPY', lookback_days=5))"
done
```

Risk: MEDIUM — May require API testing, but reversible
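If the crumb turns out to be required, a commonly used workaround (an assumption about Yahoo's current behavior, not verified here) is to prime a cookie-backed opener before requesting the crumb. `get_crumb`, `make_session`, and the cookie URL below are illustrative:

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Assumed endpoints — verify against current Yahoo behavior before relying on them
COOKIE_URL = "https://fc.yahoo.com"
CRUMB_URL = "https://query1.finance.yahoo.com/v1/test/getcrumb"
HEADERS = {"User-Agent": "Mozilla/5.0"}

def make_session():
    """Build an opener that persists cookies across requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))

def get_crumb(opener) -> str:
    """Prime session cookies, then fetch the crumb with the same opener."""
    try:
        # This request may fail with an HTTP error but can still set cookies
        opener.open(Request(COOKIE_URL, headers=HEADERS), timeout=10)
    except Exception:
        pass
    with opener.open(Request(CRUMB_URL, headers=HEADERS), timeout=10) as resp:
        return resp.read().decode()
```

Because `get_crumb` takes the opener as a parameter, the PR #2 tests can keep running offline by passing a mocked opener instead of patching `urlopen`.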
## PR #6: Add Session Reuse

Problem: Every fetch_yahoo_data() call opens 2 new HTTP connections (crumb + data).
No connection pooling, which burns rate limits faster.

Files to edit:
- `src/yahoo.py` (add session parameter)
- `scripts/demo_latest.py` (create session once)
```python
# src/yahoo.py
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, OpenerDirector, build_opener

def fetch_yahoo_data(
    ticker: str,
    ...,
    session: OpenerDirector | None = None,
) -> pd.Series:
    """Fetch with optional session reuse."""
    # If no session is supplied, create a disposable opener
    if session is None:
        cookie_jar = CookieJar()
        session = build_opener(HTTPCookieProcessor(cookie_jar))
    # Use session.open() instead of urlopen()
    ...
```

```python
# scripts/demo_latest.py
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

def run_demo(...):
    # Create the session once
    cookie_jar = CookieJar()
    session = build_opener(HTTPCookieProcessor(cookie_jar))
    # Reuse it for both fetches
    es_prices = fetch_yahoo_data("ES=F", ..., session=session)
    spy_prices = fetch_yahoo_data("SPY", ..., session=session)
```

Acceptance criteria:

```bash
# Run demo, verify only 1 TCP connection is established
python scripts/demo_latest.py --lookback 30
# Check logs for "Reusing session" message
```

Risk: MEDIUM — Affects network layer, but backward compatible (session is optional)
## PR #7: Vectorize V2 Signal

Problem: compute_spread_and_zscore() uses a manual loop: for i in range(window-1, len(es)).
This is 10-100x slower than pandas rolling operations on large datasets.

Files to edit:
- `src/signal.py` (replace loop with pandas)
- `tests/test_signal.py` (add performance test)
```python
# Replace the manual loop with pandas rolling operations
log_es = np.log(es)
log_spy = np.log(spy)

# Rolling covariance and variance give the rolling hedge ratio (beta)
rolling_cov = log_es.rolling(window).cov(log_spy, ddof=0)
rolling_var = log_spy.rolling(window).var(ddof=0)
beta = rolling_cov / rolling_var
beta = beta.fillna(0.0)  # Handle zero variance
spread = log_es - beta * log_spy
```

Acceptance criteria:

```bash
# Add benchmark tests
pytest tests/test_signal.py::test_v2_vectorized_matches_loop -v
pytest tests/test_signal.py::test_v2_performance -v
# Performance test: 10,000 bars must complete in <1 sec
```

Risk: LOW — Pure optimization, numerical equivalence tested
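The test_v2_vectorized_matches_loop check can be prototyped standalone on synthetic data; the loop below re-implements the assumed per-window OLS beta so the two paths can be compared directly:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, window = 500, 20
idx = pd.date_range("2024-01-01", periods=n, freq="D")
log_es = pd.Series(np.cumsum(rng.normal(0, 0.01, n)), index=idx)
log_spy = pd.Series(np.cumsum(rng.normal(0, 0.01, n)), index=idx)

# Loop version (mirrors the assumed current implementation)
beta_loop = pd.Series(np.nan, index=idx)
for i in range(window - 1, n):
    y = log_es.iloc[i - window + 1 : i + 1]
    x = log_spy.iloc[i - window + 1 : i + 1]
    beta_loop.iloc[i] = np.cov(y, x, ddof=0)[0, 1] / np.var(x)

# Vectorized version from this PR
beta_vec = (
    log_es.rolling(window).cov(log_spy, ddof=0)
    / log_spy.rolling(window).var(ddof=0)
)

# The two must agree to floating-point tolerance after the warm-up period
match = np.allclose(beta_loop.iloc[window - 1 :], beta_vec.iloc[window - 1 :])
```

Matching ddof on both sides is the detail that usually breaks this comparison: pandas defaults to ddof=1 while np.var defaults to ddof=0.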
## PR #8: Add Smoke Test

Problem: No CI test verifies that the full demo pipeline works end-to-end.
Regressions can't be detected before deployment.

Files to edit:
- `tests/test_integration.py` (new file)
"""Integration tests for full pipeline."""
import tempfile
from pathlib import Path
from unittest.mock import patch
import pandas as pd
import pytest
from scripts.demo_latest import run_demo
class TestDemoIntegration:
def test_demo_latest_smoke(self):
"""Smoke test: demo runs without errors."""
with tempfile.TemporaryDirectory() as tmpdir:
# Mock data directory
with patch("src.yahoo.fetch_yahoo_data") as mock_fetch:
# Return fake data
dates = pd.date_range("2024-01-01", periods=30, freq="D")
mock_fetch.return_value = pd.Series(
range(100, 130), index=dates, name="TEST"
)
# Run demo
run_demo(lookback_days=30, signal_version="v2", show_plot=False)
# Verify results created
assert len(list(Path("results").glob("demo_equity_*.csv"))) > 0
assert len(list(Path("results").glob("demo_summary_*.csv"))) > 0pytest tests/test_integration.py::TestDemoIntegration::test_demo_latest_smoke -v
# Must complete in <5 seconds (offline)LOW — Pure test addition
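One pitfall worth checking before this test lands: `patch` replaces a name where it is looked up, not where it is defined. If demo_latest.py does `from src.yahoo import fetch_yahoo_data`, patching `src.yahoo.fetch_yahoo_data` will not intercept the demo's calls; the patch target would need to be `scripts.demo_latest.fetch_yahoo_data`. A minimal standalone illustration (the `lib`/`client` module names are stand-ins built on the fly):

```python
import sys
import types
from unittest.mock import patch

# Build a tiny "library" module and a "client" that from-imports it
lib = types.ModuleType("lib")
lib.fetch = lambda: "real"
sys.modules["lib"] = lib

client = types.ModuleType("client")
exec("from lib import fetch\ndef run(): return fetch()", client.__dict__)
sys.modules["client"] = client

# Patching the library module does NOT affect the client's bound name...
with patch("lib.fetch", return_value="mocked"):
    via_lib = client.run()      # still "real"

# ...patching the client's own reference does
with patch("client.fetch", return_value="mocked"):
    via_client = client.run()   # "mocked"
```

If the smoke test ever sees real network traffic despite the mock, this lookup rule is the first thing to audit.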
## PR #9: Unify Plotting API

Problem: The plotting API is confusing — plot_equity_curve(csv_path, out_path, title) — and different callers have different needs.

Files to edit:
- `src/plotting.py` (add deprecation)
```python
import warnings

def plot_equity_curve_from_csv(csv_path: str, out_path: str, title: str) -> None:
    """Plot equity curve from CSV file (new explicit name)."""
    # Current implementation
    ...

def plot_equity_curve(
    equity: pd.Series | str,
    output_file: str,
    title: str,
) -> None:
    """Plot equity curve from Series or CSV (unified interface)."""
    if isinstance(equity, str):
        # Delegate to the CSV function, nudging callers toward the explicit name
        warnings.warn(
            "Passing a CSV path to plot_equity_curve() is deprecated; "
            "use plot_equity_curve_from_csv() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        plot_equity_curve_from_csv(equity, output_file, title)
    else:
        # Plot Series directly
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.plot(equity.index, equity.values, linewidth=2, color="#2E86AB")
        # ... styling ...
        plt.savefig(output_file, dpi=150, bbox_inches="tight")
        plt.close(fig)
```

Acceptance criteria:

```python
# Test both paths
from src.plotting import plot_equity_curve
import pandas as pd

# Path 1: Series
equity = pd.Series([0, 0.01, 0.02], index=pd.date_range("2024-01-01", periods=3))
plot_equity_curve(equity, "test_series.png", "Test")

# Path 2: CSV
plot_equity_curve("results/test_equity_curve_v2.csv", "test_csv.png", "Test")

# Both must create PNG files
```

Risk: LOW — Backward compatible extension
## PR #10: Add Edge Case Tests

Problem: Signal functions lack edge case coverage:
- Zero variance data
- All-NaN inputs
- Single-bar window
- Misaligned timestamps

Files to edit:
- `tests/test_signal.py` (add test class)
```python
class TestSignalEdgeCases:
    """Edge case tests for signal generation."""

    def test_v1_zero_variance(self):
        """Test V1 with constant prices (zero variance)."""
        dates = pd.date_range("2024-01-01", periods=100, freq="D")
        es = pd.Series([100.0] * 100, index=dates)
        spy = pd.Series([50.0] * 100, index=dates)
        z = compute_zscore(es, spy, window=20)
        # All z-scores should be 0 (not NaN or inf)
        assert (z[19:] == 0.0).all()

    def test_v2_zero_variance(self):
        """Test V2 with constant prices."""
        # Similar test for V2
        ...

    def test_v1_single_datapoint_after_warmup(self):
        """Test V1 with minimal data (window + 1 points)."""
        dates = pd.date_range("2024-01-01", periods=21, freq="D")
        es = pd.Series(range(100, 121), index=dates)
        spy = pd.Series(range(50, 71), index=dates)
        z = compute_zscore(es, spy, window=20)
        # Only the last point should be valid
        assert pd.isna(z[:19]).all()
        assert not pd.isna(z[19])

    def test_misaligned_timestamps(self):
        """Test automatic alignment when ES/SPY have different timestamps."""
        es_dates = pd.date_range("2024-01-01", periods=100, freq="D")
        spy_dates = pd.date_range("2024-01-02", periods=100, freq="D")  # 1 day offset
        es = pd.Series(range(100, 200), index=es_dates)
        spy = pd.Series(range(50, 150), index=spy_dates)
        z = compute_zscore(es, spy, window=20)
        # Should have 99 aligned bars (inner join)
        assert len(z) == 99
```

Acceptance criteria:

```bash
pytest tests/test_signal.py::TestSignalEdgeCases -v
# Must pass all 8 edge case tests
```

Risk: LOW — Pure test addition, validates existing behavior
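For the zero-variance case, the convention the tests assume (z = 0 when the window is flat) can be sketched as a standalone helper; `rolling_zscore` is illustrative and not the project's compute_zscore, which takes two price series:

```python
import numpy as np
import pandas as pd

def rolling_zscore(spread: pd.Series, window: int) -> pd.Series:
    """Rolling z-score that yields 0 (not NaN or inf) on zero-variance windows."""
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std(ddof=0)
    # Dividing by NaN instead of 0 avoids inf values
    z = (spread - mean) / std.replace(0.0, np.nan)
    # A flat window means spread == mean, so define z = 0 there
    z[(std == 0.0) & mean.notna()] = 0.0
    return z

dates = pd.date_range("2024-01-01", periods=40, freq="D")
flat = pd.Series(1.0, index=dates)  # constant spread → zero variance
z = rolling_zscore(flat, window=20)
```

The warm-up region stays NaN while every full flat window maps to exactly 0.0, matching the `(z[19:] == 0.0).all()` assertion above.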
## Execution Timeline

Week 1: Critical Bugs & Dead Code
1. PR #1: Fix Plotting Crash (1 hour)
2. PR #2: Add Yahoo Integration Test (2 hours)
3. PR #3: Remove Dead Cache Code (1 hour)
4. PR #4: Add Cache Corruption Handling (1 hour)

Week 2: Performance & Quality
5. PR #5: Improve Crumb Mechanism (2 hours)
6. PR #6: Add Session Reuse (2 hours)
7. PR #7: Vectorize V2 Signal (2 hours)
8. PR #8: Add Smoke Test (1 hour)

Week 3: Polish
9. PR #9: Unify Plotting API (1 hour)
10. PR #10: Add Edge Case Tests (2 hours)
## Agent Prompt Template

For each PR, provide the agent with:

**Task:** [PR Title]
**Context:** See AI_CONTEXT.md for full project structure
**Files to Edit:**
- [list of files]
**Changes:**
[detailed changes from PR description]
**Acceptance Criteria:**
[bash commands from PR]
**Verification:**
After changes, run:
1. pytest tests/ -v
2. python scripts/demo_latest.py --lookback 30
3. Verify acceptance criteria pass

## Success Criteria

- ✅ All demos run without crashes
- ✅ 100% offline testability
- ✅ Zero dead code in src/
- ✅ Sub-second V2 signal computation
- ✅ Robust error handling with logging
- ✅ Clean, professional codebase