Resolve review comments: Fix unit tests and cleanup artifacts (new) #33

Open

anurag-r20 wants to merge 34 commits into bernalde:ibm_results from anurag-r20:ibm_results

Conversation

@anurag-r20 (Collaborator) commented Nov 22, 2025:

I have pushed updates to address all the review comments. I am creating a new PR because the previous one was merged without the updates.

  1. Unit Tests: I reverted test_stochastic_benchmark.py, test_stats_pandas.py, and test_training_pandas.py to their original logic (restoring the names library and mock setups) to ensure the tests run correctly against the codebase.

  2. Interpolate Test: I reverted the fixture to use the original manual parameters and added a reset_index fix to handle the MultiIndex output correctly while keeping the strict assertions.

  3. Formatting: Applied the requested whitespace/docstring formatting.

  4. Artifacts: I updated .gitignore to handle the specific example folders and removed the accidental JSON file.
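
For context on item 2, here is a minimal sketch of the `reset_index` fix, assuming the interpolation step returns a MultiIndex frame; the column and level names are illustrative, not the library's:

```python
import pandas as pd

# Hypothetical frame mimicking the MultiIndex shape that interpolate returns;
# the names here are made up for illustration.
df = pd.DataFrame(
    {"PerfRatio": [0.90, 0.95]},
    index=pd.MultiIndex.from_tuples(
        [("000", 10), ("000", 20)], names=["instance", "resource"]
    ),
)

# reset_index flattens the MultiIndex into ordinary columns so strict
# assertions in the test can address them as plain columns.
flat = df.reset_index()
assert list(flat.columns) == ["instance", "resource", "PerfRatio"]
```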

@review-notebook-app commented:

Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter notebooks.

@anurag-r20 anurag-r20 requested a review from bernalde November 23, 2025 00:48
@anurag-r20 (Collaborator, Author) commented:

This new commit for .gitignore addresses your comment from the previous PR on whether it ignores the data for other examples as well.

- Expand IBM_QAOA patterns to all examples subdirectories
- Add general artifact patterns (plots, checkpoints, progress)
- Add data file patterns (pkl, npz, npy)
- Ignore accidentally created repo root directory

Add comprehensive test fixtures for IBM QAOA data processing:
- 4 real experimental JSON files with varied instance/depth configurations
- 4 synthetic edge case fixtures (multi-trial, missing fields, empty)
- README documenting schema, boundaries, and usage patterns

Fixtures support testing of:
- Single vs multi-trial bootstrap behavior
- IBM-specific filename parsing
- Missing field handling
- Empty/malformed data edge cases
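
To make the fixture schema concrete, here is an illustrative single-trial fixture in the shape these tests consume (trial-ID keys mapping to trial dicts); the field names come from the test suite later in this thread, the values are made up:

```python
import json

# Illustrative fixture matching the schema exercised by the tests below.
# Keys are trial IDs; non-numeric keys (e.g. "metadata") are ignored by the loader.
fixture = {
    "0": {
        "energy": -12.5,
        "approximation ratio": 0.85,  # note: this key contains a space
        "train_duration": 2.3,
        "trainer": {"trainer_name": "FixedAngleConjecture", "evaluator": None},
        "success": True,
        "optimal_params": [0.1, 0.2],
        "history": [-10.0, -11.5, -12.5],
    }
}
print(json.dumps(fixture, indent=2))
```
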
Add 32 tests organized in 6 test classes covering:

Unit Tests:
- QAOAResult dataclass creation
- parse_qaoa_trial with various data formats
- load_qaoa_results with missing/malformed data
- convert_to_dataframe transformations
- group_name_fcn filename parsing
- prepare_stochastic_benchmark_data pickle I/O

Integration Tests:
- process_qaoa_data end-to-end pipeline
- GTMinEnergy injection for missing ground truth
- Single-trial bootstrap fabrication
- Interpolation fallback behavior
- Train/test split generation

Edge Cases:
- Missing trainer information
- Missing optimal parameters
- Empty trials list
- Multi-trial synthetic data

All tests use fixtures and proper mocking for multiprocessing.
Test coverage validates IBM-specific logic boundaries.
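
One plausible reading of the GTMinEnergy injection, as a hedged sketch rather than the module's actual code: when no external ground truth is supplied, use the per-instance minimum observed Energy as the stand-in. This is consistent with the single-trial assertion in the test suite below, where GTMinEnergy equals Energy when there is only one row per instance:

```python
import pandas as pd

def inject_gt_min_energy(df: pd.DataFrame) -> pd.DataFrame:
    """Fill GTMinEnergy with the best (minimum) observed Energy per instance
    when no external ground truth is available. Sketch only."""
    df = df.copy()
    df["GTMinEnergy"] = df.groupby("instance")["Energy"].transform("min")
    return df

# With a single trial per instance, GTMinEnergy == Energy, matching the test.
demo = pd.DataFrame({"instance": ["001"], "Energy": [-12.5]})
assert inject_gt_min_energy(demo)["GTMinEnergy"].iloc[0] == -12.5
```
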
Implement 4 high-impact optimizations for ~1K file scale:

1. ProcessingConfig dataclass for centralized configuration:
   - persist_raw: gate pickle writes during ingestion
   - interpolate_diversity_threshold: diversity-based interpolation
   - fabricate_single_trial: control single-trial bootstrap
   - seed: reproducible train/test splits
   - log_progress_interval: configurable progress logging

2. Structured logging infrastructure:
   - Replace print statements with logging module
   - Add progress logging every N files
   - Proper INFO/WARNING levels for errors
   - Timestamps and levels for production observability

3. In-memory aggregation with conditional pickle persistence:
   - persist_raw=True: write pickles to exp_raw/ subdirectory
   - persist_raw=False: aggregate in memory, skip ingestion pickles
   - Generate temporary pickles only when needed for bootstrap
   - Expected 1-2s savings for 1K files when disabled

4. Diversity-based interpolation heuristic:
   - Replace row count (n_rows <= 5) with diversity metric
   - diversity = unique_instances × unique_depths
   - Skip interpolation when diversity < threshold
   - Prevents spurious skips on sparse but valid grids

Additional improvements:
- Add try/except for malformed JSON files with warnings
- Use config.seed for reproducible train/test splits
- Fix pickle paths to use exp_raw subdirectory convention
- Add enumeration to ingestion loop for progress tracking

All changes maintain backward compatibility with default config.
Expected runtime: ~15s for 1K files (down from ~20-25s).
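
A sketch of what the ProcessingConfig surface and diversity heuristic listed above could look like, reconstructed from the defaults asserted in the test suite later in this thread (persist_raw=True, threshold 6, seed 42, interval 50); treat it as an approximation, not the module's definitive source:

```python
import logging
from dataclasses import dataclass

import pandas as pd

logger = logging.getLogger("ibm_qaoa_processing")

@dataclass
class ProcessingConfig:
    persist_raw: bool = True                  # gate pickle writes during ingestion
    interpolate_diversity_threshold: int = 6  # skip interpolation below this diversity
    fabricate_single_trial: bool = True       # control single-trial bootstrap
    seed: int = 42                            # reproducible train/test splits
    log_progress_interval: int = 50           # log every N ingested files

def should_interpolate(df: pd.DataFrame, config: ProcessingConfig) -> bool:
    """Diversity heuristic: unique_instances x unique_depths vs. threshold."""
    diversity = df["instance"].nunique() * df["p"].nunique()
    if diversity < config.interpolate_diversity_threshold:
        logger.info("Diversity %d below threshold %d; skipping interpolation",
                    diversity, config.interpolate_diversity_threshold)
        return False
    return True
```
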
Add comprehensive performance optimization guide:

Phase 1 - Implemented (4 changes):
- ProcessingConfig dataclass for centralized configuration
- Structured logging infrastructure
- In-memory aggregation with persist_raw flag
- Diversity-based interpolation heuristic

Phase 2 - Deferred Enhancements (6 optimizations):
1. Parallel I/O with ThreadPoolExecutor (3-5x potential speedup)
2. Parquet output format (faster writes, smaller files)
3. orjson for JSON parsing (~2x speedup)
4. Lazy bootstrap fabrication (skip unnecessary computation)
5. Categorical dtypes for memory efficiency
6. Rich diversity metrics (entropy-based quality assessment)

Each enhancement documented with:
- Problem/solution description
- Expected impact and thresholds
- Implementation complexity
- Testing requirements

Target metrics:
- Phase 1: <15s for 1K files (from ~20-25s baseline)
- Phase 2: <10s with parallelization
- Scale guidance: When to apply each optimization
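
For the deferred parallel-I/O item, a generic sketch of loading ~1K JSON files with a thread pool (threads suit this workload because parsing small files is I/O-bound); the pattern is standard-library only and not taken from the repo:

```python
import glob
import json
from concurrent.futures import ThreadPoolExecutor

def load_json_files(pattern: str, max_workers: int = 8) -> dict:
    """Load all JSON files matching `pattern` concurrently.
    Returns {path: parsed_data}; malformed or unreadable files are dropped."""
    paths = sorted(glob.glob(pattern))

    def _load(path):
        try:
            with open(path) as f:
                return path, json.load(f)
        except (json.JSONDecodeError, OSError):
            return path, None  # filtered out below

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return {path: data for path, data in pool.map(_load, paths)
                if data is not None}
```
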
Clear execution outputs and intermediate results to reduce repo size.
Notebook structure and analysis code preserved.

Add ibm_qaoa_analysis_hardware.ipynb for analyzing real quantum hardware
results from IBM systems. Complements simulation analysis with hardware-
specific metrics and comparisons.

- `###` - Instance ID (e.g., 000, 001, 002)
- `N##R3R` - Problem size indicator
- `_#.json` - Depth parameter (p)

Owner: Can you verify if this is true @anurag-r20?

Collaborator (Author): Except N is the problem size and R3R is the type of graph; p is not the depth.

Owner: In that case, let's modify this file.

Owner: Let's split the N## line from the R#R line and explain all the alternatives (heavy-hex, Erdős-Rényi, ...).

Move all pandas-specific test files from tests/Pandas_Group_Tests/ to tests/:
- test_interpolate_pandas.py
- test_stats_pandas.py
- test_stochastic_benchmark_pandas.py
- test_training_pandas.py

Remove empty Pandas_Group_Tests subdirectory for better test organization.
All 13 tests still passing after move.

- Update processing script to detect three optimization states: 'opt', 'noOpt', and None
- Add marker differentiation in plotting: circles for opt, x for noOpt, squares for no flag
- Use depth-specific colors for all marker types with appropriate legend labels
- Extract optimization flag from filename patterns (_opt_, _noOpt_, or neither)
- Fallback to Energy metric when Approximation Ratio not available in JSON
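
As a reference for how these fields could be pulled from filenames like 20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json (the fixture names used in the tests below), here is a hedged regex sketch; the actual parsing code in the repo may differ:

```python
import re

# Pattern pieces, per the naming discussed in this thread:
#   000      -> instance ID
#   N10      -> problem size
#   R3R      -> graph type
#   _2.json  -> trailing integer parameter (per the review above,
#               not necessarily the circuit depth)
FILENAME_RE = re.compile(
    r"(?P<instance>\d{3})N(?P<size>\d+)(?P<graph>R\d+R).*?_(?P<p>\d+)\.json$"
)

def optimization_flag(filename: str):
    """Return 'opt', 'noOpt', or None based on filename markers."""
    if "_opt_" in filename:
        return "opt"
    if "_noOpt_" in filename:
        return "noOpt"
    return None

name = "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json"
m = FILENAME_RE.search(name)
assert m and m.group("instance") == "000" and m.group("p") == "2"
assert optimization_flag(name) == "noOpt"
```
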
Owner: This needs to be rerun without the Opt vs noOpt distinction, as they are two separate solvers. In fact, have the solver_name merge them.

Owner: This needs to be run from scratch.


- Load minmax cuts from JSON files in R3R/minmax_cuts directory
- Add maxcut_approximation_ratio() function using formula:
  cut_val = energy + 0.5 * sum_weights
  approx_ratio = (cut_val - min_cut) / (max_cut - min_cut)
- Update convert_to_dataframe() to use calculated approximation ratios
- Update process_qaoa_data() to load and pass minmax data
- Add proper error handling and validation for edge cases
- Test minmax cuts loading from directory
- Test approximation ratio calculation
- Test end-to-end processing with minmax integration
- Verify non-NaN approximation ratios in output
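
A direct transcription of that formula into code; the function name and argument order follow the integration script quoted later in this thread, though the body here is a sketch:

```python
def maxcut_approximation_ratio(energy: float, min_cut: float,
                               max_cut: float, sum_weights: float) -> float:
    """Map a QAOA energy onto the min/max-cut scale:

    cut_val      = energy + 0.5 * sum_weights
    approx_ratio = (cut_val - min_cut) / (max_cut - min_cut)
    """
    cut_val = energy + 0.5 * sum_weights
    if max_cut == min_cut:
        raise ValueError("max_cut and min_cut must differ")  # edge-case guard
    return (cut_val - min_cut) / (max_cut - min_cut)

# Made-up values: energy -6.5, weights summing to 20, cuts 0 and 14.
# cut_val = -6.5 + 10 = 3.5, so the ratio is 3.5 / 14 = 0.25.
assert abs(maxcut_approximation_ratio(-6.5, 0.0, 14.0, 20.0) - 0.25) < 1e-12
```
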
…d comparison

- Update data loading to use 'optimized' column from process_qaoa_data
- Create separate method names (FA_opt, FA_noOpt, TQA_opt, TQA_noOpt)
- Update methods_to_compare list to include all 7 method variants
- Change legends to single-column layout for better readability
- Invalidate cache to force reprocessing with new method names
- All variants treated independently in statistical analysis and rankings
Owner: This needs to be run from scratch.

Copilot AI left a comment:

Pull request overview

This PR addresses review comments from a previous merge by restoring correct unit test logic, fixing test fixtures, applying formatting cleanup, and removing accidental artifacts. The changes include comprehensive test coverage for new IBM QAOA processing functionality, major refactoring of the ibm_qaoa_processing.py module, and documentation improvements.

Changes:

  • Restored unit tests with proper names library usage and mocking for test_training_pandas.py, test_stochastic_benchmark_pandas.py, and test_stats_pandas.py
  • Added comprehensive test suite for IBM QAOA processing with 722 lines of tests and fixture files
  • Refactored ibm_qaoa_processing.py with major architectural improvements (579 lines, up from ~379)
  • Applied whitespace/formatting fixes to source files
  • Updated .gitignore to properly handle example artifacts
  • Added performance roadmap documentation and analysis notebook

Reviewed changes

Copilot reviewed 23 out of 27 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tests/test_training_pandas.py | Restored names library imports and fixed distance function signature |
| tests/test_stochastic_benchmark_pandas.py | New file with proper names library column naming |
| tests/test_stats_pandas.py | New file with correct column naming using names library |
| tests/test_minmax_integration.py | New integration test for minmax cuts functionality |
| tests/test_interpolate_pandas.py | Reverted to manual parameters and added reset_index fix |
| tests/test_ibm_qaoa_processing.py | Comprehensive 722-line test suite for QAOA processing |
| tests/fixtures/ibm_qaoa/* | Multiple JSON fixture files and README documentation |
| src/training.py | Fixed parameter name typo in docstring |
| src/plotting.py | Whitespace cleanup |
| src/interpolate.py | Whitespace cleanup |
| examples/IBM_QAOA/ibm_qaoa_processing.py | Major refactoring with config dataclass, logging, and improved architecture |
| examples/IBM_QAOA/ibm_qaoa_analysis_hardware.ipynb | New 1772-line analysis notebook |
| examples/IBM_QAOA/PERFORMANCE_ROADMAP.md | Performance optimization documentation |
| .gitignore | Updated to specifically target IBM_QAOA/exp_raw/ |

@anurag-r20 (Collaborator, Author) commented:

@bernalde I have archived the old code and opened a PR for version 1 of the new code. This does not yet involve plotting performance curves; I would like to open a new PR for that and close this one.

@bernalde (Owner) left a comment:

I still need to review the notebooks. Moreover, I will keep reviewing this PR until I agree with its content, but your previous comment suggests closing it? Please clarify.
My suggestion: work on this until we merge it, and then we create the other PRs.

@@ -0,0 +1,479 @@
''' This file is for processing QAOA data files generated by IBM using QAOA.
Owner: Add a link to the repository at least.

Owner: Add some file description. Why are there functions in util and not in the other files?

Copilot AI left a comment:

Pull request overview

Copilot reviewed 31 out of 38 changed files in this pull request and generated 3 comments.

Comment on lines 23 to +26
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [
{
"ename": "ModuleNotFoundError",
"evalue": "No module named 'maxcut_benchmark'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[2], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mos\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mconversion\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mws\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mitertools\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m product \u001b[38;5;28;01mas\u001b[39;00m iterprod\n",
"File \u001b[0;32m/mnt/c/Users/rames102/Desktop/stochastic-benchmark/examples/QEDC_to_WS_conversion/conversion.py:9\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mos\u001b[39;00m\n\u001b[1;32m 8\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mglob\u001b[39;00m\n\u001b[0;32m----> 9\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmaxcut_benchmark\u001b[39;00m\n\u001b[1;32m 10\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mscipy\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mspecial\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m erfinv\n\u001b[1;32m 12\u001b[0m times_list \u001b[38;5;241m=\u001b[39m [\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124melapsed_time\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mexec_time\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mopt_exec_time\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcreate_time\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n",
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'maxcut_benchmark'"
]
}
],
"outputs": [],
Copilot AI (Feb 16, 2026):

The notebook cell execution output has been cleared (changed from execution_count: 2 with error output to execution_count: null with empty outputs). While this is generally good practice for version control, the error trace that was removed indicated a ModuleNotFoundError: No module named 'maxcut_benchmark'. This suggests the notebook may have dependency issues that should be documented or fixed rather than hidden by clearing outputs.

Comment on lines +1 to +64
#!/usr/bin/env python
"""
Quick test script to verify minmax cuts integration works correctly.
"""
import sys
sys.path.append('../../src')
import ibm_qaoa_processing

# Test loading minmax cuts
print("=" * 60)
print("Testing minmax cuts loading...")
print("=" * 60)

minmax_data = ibm_qaoa_processing.load_minmax_cuts('R3R/minmax_cuts')
print(f"\nLoaded minmax data for {len(minmax_data)} instances")

# Show sample data
if minmax_data:
    sample_id = list(minmax_data.keys())[0]
    print(f"\nSample instance {sample_id}:")
    print(f" min_cut: {minmax_data[sample_id]['min_cut']}")
    print(f" max_cut: {minmax_data[sample_id]['max_cut']}")
    print(f" sum_weights: {minmax_data[sample_id]['sum_weights']}")

# Test approximation ratio calculation
print(f"\n{'=' * 60}")
print("Testing approximation ratio calculation...")
print("=" * 60)

# Test with sample energy value
energy = -6.5 # Sample energy
ratio = ibm_qaoa_processing.maxcut_approximation_ratio(
    energy,
    minmax_data[sample_id]['min_cut'],
    minmax_data[sample_id]['max_cut'],
    minmax_data[sample_id]['sum_weights']
)
print(f"\nFor instance {sample_id} with energy {energy}:")
print(f" cut_val = {energy} + 0.5 * {minmax_data[sample_id]['sum_weights']} = {energy + 0.5 * minmax_data[sample_id]['sum_weights']}")
print(f" Approximation ratio = {ratio:.4f}")

print(f"\n{'=' * 60}")
print("Testing FA data processing with minmax cuts...")
print("=" * 60)

# Process a small subset of FA data to verify integration
sb, agg_df = ibm_qaoa_processing.process_qaoa_data(
    json_pattern="R3R/FA/*_000N10R3R_*.json",  # Just instance 000
    output_dir="exp_raw"
)

print(f"\nProcessed {len(agg_df)} rows")
print(f"\nApproximation Ratio statistics:")
print(f" Non-NaN values: {agg_df['Approximation_Ratio'].notna().sum()}/{len(agg_df)}")
print(f" Mean: {agg_df['Approximation_Ratio'].mean():.4f}")
print(f" Min: {agg_df['Approximation_Ratio'].min():.4f}")
print(f" Max: {agg_df['Approximation_Ratio'].max():.4f}")

print("\nSample rows:")
print(agg_df[['instance', 'p', 'Energy', 'Approximation_Ratio', 'MeanTime', 'optimized']].head(10))

print(f"\n{'=' * 60}")
print("✓ Integration test completed successfully!")
print("=" * 60)
Copilot AI (Feb 16, 2026):

This test file appears to be a standalone script rather than a proper pytest test. It includes hardcoded paths (e.g., sys.path.append('../../src') and 'R3R/minmax_cuts') and direct execution logic without using pytest fixtures or test classes. This should be converted to proper pytest tests or moved to an examples/scripts directory.

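Along the lines of this suggestion, a minimal pytest rewrite of the script's first two checks might look like the following; the paths and skip guards are illustrative, and only `load_minmax_cuts` and `maxcut_approximation_ratio` are taken from the script itself:

```python
import math
import os
import sys

import pytest

sys.path.append("examples/IBM_QAOA")  # illustrative; conftest/packaging would be cleaner
import ibm_qaoa_processing as ibm

MINMAX_DIR = "R3R/minmax_cuts"  # a real test would point at a fixtures directory

@pytest.fixture
def minmax_data():
    if not os.path.isdir(MINMAX_DIR):
        pytest.skip(f"{MINMAX_DIR} not available in this checkout")
    return ibm.load_minmax_cuts(MINMAX_DIR)

def test_minmax_cuts_load(minmax_data):
    assert len(minmax_data) > 0
    sample = next(iter(minmax_data.values()))
    assert {"min_cut", "max_cut", "sum_weights"} <= set(sample)

def test_approximation_ratio_is_finite(minmax_data):
    sample = next(iter(minmax_data.values()))
    ratio = ibm.maxcut_approximation_ratio(
        -6.5, sample["min_cut"], sample["max_cut"], sample["sum_weights"]
    )
    assert math.isfinite(ratio)
```
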
Comment on lines +1 to +722
"""
Tests for IBM QAOA processing functions.

This test suite validates the IBM-specific QAOA data ingestion pipeline
in examples/IBM_QAOA/ibm_qaoa_processing.py.
"""

import json
import os
import sys
import tempfile
import shutil
from pathlib import Path
from typing import Dict, Any
import numpy as np
import pandas as pd
import pytest

# Add the IBM_QAOA directory to path for imports
IBM_QAOA_DIR = Path(__file__).parent.parent / "examples" / "IBM_QAOA"
sys.path.insert(0, str(IBM_QAOA_DIR))

# Add stochastic-benchmark src to path
SRC_DIR = Path(__file__).parent.parent / "src"
sys.path.insert(0, str(SRC_DIR))

import ibm_qaoa_processing as ibm


# Fixtures directory
FIXTURES_DIR = Path(__file__).parent / "fixtures" / "ibm_qaoa"


class TestQAOAResult:
    """Test QAOAResult dataclass."""

    def test_qaoa_result_creation(self):
        """Test basic QAOAResult object creation."""
        result = ibm.QAOAResult(
            trial_id=0,
            instance_id="001",
            depth=2,
            energy=-12.5,
            approximation_ratio=0.85,
            train_duration=2.3,
            trainer_name="FixedAngleConjecture",
            evaluator=None,
            success=True,
            optimized_params=[0.1, 0.2],
            energy_history=[-10.0, -12.5]
        )

        assert result.trial_id == 0
        assert result.instance_id == "001"
        assert result.depth == 2
        assert result.energy == -12.5
        assert result.approximation_ratio == 0.85
        assert result.trainer_name == "FixedAngleConjecture"
        assert result.evaluator is None
        assert result.success is True
        assert len(result.optimized_params) == 2
        assert len(result.energy_history) == 2


class TestParseQAOATrial:
    """Test parse_qaoa_trial function."""

    def test_parse_complete_trial(self):
        """Test parsing a trial with all fields present."""
        trial_data = {
            'energy': -12.5,
            'approximation ratio': 0.85,
            'train_duration': 2.3,
            'trainer': {
                'trainer_name': 'FixedAngleConjecture',
                'evaluator': None
            },
            'success': True,
            'optimal_params': [0.1, 0.2],
            'history': [-10.0, -11.5, -12.5]
        }

        result = ibm.parse_qaoa_trial(trial_data, trial_id=0, instance_id="001", depth=2)

        assert result.trial_id == 0
        assert result.instance_id == "001"
        assert result.depth == 2
        assert result.energy == -12.5
        assert result.approximation_ratio == 0.85
        assert result.train_duration == 2.3
        assert result.trainer_name == 'FixedAngleConjecture'
        assert result.evaluator is None
        assert result.success is True
        assert result.optimized_params == [0.1, 0.2]
        assert result.energy_history == [-10.0, -11.5, -12.5]

    def test_parse_missing_fields(self):
        """Test parsing with missing optional fields."""
        trial_data = {
            'energy': -5.0,
            'approximation ratio': 0.5
        }

        result = ibm.parse_qaoa_trial(trial_data, trial_id=1, instance_id="002", depth=1)

        assert result.energy == -5.0
        assert result.approximation_ratio == 0.5
        assert result.train_duration == 0.0  # Default
        assert result.trainer_name == 'Unknown'  # Default when missing
        assert result.evaluator is None
        assert result.success is False  # Default
        assert result.optimized_params == []
        assert result.energy_history == []

    def test_parse_missing_trainer_dict(self):
        """Test parsing when trainer field is missing."""
        trial_data = {
            'energy': -8.0,
            'approximation ratio': 0.7,
            'train_duration': 1.5,
            'success': True
        }

        result = ibm.parse_qaoa_trial(trial_data, trial_id=2, instance_id="003", depth=1)

        assert result.trainer_name == 'Unknown'
        assert result.evaluator is None

    def test_parse_trainer_as_string(self):
        """Test parsing when trainer is a string instead of dict."""
        trial_data = {
            'energy': -9.0,
            'approximation ratio': 0.75,
            'trainer': 'COBYLA'
        }

        result = ibm.parse_qaoa_trial(trial_data, trial_id=3, instance_id="004", depth=2)

        assert result.trainer_name == 'COBYLA'
        assert result.evaluator is None

    def test_parse_nan_values(self):
        """Test that NaN values are properly handled."""
        trial_data = {}

        result = ibm.parse_qaoa_trial(trial_data, trial_id=4, instance_id="005", depth=1)

        assert np.isnan(result.energy)
        assert np.isnan(result.approximation_ratio)


class TestLoadQAOAResults:
    """Test load_qaoa_results function."""

    def test_load_single_trial(self):
        """Test loading JSON with single trial."""
        json_data = {
            "0": {
                'energy': -12.5,
                'approximation ratio': 0.85,
                'train_duration': 2.3
            }
        }

        results = ibm.load_qaoa_results(json_data)

        assert len(results) == 1
        assert results[0].trial_id == 0
        assert results[0].energy == -12.5

    def test_load_multiple_trials(self):
        """Test loading JSON with multiple trials."""
        json_data = {
            "0": {'energy': -12.5, 'approximation ratio': 0.85},
            "1": {'energy': -13.0, 'approximation ratio': 0.87},
            "2": {'energy': -12.8, 'approximation ratio': 0.86}
        }

        results = ibm.load_qaoa_results(json_data)

        assert len(results) == 3
        assert results[0].trial_id == 0
        assert results[1].trial_id == 1
        assert results[2].trial_id == 2

    def test_load_ignores_non_numeric_keys(self):
        """Test that non-numeric keys are ignored."""
        json_data = {
            "0": {'energy': -12.5, 'approximation ratio': 0.85},
            "1": {'energy': -13.0, 'approximation ratio': 0.87},
            "metadata": {'note': 'test data'},
            "config": {'version': '1.0'}
        }

        results = ibm.load_qaoa_results(json_data)

        assert len(results) == 2
        assert all(isinstance(r.trial_id, int) for r in results)

    def test_load_empty_dict(self):
        """Test loading empty dictionary."""
        json_data = {}

        results = ibm.load_qaoa_results(json_data)

        assert len(results) == 0

    def test_load_from_fixture(self):
        """Test loading from actual fixture file."""
        fixture_path = FIXTURES_DIR / "multi_trial_synthetic.json"
        with open(fixture_path, 'r') as f:
            json_data = json.load(f)

        results = ibm.load_qaoa_results(json_data)

        assert len(results) == 3
        assert all(r.trainer_name == 'FixedAngleConjecture' for r in results)


class TestConvertToDataFrame:
    """Test convert_to_dataframe function."""

    def test_convert_single_result(self):
        """Test converting single QAOAResult to DataFrame."""
        result = ibm.QAOAResult(
            trial_id=0,
            instance_id="001",
            depth=2,
            energy=-12.5,
            approximation_ratio=0.85,
            train_duration=2.3,
            trainer_name="FixedAngleConjecture",
            evaluator=None,
            success=True,
            optimized_params=[0.1, 0.2],
            energy_history=[-10.0, -12.5]
        )

        df = ibm.convert_to_dataframe([result], instance_id="001", p=2)

        assert len(df) == 1
        assert df['trial_id'].iloc[0] == 0
        assert df['instance'].iloc[0] == "001"
        assert df['p'].iloc[0] == 2
        assert df['Energy'].iloc[0] == -12.5
        assert df['Approximation_Ratio'].iloc[0] == 0.85
        assert df['MeanTime'].iloc[0] == 2.3
        assert df['trainer'].iloc[0] == "FixedAngleConjecture"
        assert pd.isna(df['evaluator'].iloc[0])
        assert df['success'].iloc[0] == True
        assert df['n_iterations'].iloc[0] == 2
        assert 'param_0' in df.columns
        assert 'param_1' in df.columns
        assert df['param_0'].iloc[0] == 0.1
        assert df['param_1'].iloc[0] == 0.2

    def test_convert_multiple_results(self):
        """Test converting multiple results."""
        results = [
            ibm.QAOAResult(0, "001", 1, -12.5, 0.85, 2.3, "FA", None, True, [0.1], [-12.5]),
            ibm.QAOAResult(1, "001", 1, -13.0, 0.87, 2.5, "FA", None, True, [0.12], [-13.0]),
            ibm.QAOAResult(2, "001", 1, -12.8, 0.86, 2.4, "FA", None, True, [0.11], [-12.8])
        ]

        df = ibm.convert_to_dataframe(results, instance_id="001", p=1)

        assert len(df) == 3
        assert df['instance'].iloc[0] == "001"
        assert df['p'].iloc[0] == 1
        assert list(df['Energy']) == [-12.5, -13.0, -12.8]

    def test_convert_no_params(self):
        """Test converting result without optimized parameters."""
        result = ibm.QAOAResult(
            trial_id=0,
            instance_id="002",
            depth=1,
            energy=-8.0,
            approximation_ratio=0.7,
            train_duration=1.5,
            trainer_name="COBYLA",
            evaluator=None,
            success=True,
            optimized_params=[],
            energy_history=[-8.0]
        )

        df = ibm.convert_to_dataframe([result], instance_id="002", p=1)

        assert len(df) == 1
        assert 'param_0' not in df.columns
        assert df['n_iterations'].iloc[0] == 1

    def test_convert_empty_history(self):
        """Test converting result with empty history."""
        result = ibm.QAOAResult(
            0, "003", 1, -5.0, 0.5, 1.0, "Unknown", None, False, [], []
        )

        df = ibm.convert_to_dataframe([result], instance_id="003", p=1)

        assert df['n_iterations'].iloc[0] == 0


class TestGroupNameFunction:
    """Test group_name_fcn function."""

    def test_extract_standard_filename(self):
        """Test extracting group name from standard pickle filename."""
        filename = "raw_results_inst=001_depth=2.pkl"

        group = ibm.group_name_fcn(filename)

        assert group == "inst=001_depth=2"

    def test_extract_with_path(self):
        """Test extracting from full path."""
        filepath = "/path/to/raw_results_inst=123_depth=4.pkl"

        group = ibm.group_name_fcn(filepath)

        assert group == "inst=123_depth=4"

    def test_malformed_filename_returns_full(self):
        """Test that malformed filename returns full basename."""
        filename = "malformed_file.pkl"

        group = ibm.group_name_fcn(filename)

        assert group == "malformed_file.pkl"


class TestPrepareStochasticBenchmarkData:
    """Test prepare_stochastic_benchmark_data function."""

    def test_save_and_load_pickle(self):
        """Test saving DataFrame to pickle format."""
        df = pd.DataFrame({
            'trial_id': [0, 1],
            'instance': ['001', '001'],
            'p': [2, 2],
            'Energy': [-12.5, -13.0],
            'Approximation_Ratio': [0.85, 0.87],
            'MeanTime': [2.3, 2.5]
        })

        with tempfile.TemporaryDirectory() as tmpdir:
            filepath = ibm.prepare_stochastic_benchmark_data(df, "001", 2, tmpdir)

            assert os.path.exists(filepath)
            assert filepath.endswith("raw_results_inst=001_depth=2.pkl")
            # Verify file is in exp_raw subdirectory (StochasticBenchmark convention)
            assert "exp_raw" in filepath

            # Verify we can load it back
            loaded_df = pd.read_pickle(filepath)
            assert len(loaded_df) == 2
            assert list(loaded_df.columns) == list(df.columns)
            pd.testing.assert_frame_equal(loaded_df, df)

    def test_creates_output_directory(self):
        """Test that output directory is created if it doesn't exist."""
        with tempfile.TemporaryDirectory() as tmpdir:
            output_dir = os.path.join(tmpdir, "nested", "output")
            df = pd.DataFrame({'col': [1, 2]})

            filepath = ibm.prepare_stochastic_benchmark_data(df, "001", 1, output_dir)

            # Should create both output_dir and output_dir/exp_raw
            assert os.path.exists(output_dir)
            assert os.path.exists(os.path.join(output_dir, "exp_raw"))
            assert os.path.exists(filepath)


class TestProcessQAOADataIntegration:
    """Integration tests for process_qaoa_data function."""

    def test_process_single_file(self):
        """Test processing a single JSON file."""
        with tempfile.TemporaryDirectory() as tmpdir:
            # Copy fixture to temp directory with expected naming
            fixture = FIXTURES_DIR / "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json"
            test_file = os.path.join(tmpdir, "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert sb is not None
            assert len(agg_df) == 1  # Single trial
            assert 'instance' in agg_df.columns
            assert 'p' in agg_df.columns
            assert 'Energy' in agg_df.columns
            assert 'Approximation_Ratio' in agg_df.columns
            assert agg_df['instance'].iloc[0] == 0  # Parsed from "000N10R3R"
            assert agg_df['p'].iloc[0] == 2  # Parsed from "_2.json"

    def test_process_multiple_files(self):
        """Test processing multiple JSON files."""
        with tempfile.TemporaryDirectory() as tmpdir:
            # Copy multiple fixtures
            fixtures = [
                "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json",
                "20250901_165018_000N10R3R_MC_FA_SV_noOpt_4.json",
                "20250913_170712_001N10R3R_MC_FA_SV_noOpt_1.json"
            ]

            for fixture_name in fixtures:
                fixture = FIXTURES_DIR / fixture_name
                shutil.copy(fixture, tmpdir)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert sb is not None
            assert len(agg_df) == 3  # One trial per file
            assert agg_df['instance'].nunique() == 2  # instances 0 and 1
            assert set(agg_df['p'].unique()) == {1, 2, 4}  # depths 1, 2, 4

            # Check sorting
            assert list(agg_df['instance']) == sorted(agg_df['instance'])

    def test_process_adds_gtminenergy(self):
        """Test that GTMinEnergy is added to DataFrames."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "20250913_170712_001N10R3R_MC_FA_SV_noOpt_1.json"
            test_file = os.path.join(tmpdir, "test.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert 'GTMinEnergy' in agg_df.columns
            # GTMinEnergy should equal the minimum Energy in single-trial case
            assert agg_df['GTMinEnergy'].iloc[0] == agg_df['Energy'].iloc[0]

    def test_process_empty_pattern(self):
        """Test processing with no matching files."""
        with tempfile.TemporaryDirectory() as tmpdir:
            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert sb is None
            assert len(agg_df) == 0

    def test_process_creates_pickles(self):
        """Test that pickle files are created for each input JSON."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json"
            test_file = os.path.join(tmpdir, "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            # Use config with persist_raw=True to ensure pickles are written
            config = ibm.ProcessingConfig(persist_raw=True)
            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir, config=config)

            # Check that pickle file was created in output_dir/exp_raw (StochasticBenchmark convention)
            raw_data_dir = os.path.join(output_dir, "exp_raw")
            assert os.path.exists(raw_data_dir)
            pickle_files = [f for f in os.listdir(raw_data_dir) if f.endswith('.pkl')]
            assert len(pickle_files) == 1
            expected_pickle = os.path.join(raw_data_dir, pickle_files[0])
            assert os.path.exists(expected_pickle)

            # Verify content
            df = pd.read_pickle(expected_pickle)
            assert len(df) == 1
            # Instance ID is extracted as '000' from filename, not converted to int
            assert df['instance'].iloc[0] == '000'
            assert df['p'].iloc[0] == 2

    def test_process_single_trial_bootstrap_fabrication(self):
        """Test that single-trial files get manual bootstrap fabrication."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "20250913_171721_002N10R3R_MC_FA_SV_noOpt_1.json"
            test_file = os.path.join(tmpdir, "20250913_171721_002N10R3R_MC_FA_SV_noOpt_1.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert sb is not None
            assert sb.bs_results is not None

            # Single trial should have 5 bootstrap entries (boots 10, 20, 30, 40, 50)
            assert len(sb.bs_results) == 5
            assert set(sb.bs_results['boots'].unique()) == {10, 20, 30, 40, 50}

            # Check confidence intervals are zero-width for single trial
            if 'Key=PerfRatio' in sb.bs_results.columns:
                first_row = sb.bs_results.iloc[0]
                assert first_row['Key=PerfRatio'] == first_row['ConfInt=lower_Key=PerfRatio']
                assert first_row['Key=PerfRatio'] == first_row['ConfInt=upper_Key=PerfRatio']

    def test_process_interpolation_fallback(self):
        """Test that interpolation falls back for single instance."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "20250913_171721_002N10R3R_MC_FA_SV_noOpt_1.json"
            test_file = os.path.join(tmpdir, "test.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert sb.interp_results is not None
            # For single instance (5 rows from bootstrap), interp_results should equal bs_results
            assert len(sb.interp_results) == 5
            assert 'resource' in sb.interp_results.columns

    def test_process_adds_train_test_split(self):
        """Test that train/test split column is added."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "20250913_170712_001N10R3R_MC_FA_SV_noOpt_1.json"
            test_file = os.path.join(tmpdir, "test.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert 'train' in sb.interp_results.columns
            assert set(sb.interp_results['train'].unique()).issubset({0, 1})


class TestEdgeCases:
    """Test edge cases and error handling."""

    def test_missing_trainer_fixture(self):
        """Test processing file with missing trainer field."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "missing_trainer.json"
            test_file = os.path.join(tmpdir, "20250901_165018_000N10R3R_MC_FA_SV_noOpt_1.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert len(agg_df) == 1
            assert agg_df['trainer'].iloc[0] == 'Unknown'
            assert pd.isna(agg_df['evaluator'].iloc[0])

    def test_missing_optimal_params_fixture(self):
        """Test processing file with missing optimal_params field."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "missing_optimal_params.json"
            test_file = os.path.join(tmpdir, "20250901_165018_000N10R3R_MC_FA_SV_noOpt_1.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            assert len(agg_df) == 1
            # No param columns should exist
            param_cols = [c for c in agg_df.columns if c.startswith('param_')]
            assert len(param_cols) == 0

    def test_empty_trials_fixture(self):
        """Test processing file with no trials."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "empty_trials.json"
            test_file = os.path.join(tmpdir, "20250901_165018_000N10R3R_MC_FA_SV_noOpt_1.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            # Should process but result in empty data
            assert sb is None
            assert len(agg_df) == 0

    def test_multi_trial_synthetic_fixture(self):
        """Test processing synthetic multi-trial file."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "multi_trial_synthetic.json"
            test_file = os.path.join(tmpdir, "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir)

            # Should have 3 trials from the multi-trial fixture
            assert len(agg_df) == 3
            assert agg_df['instance'].iloc[0] == 0
            assert agg_df['p'].iloc[0] == 2

            # Pickle should contain all 3 trials (in exp_raw subdirectory)
            raw_data_dir = os.path.join(output_dir, "exp_raw")
            pickle_files = [f for f in os.listdir(raw_data_dir) if f.endswith('.pkl')]
            assert len(pickle_files) == 1
            df = pd.read_pickle(os.path.join(raw_data_dir, pickle_files[0]))
            assert len(df) == 3

            # With 3 trials, should attempt standard bootstrap (may succeed or fail)
            # Key is that interp_results should exist and have resource column
            assert sb.interp_results is not None
            if len(sb.interp_results) > 0:
                assert 'resource' in sb.interp_results.columns


class TestProcessingConfig:
    """Test ProcessingConfig dataclass and config-driven behavior."""

    def test_config_defaults(self):
        """Test ProcessingConfig default values."""
        config = ibm.ProcessingConfig()
        assert config.persist_raw is True
        assert config.interpolate_diversity_threshold == 6
        assert config.fabricate_single_trial is True
        assert config.seed == 42
        assert config.log_progress_interval == 50

    def test_config_overrides(self):
        """Test ProcessingConfig with custom values."""
        config = ibm.ProcessingConfig(
            persist_raw=False,
            interpolate_diversity_threshold=10,
            fabricate_single_trial=False,
            seed=123,
            log_progress_interval=10
        )
        assert config.persist_raw is False
        assert config.interpolate_diversity_threshold == 10
        assert config.fabricate_single_trial is False
        assert config.seed == 123
        assert config.log_progress_interval == 10

    def test_persist_raw_false_no_pickles(self):
        """Test that persist_raw=False prevents pickle creation during ingestion."""
        with tempfile.TemporaryDirectory() as tmpdir:
            fixture = FIXTURES_DIR / "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json"
            test_file = os.path.join(tmpdir, "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            config = ibm.ProcessingConfig(persist_raw=False)
            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir, config=config)

            # With persist_raw=False, bootstrap will generate temp pickles
            # but they won't be from the original ingestion phase
            # We can verify the aggregated DataFrame exists
            assert len(agg_df) == 1
            # Instance may be either string '000' or int 0 depending on conversion
            assert agg_df['instance'].iloc[0] in ('000', 0)

            # Bootstrap should still run and produce results
            assert sb.interp_results is not None

    def test_diversity_heuristic_skip(self):
        """Test that low diversity skips interpolation."""
        with tempfile.TemporaryDirectory() as tmpdir:
            # Single instance, single depth = diversity 1
            fixture = FIXTURES_DIR / "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json"
            test_file = os.path.join(tmpdir, "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json")
            shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            config = ibm.ProcessingConfig(interpolate_diversity_threshold=6)
            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir, config=config)

            # Diversity (1 instance × 1 depth = 1) < threshold (6) → skip interpolation
            # interp_results should equal bs_results (with resource column added)
            assert sb.interp_results is not None
            if len(sb.interp_results) > 0:
                assert 'resource' in sb.interp_results.columns

    def test_diversity_heuristic_run(self):
        """Test that sufficient diversity enables interpolation."""
        with tempfile.TemporaryDirectory() as tmpdir:
            # Copy 2 different instances with different depths
            fixtures = [
                "20250901_165018_000N10R3R_MC_FA_SV_noOpt_2.json",  # instance 000, depth 2
                "20250901_165018_000N10R3R_MC_FA_SV_noOpt_4.json"   # instance 000, depth 4 (different depth, same instance)
            ]

            for fixture_name in fixtures:
                fixture = FIXTURES_DIR / fixture_name
                test_file = os.path.join(tmpdir, fixture_name)
                shutil.copy(fixture, test_file)

            pattern = os.path.join(tmpdir, "*.json")
            output_dir = os.path.join(tmpdir, "output")

            config = ibm.ProcessingConfig(interpolate_diversity_threshold=3)
            sb, agg_df = ibm.process_qaoa_data(json_pattern=pattern, output_dir=output_dir, config=config)

            # Diversity (2 instances × 2 depths = 4) >= threshold (3) → run interpolation
            assert sb.interp_results is not None
            if len(sb.interp_results) > 0:
                assert 'resource' in sb.interp_results.columns


if __name__ == "__main__":
    pytest.main([__file__, "-v", "--tb=short"])
Copilot AI (Feb 16, 2026):

The test file test_ibm_qaoa_processing.py is 722 lines long and tests code in the examples directory (examples/IBM_QAOA/ibm_qaoa_processing.py). However, the main file being tested (ibm_qaoa_processing.py) appears to have been deleted in this PR (the diff shows removal of 379 lines with no replacement). This creates a situation where tests exist for code that no longer exists in the main location, suggesting the tests or the code organization needs correction.
