# Param Optimization - QAOA Dataset Generation & Evaluation

This directory contains a complete pipeline for generating and evaluating QAOA parameter optimization datasets **with real Qiskit execution**.

## 🎯 Objective

Create a high-quality dataset linking:
- **Graph characteristics** (size, density, clustering, etc.)
- **QAOA parameters** (depth p, initial angles β/γ, optimizer)
- **Measured performance** (final energy, approximation ratio, convergence)

This dataset is then used to train ML models capable of **predicting the best QAOA parameters** for a new graph.
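A single row of such a dataset might look like the following sketch. The field names mirror the column schema described later in this README; the values are illustrative only, not real measurements:

```python
# Illustrative example of one dataset record linking graph
# characteristics, QAOA parameters, and measured performance.
# All values below are made up for illustration.
sample_record = {
    # Graph characteristics
    "num_nodes": 8,
    "num_edges": 14,
    "density": 0.5,
    "graph_type": "erdos_renyi",
    # QAOA parameters
    "p": 2,
    "init_beta": [0.1, 0.2],
    "init_gamma": [0.3, 0.4],
    "optimizer": "COBYLA",
    # Measured performance
    "final_energy": -9.2,
    "approximation_ratio": 0.92,
    "converged": True,
}
```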

## 📁 Files

- `generate_dataset.py`: Dataset generation with Qiskit QAOA (338 lines)
- `evaluate_dataset.py`: Statistical analysis and dataset validation
- `train_baseline.py`: Baseline ML model training (RandomForest, XGBoost)
- `requirements.txt`: Python dependencies
- `setup.sh`: Automatic installation script
- `SIMPLIFICATION.md`: Code improvement documentation

## 🚀 Installation

```bash
# Option 1: Automatic installation
cd tests/param_optimization
bash setup.sh

# Option 2: Manual installation
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
pip install -r requirements.txt
```

## 📊 Usage

### 1. Generate a dataset with Qiskit QAOA

```bash
# Quick test dataset (~2 minutes)
python generate_dataset.py \
  --n_graphs 10 \
  --samples_per_graph 5 \
  --out quick_test.parquet

# Research dataset (recommended: 500-1000 graphs)
python generate_dataset.py \
  --n_graphs 500 \
  --samples_per_graph 10 \
  --n_min 6 --n_max 12 \
  --density_min 0.3 --density_max 0.7 \
  --p_choices 1,2,3 \
  --graph_types erdos_renyi,barabasi_albert \
  --out research_dataset.parquet \
  --seed 42
```

**Note**: The script **always** uses real QAOA with Qiskit. Estimated time: ~3-5 s per sample.

**Key parameters:**
- `--n_graphs`: Number of random graphs (recommended: 100-1000)
- `--samples_per_graph`: QAOA configurations per graph (recommended: 5-20)
- `--n_min/n_max`: Graph size range (6-12 recommended)
- `--density_min/max`: Density range (0.3-0.7 recommended)
- `--p_choices`: QAOA depths to test (e.g., `1,2,3`)
- `--graph_types`: Graph types (`erdos_renyi`, `barabasi_albert`, `watts_strogatz`, `regular`)
- `--seed`: Seed for reproducibility
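The flags above could be wired up with `argparse` roughly as follows. This is a hedged sketch, not the actual contents of `generate_dataset.py`; defaults and validation in the real script may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a CLI mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Generate a QAOA dataset")
    parser.add_argument("--n_graphs", type=int, default=100)
    parser.add_argument("--samples_per_graph", type=int, default=10)
    parser.add_argument("--n_min", type=int, default=6)
    parser.add_argument("--n_max", type=int, default=12)
    parser.add_argument("--density_min", type=float, default=0.3)
    parser.add_argument("--density_max", type=float, default=0.7)
    # Comma-separated values are split into typed Python lists.
    parser.add_argument("--p_choices",
                        type=lambda s: [int(x) for x in s.split(",")],
                        default=[1])
    parser.add_argument("--graph_types",
                        type=lambda s: s.split(","),
                        default=["erdos_renyi"])
    parser.add_argument("--out", type=str, default="dataset.parquet")
    parser.add_argument("--seed", type=int, default=42)
    return parser

args = build_parser().parse_args(["--p_choices", "1,2,3", "--n_graphs", "500"])
```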

### 2. Evaluate dataset quality

```bash
python evaluate_dataset.py research_dataset.parquet
```

Displays:
- Descriptive statistics (sizes, distributions)
- Metric variability (crucial for ML!)
- Correlations
- Improvement recommendations
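Metric variability matters because a near-constant target gives a model nothing to learn. A stdlib-only sketch of such a check (the real `evaluate_dataset.py` presumably works on the full DataFrame):

```python
import statistics

def coefficient_of_variation(values):
    """Std-dev relative to the mean; near zero means the metric barely varies."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("inf")
    return statistics.stdev(values) / abs(mean)

# Illustrative approximation-ratio column from a tiny dataset.
ratios = [0.85, 0.91, 0.78, 0.95, 0.88]
cv = coefficient_of_variation(ratios)
# A very small CV would suggest the dataset lacks the variability
# an ML model needs to pick up any signal.
```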

### 3. Test with ML baseline

```bash
python train_baseline.py dataset.parquet \
  --target approximation_ratio \
  --test_size 0.2
```

Trains RandomForest and XGBoost to predict QAOA performance.
Metrics: RMSE, R², Spearman rank correlation.

**Interpretation:**
- R² > 0.5 → **Excellent signal**, very useful dataset
- R² 0.2-0.5 → Moderate signal, improve variability
- R² < 0.2 → Poor dataset, revise protocol
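The interpretation bands above can be expressed directly in code. This helper is a sketch for illustration, not part of `train_baseline.py`:

```python
def interpret_r2(r2: float) -> str:
    """Map an R² score onto the dataset-quality bands above."""
    if r2 > 0.5:
        return "excellent signal"
    if r2 >= 0.2:
        return "moderate signal"
    return "poor dataset"

verdict = interpret_r2(0.34)  # → "moderate signal"
```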

## 📈 Improving the Dataset

### For a better dataset:

1. **Increase size**: at least 1,000-5,000 experiments
2. **Diversify graphs**:
   - Multiple sizes (n = 4 to 16+)
   - Multiple densities (0.2 to 0.8)
   - Multiple types (ER, BA, WS, regular, planted)
3. **Use real QAOA**: Qiskit for authentic measurements
4. **Vary parameters**: multiple p values, optimizers, and initial angles
5. **Repeat with seeds**: multiple runs per graph for robustness
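Points 2, 4, and 5 amount to sweeping a grid of configurations. A sketch of enumerating such a grid (the specific values are illustrative choices, not prescribed by the pipeline):

```python
import itertools

# Sketch of a configuration sweep covering graph diversity,
# parameter variation, and seeded repetition.
sizes = range(4, 17, 4)              # n = 4, 8, 12, 16
densities = [0.2, 0.4, 0.6, 0.8]
graph_types = ["erdos_renyi", "barabasi_albert", "watts_strogatz"]
p_values = [1, 2, 3]
seeds = [0, 1, 2]

configs = list(itertools.product(sizes, densities, graph_types, p_values, seeds))
# 4 * 4 * 3 * 3 * 3 = 432 experiments before QAOA sampling even starts.
```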

### Example production command:

```bash
python generate_dataset.py \
  --n_graphs 500 \
  --samples_per_graph 10 \
  --n_min 4 --n_max 14 \
  --density_min 0.2 --density_max 0.8 \
  --p_choices 1,2,3,4 \
  --graph_types erdos_renyi,barabasi_albert,watts_strogatz \
  --out large_dataset.parquet \
  --seed 42
```

(Estimated time: several hours with real QAOA)

## 🔬 Experimental Protocol

### Dataset structure (columns):

**Identifiers:**
- `graph_id`, `seed`

**Graph features:**
- `num_nodes`, `num_edges`, `density`
- `degree_mean`, `degree_std`
- `clustering_coeff`, `assortativity`
- `graph_type`

**QAOA parameters:**
- `p` (depth)
- `init_beta`, `init_gamma` (initial angles, JSON-encoded)
- `optimizer`

**Performance metrics:**
- `final_energy` (Hamiltonian energy)
- `cut_value` (cut size found)
- `approximation_ratio` (quality vs. optimal)
- `success_prob` (quality proxy)
- `iterations` (optimizer iteration count)
- `converged` (binary convergence flag)
- `runtime` (execution time)
- `optimal_cut` (exact solution, when computable)
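Several of the basic graph features above can be computed from a plain edge list; the fancier ones (`clustering_coeff`, `assortativity`) typically come from NetworkX, per the references below. A stdlib-only sketch:

```python
import statistics

def graph_features(num_nodes: int, edges: list[tuple[int, int]]) -> dict:
    """Compute the basic structural features stored in the dataset."""
    num_edges = len(edges)
    max_edges = num_nodes * (num_nodes - 1) / 2
    degrees = [0] * num_nodes
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return {
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        "density": num_edges / max_edges if max_edges else 0.0,
        "degree_mean": statistics.mean(degrees),
        # Population std-dev; the pipeline's exact convention is not shown here.
        "degree_std": statistics.pstdev(degrees),
    }

# A triangle graph: fully connected on 3 nodes.
features = graph_features(3, [(0, 1), (1, 2), (0, 2)])
# density = 1.0, degree_mean = 2.0, degree_std = 0.0
```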

## 🧪 Next Steps

1. **Integrate Vertex Cover** and other combinatorial problems
2. **Warm-start angles** based on heuristics
3. **Meta-learning**: transfer parameters between similar graphs
4. **Bandits**: adaptive parameter sampling
5. **Neural architecture**: GNN for graph encoding, MLP for parameter prediction

## 📚 References

- QAOA: Farhi, Goldstone & Gutmann (2014)
- Dataset best practices: ML for Quantum Computing
- Graph features: NetworkX documentation