Commit c22f7c0

Merge pull request #2 from PoCInnovation/test/qiskit_dataset
test(qiskit dataset param optimization): dataset generator
2 parents 06c36b2 + 9e1551d commit c22f7c0

4 files changed

Lines changed: 522 additions & 0 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
.venv/
__pycache__/

*.parquet

tests/param_optimization/README.md

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
# Param Optimization - QAOA Dataset Generation & Evaluation

This directory contains a complete pipeline for generating and evaluating QAOA parameter optimization datasets **with real Qiskit execution**.

## 🎯 Objective

Create a high-quality dataset linking:
- **Graph characteristics** (size, density, clustering, etc.)
- **QAOA parameters** (depth p, initial angles β/γ, optimizer)
- **Measured performance** (final energy, approximation ratio, convergence)

This dataset is then used to train ML models capable of **predicting the best QAOA parameters** for a new graph.

## 📁 Files

- `generate_dataset.py`: Dataset generation with Qiskit QAOA (338 lines)
- `evaluate_dataset.py`: Statistical analysis and dataset validation
- `train_baseline.py`: Baseline ML model training (RandomForest, XGBoost)
- `requirements.txt`: Python dependencies
- `setup.sh`: Automatic installation script
- `SIMPLIFICATION.md`: Code improvement documentation

## 🚀 Installation

```bash
# Option 1: Automatic installation
cd tests/param_optimization
bash setup.sh

# Option 2: Manual installation
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
pip install -r requirements.txt
```

## 📊 Usage

### 1. Generate a dataset with Qiskit QAOA

```bash
# Quick test dataset (2 minutes)
python generate_dataset.py \
  --n_graphs 10 \
  --samples_per_graph 5 \
  --out quick_test.parquet

# Research dataset (recommended: 500-1000 graphs)
python generate_dataset.py \
  --n_graphs 500 \
  --samples_per_graph 10 \
  --n_min 6 --n_max 12 \
  --density_min 0.3 --density_max 0.7 \
  --p_choices 1,2,3 \
  --graph_types erdos_renyi,barabasi_albert \
  --out research_dataset.parquet \
  --seed 42
```

**Note**: The script **always** runs real QAOA with Qiskit. Estimated time: roughly 3-5 s per sample.

**Key parameters:**
- `--n_graphs`: Number of random graphs (recommended: 100-1000)
- `--samples_per_graph`: QAOA configurations per graph (recommended: 5-20)
- `--n_min`, `--n_max`: Graph size range in nodes (6-12 recommended)
- `--density_min`, `--density_max`: Edge density range (0.3-0.7 recommended)
- `--p_choices`: Comma-separated QAOA depths to test (e.g., 1,2,3)
- `--graph_types`: Comma-separated graph types (erdos_renyi, barabasi_albert, watts_strogatz, regular)
- `--out`: Output Parquet file
- `--seed`: Seed for reproducibility

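To illustrate how comma-separated flags such as `--p_choices` and `--graph_types` could be parsed, here is a minimal `argparse` sketch. It is not taken from `generate_dataset.py`: the flag names come from the commands above, while the defaults and the helper names (`parse_int_list`, `parse_str_list`, `build_parser`) are assumptions made for this example.

```python
import argparse


def parse_int_list(text: str) -> list[int]:
    """Turn a comma-separated string such as "1,2,3" into [1, 2, 3]."""
    return [int(item) for item in text.split(",") if item.strip()]


def parse_str_list(text: str) -> list[str]:
    """Turn "erdos_renyi,regular" into ["erdos_renyi", "regular"]."""
    return [item.strip() for item in text.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    # All defaults below are hypothetical, chosen only for illustration.
    parser = argparse.ArgumentParser(description="QAOA dataset generator (sketch)")
    parser.add_argument("--n_graphs", type=int, default=100)
    parser.add_argument("--samples_per_graph", type=int, default=10)
    parser.add_argument("--n_min", type=int, default=6)
    parser.add_argument("--n_max", type=int, default=12)
    parser.add_argument("--density_min", type=float, default=0.3)
    parser.add_argument("--density_max", type=float, default=0.7)
    parser.add_argument("--p_choices", type=parse_int_list, default=[1, 2, 3])
    parser.add_argument("--graph_types", type=parse_str_list,
                        default=["erdos_renyi"])
    parser.add_argument("--out", default="dataset.parquet")
    parser.add_argument("--seed", type=int, default=None)
    return parser


def validate(args: argparse.Namespace) -> None:
    """Reject ranges that cannot produce any graph."""
    if args.n_min > args.n_max:
        raise ValueError("--n_min must not exceed --n_max")
    if not 0.0 <= args.density_min <= args.density_max <= 1.0:
        raise ValueError("densities must satisfy 0 <= min <= max <= 1")
```

Parsing `["--p_choices", "1,2", "--graph_types", "erdos_renyi,regular"]` with this parser yields `args.p_choices == [1, 2]` and `args.graph_types == ["erdos_renyi", "regular"]`.
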
### 2. Evaluate dataset quality

```bash
python evaluate_dataset.py research_dataset.parquet
```

Displays:
- Descriptive statistics (sizes, distributions)
- Metric variability (crucial for ML!)
- Correlations
- Improvement recommendations

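One way to make "metric variability" concrete is the coefficient of variation (standard deviation divided by the absolute mean): a target column that barely varies gives a model nothing to learn. The sketch below is an assumption about what such a check could look like, not the logic of `evaluate_dataset.py`; the function name `variability_report` and the `min_cv=0.05` threshold are hypothetical.

```python
import statistics


def variability_report(rows, metrics, min_cv=0.05):
    """Return {metric: (coefficient_of_variation, is_ok)}.

    `rows` is a list of dicts (one per experiment). A metric is flagged
    as not ok when its coefficient of variation falls below `min_cv`.
    """
    report = {}
    for name in metrics:
        values = [row[name] for row in rows if row.get(name) is not None]
        if len(values) < 2:
            # Not enough data to measure spread.
            report[name] = (0.0, False)
            continue
        mean = statistics.fmean(values)
        std = statistics.stdev(values)  # sample standard deviation
        if mean != 0:
            cv = std / abs(mean)
        else:
            # Avoid dividing by zero: any spread around 0 counts as variable.
            cv = float("inf") if std > 0 else 0.0
        report[name] = (cv, cv >= min_cv)
    return report
```

A column where every row has the same value gets a coefficient of variation of 0 and is flagged, which is the situation the recommendations are meant to catch.
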
### 3. Test with ML baseline

```bash
python train_baseline.py research_dataset.parquet \
  --target approximation_ratio \
  --test_size 0.2
```

Trains RandomForest and XGBoost to predict QAOA performance.
Metrics: RMSE, R², Spearman rank correlation.

**Interpretation:**
- R² > 0.5 → **Excellent signal**, very useful dataset
- R² between 0.2 and 0.5 → Moderate signal, improve variability
- R² < 0.2 → Poor dataset, revise protocol

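For reference, the three metrics and the R² bands above can be written out without any dependencies. This is a sketch of the standard definitions, not the code in `train_baseline.py` (which presumably relies on library implementations); the function names are chosen for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(y_true, y_pred):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def interpret_r2(r2):
    """Map an R² value onto the bands listed in this README."""
    if r2 > 0.5:
        return "excellent signal"
    if r2 >= 0.2:
        return "moderate signal"
    return "poor dataset"
```

Spearman only looks at ordering, so it can stay high even when R² is modest; for choosing the best parameters among candidates, the ranking is often what matters.
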
## 📈 Improving the Dataset

### For a better dataset:

1. **Increase size**: Aim for at least 1,000-5,000 experiments
2. **Diversify graphs**:
   - Multiple sizes (n=4 to 16+)
   - Multiple densities (0.2 to 0.8)
   - Multiple types (Erdős-Rényi, Barabási-Albert, Watts-Strogatz, regular, planted)
3. **Use real QAOA**: Run QAOA with Qiskit for authentic measurements
4. **Vary parameters**: Multiple depths p, optimizers, and initial angles
5. **Repeat with seeds**: Multiple runs per graph for robustness

### Example production command:

```bash
python generate_dataset.py \
  --n_graphs 500 \
  --samples_per_graph 10 \
  --n_min 4 --n_max 14 \
  --density_min 0.2 --density_max 0.8 \
  --p_choices 1,2,3,4 \
  --graph_types erdos_renyi,barabasi_albert,watts_strogatz \
  --out large_dataset.parquet \
  --seed 42
```

(Estimated time: several hours with real QAOA)

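The "several hours" figure follows from simple arithmetic on the per-sample estimate quoted in the usage notes (roughly 3-5 s per sample). The helper below is a hypothetical back-of-the-envelope calculator; it assumes the per-sample cost stays in that range, which larger graphs or deeper circuits may well exceed.

```python
def estimate_runtime_hours(n_graphs, samples_per_graph, seconds_per_sample=(3.0, 5.0)):
    """Return a (low, high) runtime estimate in hours.

    The default 3-5 s per sample is the README's own rough figure,
    not a measurement.
    """
    n_samples = n_graphs * samples_per_graph
    low, high = seconds_per_sample
    return n_samples * low / 3600, n_samples * high / 3600


low, high = estimate_runtime_hours(n_graphs=500, samples_per_graph=10)
print(f"{low:.1f}-{high:.1f} hours")  # prints "4.2-6.9 hours"
```
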
## 🔬 Experimental Protocol

### Dataset structure (columns):

**Identifiers:**
- `graph_id`, `seed`

**Graph features:**
- `num_nodes`, `num_edges`, `density`
- `degree_mean`, `degree_std`
- `clustering_coeff`, `assortativity`
- `graph_type`

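To make the graph feature columns concrete, the sketch below computes them for an undirected simple graph using only the standard library. It is an illustration under stated assumptions, not the code in `generate_dataset.py`: `degree_std` is taken as the population standard deviation, `clustering_coeff` as the average local clustering coefficient, and `assortativity` as the Pearson correlation of the degrees at the two ends of each edge. NetworkX provides equivalents such as `nx.density`, `nx.average_clustering`, and `nx.degree_assortativity_coefficient`.

```python
import math
from itertools import combinations


def graph_features(num_nodes, edges):
    """Structural features of an undirected simple graph.

    `edges` is a list of (u, v) pairs with nodes labelled 0..num_nodes-1.
    """
    neighbors = {node: set() for node in range(num_nodes)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    degrees = [len(neighbors[node]) for node in range(num_nodes)]
    num_edges = sum(degrees) // 2
    degree_mean = sum(degrees) / num_nodes
    degree_std = math.sqrt(sum((d - degree_mean) ** 2 for d in degrees) / num_nodes)

    # Local clustering: fraction of a node's neighbor pairs that are linked.
    local = []
    for node in range(num_nodes):
        k = degrees[node]
        if k < 2:
            local.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neighbors[node], 2) if b in neighbors[a])
        local.append(2 * links / (k * (k - 1)))

    # Degree assortativity: visit every edge in both directions.
    xs = [degrees[u] for u in neighbors for _ in neighbors[u]]
    ys = [degrees[v] for u in neighbors for v in neighbors[u]]
    assortativity = float("nan")  # undefined when all end degrees are equal
    if xs:
        mean = sum(xs) / len(xs)  # equals the mean of ys by symmetry
        var = sum((x - mean) ** 2 for x in xs)
        if var > 0:
            cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
            assortativity = cov / var

    return {
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        "density": 2 * num_edges / (num_nodes * (num_nodes - 1)),
        "degree_mean": degree_mean,
        "degree_std": degree_std,
        "clustering_coeff": sum(local) / num_nodes,
        "assortativity": assortativity,
    }
```

The function assumes at least two nodes and no self-loops. For regular graphs the assortativity is undefined, so the sketch returns NaN, which a dataset pipeline would need to handle before training.
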
**QAOA parameters:**
- `p` (depth)
- `init_beta`, `init_gamma` (initial angles, JSON-encoded)
- `optimizer`

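As an illustration of how one row's QAOA parameters could be drawn and stored, the sketch below samples a depth, `p` initial angles per parameter family, and an optimizer name, then serializes the angle lists as JSON strings. Everything beyond the column names is an assumption: the sampling ranges (β in [0, π], γ in [0, 2π]), the optimizer names, and the function `sample_qaoa_config` are hypothetical.

```python
import json
import math
import random


def sample_qaoa_config(p_choices, optimizers, rng):
    """Draw one QAOA configuration; angle ranges are illustrative only."""
    p = rng.choice(p_choices)
    init_beta = [rng.uniform(0.0, math.pi) for _ in range(p)]
    init_gamma = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(p)]
    return {
        "p": p,
        # Lists are stored as JSON strings so they fit in a single column.
        "init_beta": json.dumps(init_beta),
        "init_gamma": json.dumps(init_gamma),
        "optimizer": rng.choice(optimizers),
    }


rng = random.Random(42)  # a seeded generator keeps the draw reproducible
config = sample_qaoa_config([1, 2, 3], ["COBYLA", "SPSA"], rng)
betas = json.loads(config["init_beta"])  # decode back into a list of floats
```

Storing one angle per layer means the decoded lists always have length `p`, which a loader can use as a sanity check.
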
**Performance metrics:**
- `final_energy` (Hamiltonian energy)
- `cut_value` (cut size found)
- `approximation_ratio` (solution quality relative to the optimum)
- `success_prob` (quality proxy)
- `iterations` (optimizer iteration count)
- `converged` (whether the optimizer converged, binary)
- `runtime` (execution time)
- `optimal_cut` (exact solution if computable)

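The `cut_value` and `optimal_cut` columns suggest a MaxCut objective. Under that assumption, the exact optimum for small graphs can be found by enumerating all partitions, and the approximation ratio is the found cut divided by the optimum. The sketch below shows this; it is not the repository's implementation, and the function names are chosen for this example.

```python
from itertools import product


def cut_value(edges, assignment):
    """Count edges whose endpoints fall on different sides of the partition."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def brute_force_max_cut(num_nodes, edges):
    """Exact MaxCut by enumeration; requires num_nodes >= 1.

    Node 0 is fixed on side 0 because flipping every bit gives the same
    cut, which halves the search to 2**(num_nodes - 1) assignments.
    """
    best = 0
    for bits in product((0, 1), repeat=num_nodes - 1):
        best = max(best, cut_value(edges, (0,) + bits))
    return best


def approximation_ratio(found_cut, optimal_cut):
    """Found cut relative to the optimum; NaN for graphs without edges."""
    return found_cut / optimal_cut if optimal_cut > 0 else float("nan")


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
optimal = brute_force_max_cut(4, square)       # the 4-cycle is bipartite
found = cut_value(square, (0, 0, 1, 1))        # a suboptimal partition
ratio = approximation_ratio(found, optimal)
```

Enumeration grows exponentially with the number of nodes, which is consistent with the column being filled only "if computable"; at the graph sizes recommended here (up to 12-16 nodes) it remains cheap.
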
## 🧪 Next Steps

1. **Integrate Vertex Cover** and other combinatorial problems
2. **Warm-start angles** based on heuristics
3. **Meta-learning**: Transfer parameters between similar graphs
4. **Bandits**: Adaptive parameter sampling
5. **Neural architecture**: GNN for graph encoding, MLP for parameter prediction

## 📚 References

- QAOA: Farhi, Goldstone & Gutmann (2014), "A Quantum Approximate Optimization Algorithm"
- Dataset best practices: ML for Quantum Computing
- Graph features: NetworkX documentation
