
Native Rust hyperparameter optimization — Optuna-style without reloading data #83

@eprifti

Description


Context

Issue #77 proposed running Optuna in Python (gpredomicspy). But Python-level optimization reloads the data from disk for every trial, which is wasteful when the dataset is large (e.g., the wetlab 1981×918 matrix).

A native Rust implementation would:

  1. Load data once
  2. Run hundreds of parameter trials in-memory
  3. Use the same feature selection cache across trials
  4. Be orders of magnitude faster than Python subprocess per trial

Design

Core: optimize() function in lib.rs

```rust
pub fn optimize(
    data: &Data,
    base_param: &Param,       // fixed settings not covered by the search space
    search_space: &SearchSpace,
    n_trials: usize,
    metric: OptMetric,        // TestAUC, TestSpearman, CVMeanAUC, etc.
    mut sampler: Sampler,     // TPE (default), Random, Grid
) -> OptResult {
    // Data is loaded ONCE and borrowed by every trial
    let mut history = TrialHistory::new();
    for trial in 0..n_trials {
        let param = sampler.suggest(search_space, &history);
        let result = run_trial(data, &param);  // no disk I/O
        history.push(trial, param, result);
    }
    let (best_params, best_value) = history.best(metric);
    OptResult { best_params, best_value, history }
}
```
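
As a rough illustration of the load-once structure, here is a self-contained random-search sketch. `TrialParams`, `run_trial`, and the inline LCG sampler are stand-ins for the real `Data`/`Param` types and samplers, not the actual gpredomics API:

```rust
// Toy version of the trial loop: data is borrowed by every trial, no disk I/O.
#[derive(Clone, Debug)]
struct TrialParams {
    k_penalty: f64,
    population_size: usize,
}

// Stand-in objective; the real run_trial would train and score a model.
fn run_trial(data: &[f64], p: &TrialParams) -> f64 {
    data.iter().sum::<f64>() / data.len() as f64
        - p.k_penalty * p.population_size as f64 * 1e-6
}

fn optimize(data: &[f64], n_trials: usize) -> (TrialParams, f64) {
    let mut history: Vec<(TrialParams, f64)> = Vec::new();
    let mut seed: u64 = 42;
    for _ in 0..n_trials {
        // Tiny LCG in place of a real sampler, so the sketch has no dependencies.
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        let u = (seed >> 11) as f64 / (1u64 << 53) as f64; // uniform in [0, 1)
        let p = TrialParams {
            // log-uniform in [1e-5, 0.01], matching the k_penalty spec below
            k_penalty: 1e-5 * (0.01f64 / 1e-5).powf(u),
            population_size: 500 + (u * 9500.0) as usize,
        };
        let score = run_trial(data, &p);
        history.push((p, score));
    }
    history
        .into_iter()
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .unwrap()
}

fn main() {
    let data = vec![0.2, 0.4, 0.6, 0.8]; // loaded once
    let (best, value) = optimize(&data, 100);
    println!("best = {:?}, value = {:.4}", best, value);
}
```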

Sampler options

  1. Random — uniform random sampling (baseline)
  2. TPE (Tree-structured Parzen Estimator) — Optuna's default, Bayesian
  3. Grid — exhaustive grid search for small spaces
  4. CMA-ES — covariance matrix adaptation for continuous params
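
One way to keep samplers pluggable is a small trait. The `Sampler` trait below, with `Random` and `Grid` implementations over a single `cooling_rate` parameter, is a hypothetical sketch, not the actual gpredomics interface:

```rust
#[derive(Clone, Debug, PartialEq)]
struct Params {
    cooling_rate: f64,
}

// Each strategy proposes the next trial's parameters given the history so far.
trait Sampler {
    fn suggest(&mut self, history: &[(Params, f64)]) -> Params;
}

/// Uniform random over cooling_rate in [0.99, 0.9999) (baseline).
struct Random {
    seed: u64,
}
impl Sampler for Random {
    fn suggest(&mut self, _history: &[(Params, f64)]) -> Params {
        self.seed = self.seed.wrapping_mul(6364136223846793005).wrapping_add(1);
        let u = (self.seed >> 11) as f64 / (1u64 << 53) as f64;
        Params { cooling_rate: 0.99 + u * (0.9999 - 0.99) }
    }
}

/// Exhaustive grid: cycles through a fixed list of candidate values.
struct Grid {
    values: Vec<f64>,
    next: usize,
}
impl Sampler for Grid {
    fn suggest(&mut self, _history: &[(Params, f64)]) -> Params {
        let v = self.values[self.next % self.values.len()];
        self.next += 1;
        Params { cooling_rate: v }
    }
}

fn main() {
    let mut g = Grid { values: vec![0.99, 0.995, 0.9999], next: 0 };
    println!("{:?}", g.suggest(&[])); // first grid point
}
```

A TPE or CMA-ES sampler would slot in as another `impl Sampler`, using `history` to bias its suggestions.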

Search space definition (in param.yaml)

```yaml
optimize:
  n_trials: 100
  metric: test_auc           # or cv_mean_auc, spearman, etc.
  sampler: tpe
  search_space:
    algo: [ga, beam, sa, ils, lasso]
    k_penalty: {log_uniform: [1e-5, 0.01]}
    language: [ter, "bin,ter", "bin,ter,ratio"]
    data_type: [prev, raw, "raw,prev"]
    population_size: {int: [500, 10000]}
    cooling_rate: {uniform: [0.99, 0.9999]}
    feature_minimal_prevalence_pct: {int: [5, 30]}
```
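
The distribution specs above (categorical lists, `log_uniform`, `uniform`, `int`) could map to a small Rust enum. The `Dist` type and its `sample` method below are illustrative assumptions about that mapping, not existing code:

```rust
// One variant per distribution kind appearing in the param.yaml sketch.
enum Dist {
    Categorical(Vec<String>),        // algo: [ga, beam, ...]
    LogUniform { lo: f64, hi: f64 }, // k_penalty: {log_uniform: [1e-5, 0.01]}
    Uniform { lo: f64, hi: f64 },    // cooling_rate: {uniform: [0.99, 0.9999]}
    Int { lo: i64, hi: i64 },        // population_size: {int: [500, 10000]}
}

impl Dist {
    /// Map a uniform draw u in [0, 1) into this distribution.
    fn sample(&self, u: f64) -> String {
        match self {
            Dist::Categorical(opts) => opts[(u * opts.len() as f64) as usize].clone(),
            Dist::LogUniform { lo, hi } => format!("{:e}", lo * (hi / lo).powf(u)),
            Dist::Uniform { lo, hi } => format!("{}", lo + u * (hi - lo)),
            Dist::Int { lo, hi } => format!("{}", lo + (u * (hi - lo + 1) as f64) as i64),
        }
    }
}

fn main() {
    let k = Dist::LogUniform { lo: 1e-5, hi: 0.01 };
    println!("{}", k.sample(0.5)); // geometric midpoint, ~3.16e-4
}
```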

Key advantages over Python Optuna

| | Python Optuna (#77) | Native Rust |
|---|---|---|
| Data loading | Once per trial (subprocess) | Once total |
| Feature selection | Recomputed per trial | Cached |
| Overhead per trial | ~2 s (process spawn + data I/O) | ~0 ms |
| 100 trials on Qin2014 | ~200 s + algo time | algo time only |
| Parallelism | Limited by the Python GIL | Full rayon parallelism |

Implementation phases

  1. Random sampler + grid — simplest, proves the architecture
  2. TPE sampler — port the core algorithm (kernel density estimation)
  3. Pruning — early stopping of unpromising trials (median pruner)
  4. CLI integration — `gpredomics --optimize param.yaml`
  5. Web app — "Tune" button that calls optimize() via gpredomicspy
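
Phase 3's median pruner can be sketched in a few lines: stop a trial whose intermediate value falls below the median of what completed trials reported at the same step, assuming higher is better. `should_prune` is a hypothetical helper, not an existing function:

```rust
// Median pruning: compare a trial's intermediate value at some step
// against the median of values that completed trials had at that step.
fn should_prune(intermediate: f64, completed_at_step: &mut Vec<f64>) -> bool {
    if completed_at_step.is_empty() {
        return false; // nothing to compare against yet
    }
    completed_at_step.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = completed_at_step[completed_at_step.len() / 2];
    intermediate < median
}

fn main() {
    // AUCs that other trials had reached at this step.
    let mut history = vec![0.70, 0.75, 0.80];
    println!("{}", should_prune(0.60, &mut history)); // prints "true"
    println!("{}", should_prune(0.78, &mut history)); // prints "false"
}
```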

References

  • Akiba et al. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. KDD.
  • Bergstra et al. (2011). Algorithms for Hyper-Parameter Optimization. NeurIPS.
