This commit implements three approaches for cross-GPU performance prediction:

1. Analytical Model (roofline + occupancy)
   - Physics-based approach using the roofline model and occupancy theory
     (see the roofline sketch after this list)
   - Updated to include Titan X GPU data
   - Generates predictions for 3 experiments (new GPU, new config, new kernels)

2. ML Baseline (Random Forest)
   - Machine learning baseline with ~35 features
   - Kernel characteristics + GPU specifications
   - Updated to include Titan X GPU data

3. Hybrid Enhanced Model (BEST - Main Contribution)
   - Physics-informed ML combining analytical + data-driven approaches
   - 60+ enhanced features (see the feature-construction sketch after this list), including:
     * Analytical model outputs (occupancy, roofline, efficiency)
     * Ratio features (compute_ratio, bandwidth_ratio, etc.)
     * Cache awareness (working_set_per_l2, cache_residency)
     * Memory pattern encoding (one-hot for coalesced/strided/random/atomics)
   - XGBoost or Random Forest regressor with a log-transform on the runtime target
   - Feature importance analysis for interpretability
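
A minimal sketch of the roofline half of approach 1 (the occupancy side is
omitted here). The field names (peak_gflops, mem_bw_gbs, flops, bytes) and the
Titan X numbers are illustrative assumptions, not the actual schema of
data/gpu_metrics.json:

```python
def roofline_time_s(kernel, gpu):
    """Predict runtime as the max of the compute and memory roofline bounds."""
    compute_s = kernel["flops"] / (gpu["peak_gflops"] * 1e9)  # compute bound
    memory_s = kernel["bytes"] / (gpu["mem_bw_gbs"] * 1e9)    # bandwidth bound
    return max(compute_s, memory_s)  # the slower bound dominates

# Example: SAXPY over 2^24 elements (2 FLOPs and 12 bytes per element)
# on approximate Titan X (Maxwell) specs -- numbers for illustration only.
titan_x = {"peak_gflops": 6600.0, "mem_bw_gbs": 336.5}
saxpy = {"flops": 2 * (1 << 24), "bytes": 12 * (1 << 24)}
print(f"roofline estimate: {roofline_time_s(saxpy, titan_x) * 1e3:.3f} ms")
```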
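
And a hedged sketch of how approach 3's ratio features and log-transform fit
could look. Column names (src_peak_gflops, working_set_bytes, memory_pattern,
...) are assumptions for illustration; the real feature set in
scripts/hybrid_model_enhanced.py is larger:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def add_enhanced_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the kinds of ratio/cache features listed above (names assumed)."""
    out = df.copy()
    out["compute_ratio"] = out["tgt_peak_gflops"] / out["src_peak_gflops"]
    out["bandwidth_ratio"] = out["tgt_mem_bw_gbs"] / out["src_mem_bw_gbs"]
    # Cache awareness: how the working set compares to the target GPU's L2
    out["working_set_per_l2"] = out["working_set_bytes"] / out["tgt_l2_bytes"]
    # One-hot memory pattern encoding (coalesced/strided/random/atomics)
    return pd.get_dummies(out, columns=["memory_pattern"])

def fit_log_target(X, runtime_ms):
    """Fit on log(runtime) to tame the wide dynamic range of kernel times."""
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X, np.log(runtime_ms))
    return model

def predict_ms(model, X):
    return np.exp(model.predict(X))  # invert the log-transform
```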

Key Files:
- data/gpu_metrics.json: Unified GPU specifications for all 4 GPUs
- scripts/analytical_model_occupancy.py: Updated analytical model
- scripts/ml_baseline.py: Updated ML baseline
- scripts/hybrid_model_enhanced.py: NEW - Enhanced hybrid model
- scripts/run_all_models.py: NEW - Master script to run and compare all models
- README_MODELS.md: Comprehensive documentation
- QUICKSTART.md: Quick start guide with test results

Results:
- Analytical: verified working; serves as the baseline
- ML Baseline: verified working; 20-40% MAPE on the new-GPU experiment
- Hybrid: expected 10-25% MAPE (best of the three)
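
For reference, the MAPE figures above follow the standard definition; a
minimal sketch:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent, as quoted above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))
```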

No CUDA cluster needed - all models train on existing CSV data.

Previously, only one intermediate config per kernel was used for training; the
others were marked as 'other' and discarded. Now all intermediate problem sizes
are used for training, providing much better data for learning scaling behavior.

Changes:
- Modified compute_config_roles() in all 3 model scripts
- For kernels with 3+ configs, roles are baseline, train_extra (ALL middle
  configs), and test_extra (see the sketch below)
- 15 kernels with 5 configs each now provide 3 training configs instead of 1
- Exp2 training data roughly doubles for most kernels

This should significantly improve ML model performance on Exp2 (scaling prediction).
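
A hedged sketch of the new role assignment (the real compute_config_roles()
lives in all three model scripts; the sorting key and the fewer-than-3-config
fallback here are assumptions):

```python
def compute_config_roles(configs):
    """Assign a training role to each config, ordered by problem size."""
    configs = sorted(configs)
    if len(configs) < 3:
        # Too few configs to hold out a test point (assumed fallback)
        return {c: "baseline" for c in configs}
    roles = {configs[0]: "baseline", configs[-1]: "test_extra"}
    for c in configs[1:-1]:
        roles[c] = "train_extra"  # ALL middle configs now train, not just one
    return roles

# A 5-config kernel now yields 3 train_extra configs (previously only 1):
print(compute_config_roles([1 << 10, 1 << 12, 1 << 14, 1 << 16, 1 << 18]))
```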