Michael-ikali/4YP

Developing a sustainable AI system

Overview

The rapid growth of large language models (LLMs) has significantly increased the computational demand and energy consumption of modern GPU servers. This project develops a system-level GPU frequency control algorithm to improve power efficiency and reduce energy consumption for LLM inference workloads under varying throughput requirements.

The proposed system supports both:

  • Fixed-workload scheduling
  • Fixed-interval scheduling

A performance and power model is first derived from experimental data. The model is then passed to an optimization algorithm that determines the GPU frequency settings and workload allocations that minimize energy use.


Objectives

  • Improve energy efficiency of multi-GPU LLM inference systems
  • Reduce overall energy consumption under throughput constraints
  • Develop system-level frequency and workload optimization strategies

Methodology

The system consists of three main stages:

1. Performance & Power Modeling

Empirical measurements are collected to model:

  • GPU performance and power consumption at different frequency settings
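As a rough illustration of this modeling stage, the sketch below fits a power model and a throughput model to a handful of measurements using ordinary least squares. The sample values are invented, and the model forms (power as a constant plus a cubic term in frequency, throughput as linear in frequency) are assumptions chosen for simplicity. They are not taken from this project's data or scripts.

```python
# Hypothetical measurements: (core frequency in MHz, mean power in W, throughput in tokens/s).
# These numbers are made up for illustration; they are not results from this project.
samples = [
    (600, 95.0, 410.0),
    (900, 130.0, 620.0),
    (1200, 190.0, 815.0),
    (1500, 285.0, 1010.0),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


freqs_ghz = [f / 1000 for f, _, _ in samples]

# Assumed power model: P(f) = p_static + p_dyn * f^3 (f in GHz).
p_static, p_dyn = fit_line([f ** 3 for f in freqs_ghz], [p for _, p, _ in samples])

# Assumed throughput model: T(f) = t_offset + t_slope * f (f in GHz).
t_offset, t_slope = fit_line(freqs_ghz, [t for _, _, t in samples])


def power_w(freq_mhz):
    """Predicted power draw in watts at the given core frequency."""
    return p_static + p_dyn * (freq_mhz / 1000) ** 3


def throughput(freq_mhz):
    """Predicted throughput in tokens per second at the given core frequency."""
    return t_offset + t_slope * (freq_mhz / 1000)


def energy_per_token(freq_mhz):
    """Predicted energy cost in joules per token."""
    return power_w(freq_mhz) / throughput(freq_mhz)


for f, _, _ in samples:
    print(f, round(power_w(f), 1), round(throughput(f), 1), round(energy_per_token(f), 4))
```

With models of this kind, energy per token is typically lowest at an intermediate frequency rather than at the maximum clock, which is the effect a frequency controller can exploit.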

2. Optimization Algorithm

A system-level optimization framework is used to determine:

  • GPU frequency configuration
  • Workload allocation across GPUs
  • Idle-state configuration

Implemented in:

  • optimization.py
  • opt.py
  • calc.py
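To make the decision variables concrete, here is a minimal sketch that is independent of the repository's scripts. It uses exhaustive search, which is a stand-in technique and not necessarily the formulation used in optimization.py. The operating points, the idle power value, and the function name `optimize` are all hypothetical. The sketch picks one frequency (or the idle state) per GPU so that total power is minimized while a required aggregate throughput is met, and splits the workload in proportion to each GPU's throughput.

```python
from itertools import product

# Hypothetical per-GPU operating points: frequency in MHz -> (throughput in tokens/s, power in W).
# The key None stands for an idle GPU. All values are illustrative assumptions.
OPERATING_POINTS = {
    None: (0.0, 40.0),
    600: (410.0, 95.0),
    900: (620.0, 130.0),
    1200: (815.0, 190.0),
    1500: (1010.0, 285.0),
}


def optimize(num_gpus, required_throughput):
    """Return (total_power, frequencies, workload_shares) for the lowest-power
    configuration that meets the throughput requirement, or None if infeasible."""
    best = None
    # Enumerate every combination of operating points across the GPUs.
    for freqs in product(OPERATING_POINTS, repeat=num_gpus):
        rates = [OPERATING_POINTS[f][0] for f in freqs]
        total_rate = sum(rates)
        if total_rate < required_throughput:
            continue  # Throughput constraint violated.
        total_power = sum(OPERATING_POINTS[f][1] for f in freqs)
        if best is None or total_power < best[0]:
            # Allocate work in proportion to each GPU's throughput.
            shares = [r / total_rate for r in rates]
            best = (total_power, freqs, shares)
    return best


print(optimize(num_gpus=2, required_throughput=1200.0))
```

Exhaustive search grows exponentially with the number of GPUs, so it is only practical for small systems. It is used here because it makes the objective and the constraint easy to read.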

3. Evaluation

The optimized scheduling strategy is evaluated on a multi-GPU system (ARC cluster) and compared against baseline configurations.

Implemented in:

  • final_test.py
  • final_test_boot.py
  • final_run.sh
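A comparison against a baseline usually comes down to integrating measured power over time and reporting the relative saving. The sketch below shows that arithmetic under stated assumptions: the power traces are invented, and the helper names `energy_joules` and `saving_percent` are hypothetical and do not come from final_test.py.

```python
def energy_joules(trace):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    return sum(
        (t1 - t0) * (p0 + p1) / 2
        for (t0, p0), (t1, p1) in zip(trace, trace[1:])
    )


def saving_percent(baseline_j, optimized_j):
    """Relative energy saving of the optimized run versus the baseline, in percent."""
    return 100.0 * (baseline_j - optimized_j) / baseline_j


# Invented power traces for the same workload: (time in s, power in W).
baseline_trace = [(0.0, 280.0), (1.0, 290.0), (2.0, 285.0), (3.0, 280.0)]
optimized_trace = [(0.0, 200.0), (1.0, 210.0), (2.0, 205.0), (3.0, 200.0)]

baseline_j = energy_joules(baseline_trace)
optimized_j = energy_joules(optimized_trace)
print(baseline_j, optimized_j, round(saving_percent(baseline_j, optimized_j), 2))
```

For a fair comparison, both runs should complete the same amount of work. If the optimized run takes longer, its trace must cover the full duration so that the extra time is counted in its energy.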

📁 Repository Structure

.
├── calc.py                 # Efficiency and energy calculation
├── optimization.py         # Core optimization formulation
├── opt.py                  # Main optimization execution logic
├── final_test.py           # Main evaluation script
├── final_test_boot.py      # Reboot experiment script
├── final_run.sh            # Shell script for configuring GPU frequencies on ARC
├── results/                # Output results and figures
│   ├── benchmark/          # Benchmark measurements
│   ├── boot/               # Boot measurements
│   └── idle/               # Algorithm evaluation measurements
├── pre_results/            # Pre-experiment measurements for the performance and power models
└── README.md
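The contents of final_run.sh are not shown here. As one possible approach, and assuming NVIDIA GPUs managed through `nvidia-smi`, the sketch below builds the commands that would pin each GPU's core clock to a chosen frequency. The helper name `clock_lock_commands` and the example frequencies are hypothetical. The commands are only constructed, not executed, because locking clocks normally requires administrator privileges and a GPU that supports it.

```python
def clock_lock_commands(freq_by_gpu):
    """Build nvidia-smi command lines from a {gpu_index: frequency_mhz} mapping.

    A frequency of None means the GPU's clocks are reset to their defaults.
    """
    commands = []
    for gpu_index, freq_mhz in sorted(freq_by_gpu.items()):
        if freq_mhz is None:
            # Reset the core clock range to the driver default.
            commands.append(["nvidia-smi", "-i", str(gpu_index), "--reset-gpu-clocks"])
        else:
            # Lock minimum and maximum core clock to the same value.
            commands.append(
                ["nvidia-smi", "-i", str(gpu_index), f"--lock-gpu-clocks={freq_mhz},{freq_mhz}"]
            )
    return commands


# Example mapping (assumed values): GPU 0 at 900 MHz, GPU 1 at 1200 MHz, GPU 2 reset.
for command in clock_lock_commands({0: 900, 1: 1200, 2: None}):
    print(" ".join(command))
```

Each command list could be passed to `subprocess.run` on a machine where the caller has permission to change GPU clocks.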

About

Fourth Year Project: Developing a system-level frequency control algorithm for GPU clusters running inference workloads.
