Module power


Smart GPU Power Management

Dynamically adjusts GPU power states based on ML workload phase:

  • GpuCompute: full power (maximum clocks)
  • DataLoading: reduced GPU clocks (the GPU is mostly idle, so this saves power)
  • NcclCollective: reduced compute clocks, with memory clocks kept high
  • OptimizerStep: full power (the step is a brief burst)

This can reduce energy consumption by 10-20% with less than 1% impact on throughput.
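
The phase-to-power mapping above can be sketched as a plain `match`. This is only an illustration of the idea, not this module's implementation: the `Phase` enum, the `ClockTarget` struct, the `clock_target_for` function, and the percentage values are all assumptions made for the example. The real `PowerProfile` fields and the actual clock values are not shown on this page.

```rust
/// Workload phases named in the module description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    GpuCompute,
    DataLoading,
    NcclCollective,
    OptimizerStep,
}

/// Hypothetical clock targets, as a percentage of the GPU's maximum clocks.
/// This is not the module's `PowerProfile`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClockTarget {
    pub compute_pct: u8,
    pub memory_pct: u8,
}

/// Maps a phase to a clock target. The reduced percentages are placeholders
/// chosen for illustration; only the direction (full, reduced, memory kept
/// high) comes from the module description.
pub fn clock_target_for(phase: Phase) -> ClockTarget {
    match phase {
        // Compute-bound phases run at full power.
        Phase::GpuCompute | Phase::OptimizerStep => ClockTarget {
            compute_pct: 100,
            memory_pct: 100,
        },
        // The GPU is mostly idle while data loads, so both clocks drop.
        Phase::DataLoading => ClockTarget {
            compute_pct: 50,
            memory_pct: 50,
        },
        // Collectives reduce compute clocks but keep memory clocks high.
        Phase::NcclCollective => ClockTarget {
            compute_pct: 60,
            memory_pct: 100,
        },
    }
}
```

Keeping the mapping in one exhaustive `match` means the compiler flags any newly added phase that lacks a target.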

Structs§

EnergyTracker
Track energy consumption over time.
PowerProfile
GPU power profile for each ML workload phase.
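
One common way to track energy over time is to integrate sampled power draw. The sketch below uses trapezoidal integration over (timestamp, watts) samples. It is a hypothetical stand-in, not the `EnergyTracker` API: the `EnergySketch` type and its `record` and `total_joules` methods are names invented for this example, and how the real tracker obtains its power readings is not shown on this page.

```rust
/// Hypothetical energy accumulator; not the module's `EnergyTracker`.
#[derive(Debug, Default)]
pub struct EnergySketch {
    /// Most recent sample as (timestamp in seconds, power in watts).
    last: Option<(f64, f64)>,
    /// Energy accumulated so far, in joules.
    total_joules: f64,
}

impl EnergySketch {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records a power sample and adds the trapezoid between the previous
    /// sample and this one. Samples that do not advance time are ignored.
    pub fn record(&mut self, t_secs: f64, watts: f64) {
        match self.last {
            Some((t0, _)) if t_secs <= t0 => return,
            Some((t0, w0)) => {
                // Average power over the interval times its duration.
                self.total_joules += 0.5 * (w0 + watts) * (t_secs - t0);
            }
            None => {}
        }
        self.last = Some((t_secs, watts));
    }

    /// Total energy accumulated so far, in joules.
    pub fn total_joules(&self) -> f64 {
        self.total_joules
    }
}
```

For instance, a sample of 100 W at t = 0 s followed by 200 W at t = 2 s contributes 0.5 × (100 + 200) × 2 = 300 J.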

Functions§

apply_profile
Apply a power profile to a specific GPU.
get_max_clocks
Query max clocks for a GPU.
profile_for_phase
Default power profiles for each phase.
reset_power
Reset GPU to default power state.
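
Because a profile is applied and later reset, callers may want the reset to happen even on early return. One possible pattern, sketched here, is a guard that resets clocks when dropped. Everything in this example is hypothetical: the `ClockControl` trait, `ProfileGuard`, and `RecordingControl` are invented for illustration, and the signatures of `apply_profile` and `reset_power` are not shown on this page, so the sketch does not call them.

```rust
use std::cell::RefCell;

/// Hypothetical abstraction over the driver calls; not this module's API.
pub trait ClockControl {
    fn lock_clocks(&self, gpu_index: u32, compute_mhz: u32, memory_mhz: u32);
    fn reset_clocks(&self, gpu_index: u32);
}

/// Applies clocks on construction and resets them when dropped.
pub struct ProfileGuard<'a, C: ClockControl> {
    control: &'a C,
    gpu_index: u32,
}

impl<'a, C: ClockControl> ProfileGuard<'a, C> {
    pub fn apply(control: &'a C, gpu_index: u32, compute_mhz: u32, memory_mhz: u32) -> Self {
        control.lock_clocks(gpu_index, compute_mhz, memory_mhz);
        Self { control, gpu_index }
    }
}

impl<'a, C: ClockControl> Drop for ProfileGuard<'a, C> {
    fn drop(&mut self) {
        // Runs on normal scope exit and during unwinding.
        self.control.reset_clocks(self.gpu_index);
    }
}

/// In-memory stand-in that records calls instead of touching a GPU.
#[derive(Default)]
pub struct RecordingControl {
    pub calls: RefCell<Vec<String>>,
}

impl ClockControl for RecordingControl {
    fn lock_clocks(&self, gpu_index: u32, compute_mhz: u32, memory_mhz: u32) {
        self.calls
            .borrow_mut()
            .push(format!("lock gpu{} {}/{}", gpu_index, compute_mhz, memory_mhz));
    }

    fn reset_clocks(&self, gpu_index: u32) {
        self.calls.borrow_mut().push(format!("reset gpu{}", gpu_index));
    }
}
```

In a real integration, the trait methods would presumably wrap whatever driver interface the module uses; the recording implementation only exists so the pattern can be exercised without hardware.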