Expand description
Smart GPU Power Management
Dynamically adjusts GPU power states based on ML workload phase:
- GpuCompute: full power (max clocks)
- DataLoading: reduce GPU clocks (GPU mostly idle, save power)
- NcclCollective: reduce compute clocks, keep memory clocks high
- OptimizerStep: brief burst, keep full power
Can reduce energy consumption by 10-20% with <1% throughput impact.
Structs§
- Energy
Tracker - Track energy consumption over time.
- Power
Profile - GPU power profile for each ML workload phase.
Functions§
- apply_
profile - Apply a power profile to a specific GPU.
- get_
max_ clocks - Query max clocks for a GPU.
- profile_
for_ phase - Default power profiles for each phase.
- reset_
power - Reset GPU to default power state.