Modulesยง
- aggregation ๐
- alerts ๐
- consumers ๐
- fallback ๐
- Fallback telemetry provider using nvidia-smi and /proc.
- gpu_
watchdog ๐ - GPU Memory Pressure Watchdog
- loader ๐
- BPF program loader for Zernel observability probes.
- metrics_
server ๐ - nccl_
priority ๐ - NCCL Network Priority via eBPF/tc
- power ๐
- Smart GPU Power Management
- prefetch ๐
- Predictive Data Prefetching
- simulator ๐
- Generates simulated ML workload telemetry for development and demos. Used when BPF probes are unavailable (non-Linux or no root).
- websocket_
server ๐
Structsยง
- Args ๐
Enumsยง
- Telemetry
Source ๐ - Telemetry source selection result.
Functionsยง
- main ๐