Copyright (C) 2026 Dyber, Inc.
zerneld is the Zernel observability daemon. It loads eBPF probes into the kernel to instrument ML workloads in real time, without any application code changes. Think of it as perf, but purpose-built for ML.
+-------------------+     +-------------------+     +-------------------+
| gpu_mem.bpf.c     |     | cuda_trace.bpf.c  |     | nccl.bpf.c        |
| uprobe: libcuda   |     | uprobe: libcuda   |     | uprobe: libnccl   |
+--------+----------+     +--------+----------+     +--------+----------+
         |                         |                         |
         v                         v                         v
+--------+-------------------------+-------------------------+----------+
|                           BPF Ring Buffers                            |
+--------+-------------------------+-------------------------+----------+
         |                         |                         |
         v                         v                         v
+--------+----------+     +--------+----------+     +--------+----------+
| GpuMemConsumer    |     | CudaTraceConsumer |     | NcclConsumer      |
+--------+----------+     +--------+----------+     +--------+----------+
         |                         |                         |
         +------------+------------+------------+------------+
                      |                         |
                      v                         v
            +---------+---------+   +-----------+-----------+
            | AggregatedMetrics |   | AlertEngine           |
            +---------+---------+   +-----------+-----------+
                      |
         +------------+------------+
         |                         |
         v                         v
+--------+----------+     +--------+----------+
| Prometheus HTTP   |     | WebSocket Server  |
| :9091/metrics     |     | :9092             |
+-------------------+     +-------------------+
         |                         |
         v                         v
Grafana / Alertmanager    zernel watch (CLI)
Probes and what they hook:
zernel-gpumem: libcuda.so
zernel-cuda-trace: cuLaunchKernel / cudaLaunchKernel via uprobes
zernel-nccl: ncclAllReduce, ncclBroadcast, etc.
zernel-dataload: io_uring and read syscalls from DataLoader workers
zernel-dist: futex and pthread_barrier calls

All metrics are exposed at http://localhost:9091/metrics.
# GPU Memory
zernel_gpu_memory_used_bytes{pid, gpu_id}
zernel_gpu_memory_peak_bytes{pid, gpu_id}
# CUDA Kernel Latency
zernel_cuda_launch_latency_seconds{pid, quantile="0.5"}
zernel_cuda_launch_latency_seconds{pid, quantile="0.99"}
zernel_cuda_launch_latency_seconds_count{pid}
# NCCL Collectives
zernel_nccl_collective_duration_seconds{op, quantile="0.5"}
zernel_nccl_collective_duration_seconds{op, quantile="0.99"}
# DataLoader
zernel_dataloader_wait_seconds{pid, quantile="0.5"}
zernel_dataloader_wait_seconds{pid, quantile="0.99"}
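For ad hoc checks outside Grafana, the endpoint's output can be read with a few lines of Python. The following is a minimal sketch that assumes the endpoint serves the standard Prometheus text exposition format. The function name parse_metrics and the sample label values are hypothetical and only illustrate the shape of the data.

```python
import re

# One sample line of the Prometheus text format: name{label="value",...} 123.4
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text, prefix="zernel_"):
    """Return (name, labels, value) tuples for samples whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = _SAMPLE_RE.match(line)
        if match is None or not match.group(1).startswith(prefix):
            continue
        name, raw_labels, raw_value = match.groups()
        # Label values are kept as written; escape sequences are not decoded.
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical scrape output; the label values are made up for illustration.
SAMPLE = """\
# TYPE zernel_gpu_memory_used_bytes gauge
zernel_gpu_memory_used_bytes{pid="1000",gpu_id="0"} 83886080000
zernel_cuda_launch_latency_seconds{pid="1000",quantile="0.99"} 0.000891
process_cpu_seconds_total 12.5
"""

for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Against a running daemon, the text could be fetched with urllib.request.urlopen("http://localhost:9091/metrics").read().decode() and passed to the same function.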
zerneld pushes JSON snapshots to connected WebSocket clients at a configurable interval (default: 1s):
{
  "gpu_utilization": [
    {"key": "1000:0", "current_bytes": 83886080000, "peak_bytes": 83886080000}
  ],
  "cuda_latency_p50_us": 142.0,
  "cuda_latency_p99_us": 891.0,
  "nccl_allreduce_p50_ms": 34.0,
  "nccl_allreduce_p99_ms": 67.0,
  "dataloader_wait_p50_ms": 8.0,
  "last_update_ms": 1711800000000
}
Connect with any WebSocket client:
websocat ws://localhost:9092
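A client that receives these snapshots might decode them along the following lines. This is a sketch only: it assumes each WebSocket message is one JSON object shaped like the example above, and that "key" encodes pid and GPU index as "<pid>:<gpu_id>", which is inferred from the {pid, gpu_id} metric labels rather than documented. The GpuMemory class and parse_gpu_entries function are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class GpuMemory:
    pid: int
    gpu_id: int
    current_bytes: int
    peak_bytes: int


def parse_gpu_entries(snapshot_text):
    """Decode one snapshot and return its gpu_utilization entries as records."""
    snapshot = json.loads(snapshot_text)
    entries = []
    for item in snapshot.get("gpu_utilization", []):
        # Assumption: "key" is "<pid>:<gpu_id>"; the format is not documented.
        pid, gpu_id = item["key"].split(":", 1)
        entries.append(
            GpuMemory(int(pid), int(gpu_id), item["current_bytes"], item["peak_bytes"])
        )
    return entries


# A trimmed copy of the example snapshot above.
EXAMPLE = (
    '{"gpu_utilization": [{"key": "1000:0", "current_bytes": 83886080000, '
    '"peak_bytes": 83886080000}], "cuda_latency_p50_us": 142.0}'
)

for entry in parse_gpu_entries(EXAMPLE):
    print(f"pid {entry.pid} on GPU {entry.gpu_id}: {entry.current_bytes / 2**30:.1f} GiB in use")
```

The function takes the message text rather than a socket, so it works with whichever WebSocket client library delivers the messages.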
# Development mode (simulated telemetry, no BPF)
zerneld --simulate
# Production mode (requires Linux + root)
sudo zerneld
# Debug logging (log level set via the ZERNEL_LOG environment variable)
ZERNEL_LOG=debug zerneld --simulate
Import the Zernel dashboard from docs/grafana-dashboard.json (coming soon) or create a Prometheus data source pointing to http://zernel-host:9091.
Alerts are configured in the zerneld source. The default alert:
GPU OOM Warning: triggers when gpu_memory_used_pct > 95%
Custom alerts and webhook integrations are planned for Phase 5.
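The default alert's condition can be illustrated with a short sketch. It assumes gpu_memory_used_pct is used bytes divided by the device's total memory, times 100. The source does not say how zerneld obtains the total, so total_bytes is supplied by the caller here, and both function names are hypothetical.

```python
def gpu_memory_used_pct(used_bytes, total_bytes):
    """Percentage of device memory in use (assumed definition, see lead-in)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def gpu_oom_warning(used_bytes, total_bytes, threshold_pct=95.0):
    """True when usage is strictly above the threshold, mirroring '> 95%'."""
    return gpu_memory_used_pct(used_bytes, total_bytes) > threshold_pct


# Illustrative values: the current_bytes figure from the snapshot example,
# measured against a hypothetical 80 GiB device.
TOTAL = 80 * 2**30
print(gpu_oom_warning(83886080000, TOTAL))
```

The comparison is strict, so a device sitting at exactly 95% does not trigger the warning.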