Copyright (C) 2026 Dyber, Inc.
The zernel CLI is a terminal-native development environment for ML/LLM workloads. It provides experiment tracking, model management, a live training dashboard, distributed job orchestration, and a SQL-like query language -- all from the terminal.
cargo install --path zernel-cli
# Or from a release binary
# curl -fsSL https://get.zernel.dev | sh
zernel init <name>
Scaffold a new ML project.
zernel init my-project
Creates:
my-project/
zernel.toml # Project configuration
train.py # Starter training script
data/ # Dataset directory
models/ # Model checkpoints
configs/ # Training configs
scripts/ # Utility scripts
zernel run <script> [args...]
Run a training script with automatic experiment tracking.
zernel run train.py
zernel run train.py --epochs 10 --lr 0.001
What happens:
1. Creates a new experiment entry in the local SQLite store
2. Records the git commit hash (if in a git repo)
3. Launches the script via Python
4. Captures stdout/stderr in real time
5. Extracts metrics from output (loss, accuracy, lr, throughput, etc.)
6. Periodically saves metrics to the experiment store
7. On completion, records final status and duration
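Steps 3, 4, and 7 above can be pictured with a short Python sketch. This is not zernel's implementation: `run_and_capture` is a hypothetical helper that launches a command, streams its combined stdout/stderr line by line, and reports the exit status and duration.

```python
import subprocess
import sys
import time


def run_and_capture(cmd):
    """Launch cmd, echo output as it arrives, return (lines, exit_code, seconds)."""
    start = time.monotonic()
    lines = []
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    for line in proc.stdout:
        line = line.rstrip("\n")
        lines.append(line)  # a tracker would parse metrics from each line here
        print(line)
    exit_code = proc.wait()
    return lines, exit_code, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in for `python train.py`: a one-line inline "training step".
    demo = [sys.executable, "-c", "print('step: 1 loss: 1.234')"]
    lines, code, seconds = run_and_capture(demo)
    print(f"exit={code} duration={seconds:.2f}s lines={len(lines)}")
```

If the child process buffers its own output, lines may arrive in bursts; running the script with `python -u` is one common way to avoid that.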
Recognized metric patterns:
- loss: 1.234 or loss=1.234
- accuracy: 0.95 or acc=0.95
- grad_norm: 0.89
- learning_rate: 3e-4 or lr=0.001
- throughput: 412 or samples/s: 412
- epoch: 5 or step: 4821
- perplexity: 12.3 or ppl: 12.3
- eval_loss: 1.1
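The patterns above all have the shape `name: value` or `name=value`. As a rough illustration of how such lines could be parsed, here is a minimal regex sketch. The `ALIASES` table and the `extract_metrics` function are assumptions made for this example, not zernel's actual parser, and the regex accepts any `name: number` pair rather than a fixed list of metric names.

```python
import re

# Hypothetical alias table: short names from the list above mapped to one key.
ALIASES = {
    "acc": "accuracy",
    "lr": "learning_rate",
    "ppl": "perplexity",
    "samples/s": "throughput",
}

# A name, then ':' or '=', then a number such as 1.234, 412, or 3e-4.
METRIC_RE = re.compile(
    r"(?<![\w/])([A-Za-z_][\w/]*)\s*[:=]\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def extract_metrics(line):
    """Return {metric_name: float_value} for every name/value pair in line."""
    metrics = {}
    for name, value in METRIC_RE.findall(line):
        key = ALIASES.get(name.lower(), name.lower())
        metrics[key] = float(value)
    return metrics


print(extract_metrics("step: 4821 loss=1.234 lr=3e-4 samples/s: 412"))
```

A training script only needs to print lines in one of these shapes, for example `print(f"step: {step} loss: {loss:.4f}")`, for a parser of this kind to pick them up.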
zernel watch
Full-screen terminal dashboard showing real-time training metrics.
zernel watch
Dashboard panels:
- GPU Utilization: per-device utilization bars with memory usage
- Training Metrics: loss, step progress, ETA
- eBPF Telemetry: CUDA launch latency, NCCL duration, DataLoader wait, PCIe bandwidth
- Scheduler Phase: current ML workload phase indicator
Keyboard shortcuts:
- q / Esc -- quit
- r -- reset demo state
Note: Currently runs in demo mode with simulated data. Connect to a running zerneld instance for live telemetry (coming in Phase 4).
zernel exp list
List all tracked experiments.
zernel exp list
zernel exp list --limit 50
Output:
ID Name Status Loss Acc Duration
-----------------------------------------------------------------------------------------------
exp-20260330-142315-a1b2c3d4 llama3-finetune Done 1.1870 0.8340 2h 14m
exp-20260330-120100-e5f6g7h8 llama3-baseline Done 1.2341 0.8210 1h 42m
zernel exp show <id>
Show full details of an experiment.
zernel exp show exp-20260330-142315-a1b2c3d4
zernel exp compare <a> <b>
Diff hyperparameters and metrics between two experiments.
zernel exp compare exp-001 exp-002
Output:
Comparing: exp-001 vs exp-002
llama3-baseline vs llama3-lr-sweep
Hyperparameters:
learning_rate: 0.0001 -> 0.0003
warmup_steps: 100 -> 200
Metrics:
loss: 1.2341 -> 1.1870 (-3.8%)
accuracy: 0.8210 -> 0.8340 (+1.6%)
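The percentages in the sample output are consistent with a relative change measured against the first experiment. A minimal sketch of that arithmetic follows; `relative_change` is a name chosen for this example, and the exact rounding zernel applies is an assumption.

```python
def relative_change(before, after):
    """Percent change from before to after; 1.0 -> 1.1 gives +10.0."""
    return (after - before) / before * 100.0


# Values taken from the sample comparison above.
print(f"loss:     {relative_change(1.2341, 1.1870):+.1f}%")
print(f"accuracy: {relative_change(0.8210, 0.8340):+.1f}%")
```

Note that the sign only describes direction: a negative change is an improvement for loss but a regression for accuracy.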
zernel exp delete <id>
Delete an experiment from the store.
zernel exp delete exp-001
zernel model save <path>
Save a model checkpoint to the local registry.
zernel model save ./checkpoints/epoch-10 --name llama3-v1 --tag production
Saves: checkpoint files + git commit + metadata.
zernel model list
List all registered models.
zernel model list
zernel model deploy <name:tag>
Deploy a model for inference.
zernel model deploy llama3-v1:production --port 8080
zernel job submit <script>
Submit a distributed training job.
zernel job submit train.py --gpus-per-node 8 --nodes 4 --framework pytorch --backend nccl
zernel doctor
Diagnose the Zernel environment.
zernel doctor
Checks: OS, Python, NVIDIA driver, CUDA, PyTorch, PyTorch CUDA, Git, zerneld status, experiment DB.
zernel query "<ZQL>"
Query experiments with ZQL (Zernel Query Language).
zernel query "SELECT name, loss, learning_rate FROM experiments WHERE loss < 1.5 ORDER BY loss ASC LIMIT 10"
ZQL is a SQL-like query language for the experiment store.
SELECT <columns>
FROM <table>
[WHERE <condition> [AND <condition>...]]
[ORDER BY <column> [ASC|DESC]]
[LIMIT <n>]
Available tables: experiments
Available columns: id, name, status, loss, accuracy, learning_rate, and any metric extracted during training.
Operators: =, !=, <, >, <=, >=
Examples:
-- All experiments sorted by loss
SELECT * FROM experiments ORDER BY loss ASC
-- Experiments better than baseline
SELECT name, loss FROM experiments WHERE loss < 1.5
-- Failed experiments
SELECT name, status FROM experiments WHERE status = 'Failed'
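Because ZQL follows SQL's SELECT shape, its semantics can be tried out against a throwaway SQLite table. The sketch below assumes a flat `experiments` table with a few of the columns listed above; the real store's schema may differ, and the third row is made up purely to show filtering.

```python
import sqlite3

# In-memory stand-in for the experiment store (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments (name TEXT, status TEXT, loss REAL, learning_rate REAL)"
)
conn.executemany(
    "INSERT INTO experiments VALUES (?, ?, ?, ?)",
    [
        ("llama3-baseline", "Done", 1.2341, 0.0001),
        ("llama3-lr-sweep", "Done", 1.1870, 0.0003),
        ("llama3-diverged", "Failed", 7.9, 0.01),  # hypothetical row
    ],
)

# Same text as the "better than baseline" ZQL example, with ordering and a limit.
rows = conn.execute(
    "SELECT name, loss FROM experiments WHERE loss < 1.5 ORDER BY loss ASC LIMIT 10"
).fetchall()
for name, loss in rows:
    print(name, loss)
```

Only the two runs with loss below 1.5 are returned, lowest loss first.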
All data is stored locally in ~/.zernel/:
~/.zernel/
experiments/
experiments.db # SQLite database
models/
registry.json # Model registry index
<model-name>/
<tag>/ # Checkpoint files
| Variable | Default | Description |
|---|---|---|
| ZERNEL_LOG | zernel=warn | Log level filter |
Project-level configuration lives in zernel.toml:
[project]
name = "my-project"
version = "0.1.0"
[training]
framework = "pytorch"
gpus = "auto"
[tracking]
enabled = true
auto_log = true
zernel gpu
zernel gpu status # Clean overview (nvidia-smi replacement)
zernel gpu top # Real-time GPU process viewer
zernel gpu mem # Memory usage by process
zernel gpu kill 0 # Kill all processes on GPU 0
zernel gpu lock 0,1 --for-job job-123 # Reserve GPUs
zernel gpu unlock 0,1 # Release reservation
zernel gpu temp --alert 85 # Temperature monitoring with alerts
zernel gpu power --limit 300W # Set power limit
zernel gpu health # ECC errors, throttling, PCIe check
zernel bench
zernel bench all # Full 5-test benchmark suite
zernel bench quick # 5-minute smoke test
zernel bench gpu # GPU compute TFLOPS at multiple matrix sizes
zernel bench nccl # Multi-GPU NCCL bandwidth
zernel bench dataloader --workers 8 # DataLoader throughput
zernel bench memory # GPU memory allocation latency
zernel bench e2e --model resnet50 --iterations 100 # Training throughput
zernel bench report # Generate report
zernel debug
zernel debug why-slow # 4-step diagnosis: GPU util, CPU, memory, recommendations
zernel debug oom # GPU OOM analysis with 6 fix suggestions
zernel debug nan train.py # Run with torch anomaly detection (traces NaN source)
zernel debug hang # NCCL deadlock diagnosis (env var configuration)
zernel debug checkpoint ./ckpt # Verify structure, shapes, dtypes, size
zernel debug trace train.py # Run with CUDA_LAUNCH_BLOCKING=1 + stack traces
zernel data
zernel data profile ./dataset.parquet # Stats: rows, columns, schema, size
zernel data profile ./images/ # Dir: file count, extensions, total size
zernel data split ./data --train 0.8 --val 0.1 --seed 42 # Reproducible split
zernel data shard ./data --shards 64 # Shard for distributed training
zernel data cache ./data --to /nvme/cache # rsync to fast storage
zernel data benchmark --workers 8 # DataLoader throughput by worker count
zernel data serve ./data --port 8888 # HTTP file server for multi-node
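The `--seed` flag on `zernel data split` suggests a seeded shuffle followed by proportional cuts. A minimal sketch of that technique is below. `split_items` is a hypothetical helper, and sending the remainder to a test split is an assumption, since the command above only names `--train` and `--val`.

```python
import random


def split_items(items, train=0.8, val=0.1, seed=42):
    """Shuffle deterministically, then cut into (train, val, test) lists."""
    shuffled = sorted(items)  # fixed starting order, independent of input order
    random.Random(seed).shuffle(shuffled)  # private RNG; global state untouched
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever is left over
    )


files = [f"img_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_items(files)
print(len(train_set), len(val_set), len(test_set))
```

Sorting before shuffling is what makes the result depend only on the set of items and the seed, not on directory listing order.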
zernel serve
zernel serve start ./model # Auto-detect engine (vLLM/TRT/ONNX)
zernel serve start ./model --engine vllm --replicas 4 # Tensor-parallel
zernel serve start ./model --quantize int8 # Quantized inference
zernel serve list # Show running inference servers
zernel serve stop my-model # Stop server
zernel serve benchmark http://localhost:8080 --qps 100 # Load test
zernel hub
zernel hub push ./model --name org/llama-finetune --tag v1 # Push to local hub
zernel hub pull org/llama-finetune:v1 # Pull from hub
zernel hub list # List all hub entries
zernel hub search "llama" # Search by name
zernel cluster
zernel cluster add gpu-server-01 --gpus 8 --user root # Register node (SSH test)
zernel cluster status # Live overview: all nodes, GPU util, memory
zernel cluster ssh gpu-server-01 # SSH to a node
zernel cluster sync ./code --to ~/ # rsync to all nodes
zernel cluster run "nvidia-smi" --on all # Run command on all nodes
zernel cluster drain gpu-server-01 # Mark for maintenance
zernel cost
zernel cost summary # Total jobs, GPU-hours, success rate
zernel cost job job-123 # Cost for a specific job
zernel cost budget --set 10000 # Set GPU-hour budget with alerts
zernel cost report --month march # Generate cost report
zernel env
zernel env show # Display current environment
zernel env snapshot --output env.yml # Save to file
zernel env diff env-a.yml env-b.yml # Compare two environments
zernel env reproduce env.yml # Recreate from snapshot
zernel env export --format docker # Generate Dockerfile
zernel env export --format pip # Generate requirements.txt
zernel notebook
zernel notebook start --port 8888 # Launch Jupyter Lab
zernel notebook open train.ipynb # Open specific notebook
zernel notebook convert train.ipynb --to py # Convert to Python script
zernel notebook list # List running servers
zernel notebook stop # Stop all servers
zernel install
zernel install pytorch # Install PyTorch + CUDA
zernel install ollama # Install Ollama local LLM
zernel install jupyter # Install Jupyter Lab
zernel install vllm # Install vLLM inference
zernel install deepspeed # Install DeepSpeed
zernel install langchain # Install LangChain
zernel install all # Install everything
zernel install --list # Show all 25+ available tools
zernel pqc
Quantum-resistant cryptographic tools for protecting ML assets.
zernel pqc status # Show PQC configuration and key status
zernel pqc keygen --name mykey # Generate ML-KEM + ML-DSA compatible keypair
zernel pqc sign ./model --key mykey # Sign a model/checkpoint (SHA-256 + HMAC)
zernel pqc verify ./model # Verify signature integrity
zernel pqc encrypt ./model --key mykey # Encrypt with AES-256-GCM (PQC key exchange)
zernel pqc decrypt ./model.zernel-enc --key mykey # Decrypt
zernel pqc boot-verify # Verify UEFI Secure Boot chain
zernel pqc keys # List all PQC keys
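The `sign` comment above mentions SHA-256 and HMAC. For readers unfamiliar with that combination, here is a minimal HMAC-SHA256 sketch using only the Python standard library. It shows the general technique, not zernel's signature format; how zernel derives a MAC key from a named keypair is not covered here, and `sign_bytes` and `verify_bytes` are hypothetical names.

```python
import hashlib
import hmac


def sign_bytes(data, key):
    """Return a hex HMAC-SHA256 tag over data using a shared secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_bytes(data, key, tag):
    """Check the tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign_bytes(data, key), tag)


key = b"demo-key-not-for-production"  # placeholder secret for the example
checkpoint = b"pretend these are model weights"

tag = sign_bytes(checkpoint, key)
print(verify_bytes(checkpoint, key, tag))         # unmodified data
print(verify_bytes(checkpoint + b"!", key, tag))  # tampered data
```

HMAC detects any modification of the signed bytes, but it is a symmetric scheme: anyone who can verify a tag can also create one.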
zernel power
Phase-aware GPU power management that reduces energy use by 10-20% with <1% throughput impact.
zernel power status # Show GPU power state (clocks, draw, limit, efficiency)
zernel power enable # Enable phase-aware power management
zernel power disable # Reset GPUs to default power state
zernel power energy # Show energy consumption for training
zernel power carbon # Carbon footprint estimate (kWh → CO2)
zernel power carbon --intensity 0.25 # Custom grid intensity (kg CO2/kWh)
zernel power profile train.py # Profile GPU power during a script
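The carbon estimate converts energy to emissions by multiplying kWh by a grid intensity in kg CO2 per kWh, as the `--intensity` flag indicates. A worked sketch of that arithmetic follows; the 2400 W average draw and 10-hour duration are made-up inputs, and the function names are chosen for this example.

```python
def energy_kwh(average_watts, hours):
    """Energy in kWh from an average power draw in watts over some hours."""
    return average_watts * hours / 1000.0


def carbon_kg(kwh, intensity_kg_per_kwh):
    """Estimated emissions in kg CO2 for a given grid intensity."""
    return kwh * intensity_kg_per_kwh


# Hypothetical run: 2400 W average draw across all GPUs for 10 hours.
kwh = energy_kwh(2400, 10)
print(kwh)                   # 24.0 kWh
print(carbon_kg(kwh, 0.25))  # 6.0 kg CO2 at 0.25 kg CO2/kWh
```

Grid intensity varies widely by region and time of day, so the same run can yield very different estimates.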
zernel optimize
zernel optimize precision train.py # Mixed precision advisor (BF16/FP16/TF32)
zernel optimize memory # CUDA memory allocator configuration
zernel optimize checkpoint ./ckpt # Checkpoint optimization recommendations
zernel optimize scan train.py # Full optimization audit
zernel optimize numa # NUMA topology + data placement advice
zernel fleet
Enterprise-scale GPU fleet management for 100-10,000+ GPUs.
zernel fleet status # Fleet-wide GPU utilization, power draw, daily cost
zernel fleet costs # Cost attribution by period
# A100 on-demand: $2.50/GPU-hr
# A100 reserved: $1.50/GPU-hr
# H100 on-demand: $4.00/GPU-hr
# On-prem (electricity): $0.10/kWh
zernel fleet idle # Detect GPUs below utilization threshold
zernel fleet idle --threshold 5 --duration 30 # Custom thresholds
zernel fleet reclaim # Power down idle GPUs
zernel fleet reclaim --dry-run # Preview what would be reclaimed
zernel fleet rightsize # GPU type recommendations from usage patterns
zernel fleet plan --growth 15 # 12-month capacity forecast
zernel fleet health # Check all fleet subsystems
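Using the per-GPU-hour rates listed under `zernel fleet costs`, the cost of a job is GPU count times hours times rate. The sketch below works that through. The dictionary keys and `job_cost` are names invented here, the 2.25-hour duration is a made-up input, and whether zernel treats these rates as defaults or as examples is not stated in this document.

```python
# Rates in dollars per GPU-hour, copied from the list above.
RATES_PER_GPU_HOUR = {
    "a100-on-demand": 2.50,
    "a100-reserved": 1.50,
    "h100-on-demand": 4.00,
}


def job_cost(gpus, hours, rate_key):
    """Dollar cost of running `gpus` GPUs for `hours` at the named rate."""
    return gpus * hours * RATES_PER_GPU_HOUR[rate_key]


# Hypothetical job: 4 nodes x 8 GPUs for 2.25 hours = 72 GPU-hours.
print(job_cost(32, 2.25, "a100-on-demand"))  # 180.0
print(job_cost(32, 2.25, "a100-reserved"))   # 108.0
```

The on-prem electricity rate is priced per kWh rather than per GPU-hour, so it needs a power measurement before it can be compared with the cloud rates.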
zernel audit
Immutable training logs, data lineage, model provenance, and compliance exports.
zernel audit trail <exp-id> # Full audit record for an experiment
zernel audit export --format json # Export all experiment metadata
zernel audit export --format csv # CSV format for spreadsheets
zernel audit lineage model:tag # Data lineage chain
zernel audit provenance <id> # Model provenance (5-step chain)
zernel audit report --standard soc2 # SOC 2 Type II compliance report
zernel audit report --standard hipaa # HIPAA compliance controls
zernel onboard
Gets a new team member from "laptop" to "training a model" in minutes.
zernel onboard setup my-project # 5-step automated onboarding:
# 1. Environment check (python, git, GPU)
# 2. ML stack verification
# 3. Project creation
# 4. Environment snapshot
# 5. Next-steps guide
zernel onboard share # Generate shareable environment snapshot
zernel onboard sync env.yml # Reproduce a teammate's environment