Kubernetes-based distributed training backend.
Generates PyTorchJob YAML manifests and manages them via kubectl.
Functions
- cancel_k8s_job - Cancel a Kubernetes job.
- generate_pytorchjob_yaml - Generate a Kubeflow PyTorchJob YAML manifest.
- run_k8s_job - Submit a distributed training job to Kubernetes via PyTorchJob CRD.
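The signatures of these functions are not shown on this page, so the following is only a rough sketch of the general flow (build a manifest, pipe it to kubectl, delete the resource to cancel). All function names, parameters, and the manifest fields chosen here are hypothetical illustrations and are not this module's API. The sketch assumes the Kubeflow training operator's `kubeflow.org/v1` PyTorchJob resource and a `kubectl` binary on the PATH.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Hypothetical helper: renders one replica block (Master or Worker).
// The container is named "pytorch", which the PyTorchJob operator expects.
fn sketch_replica_spec(role: &str, replicas: u32, image: &str) -> String {
    format!(
        "    {}:\n      replicas: {}\n      restartPolicy: OnFailure\n      template:\n        spec:\n          containers:\n            - name: pytorch\n              image: {}\n",
        role, replicas, image
    )
}

// Hypothetical stand-in for manifest generation: one master plus
// `workers` worker replicas, all running the same image.
fn sketch_pytorchjob_yaml(name: &str, namespace: &str, image: &str, workers: u32) -> String {
    let mut yaml = format!(
        "apiVersion: kubeflow.org/v1\nkind: PyTorchJob\nmetadata:\n  name: {}\n  namespace: {}\nspec:\n  pytorchReplicaSpecs:\n",
        name, namespace
    );
    yaml.push_str(&sketch_replica_spec("Master", 1, image));
    if workers > 0 {
        yaml.push_str(&sketch_replica_spec("Worker", workers, image));
    }
    yaml
}

// Hypothetical stand-in for submission: pipes the manifest to
// `kubectl apply -f -` and reports whether kubectl exited successfully.
fn sketch_submit(yaml: &str) -> std::io::Result<bool> {
    let mut child = Command::new("kubectl")
        .args(["apply", "-f", "-"])
        .stdin(Stdio::piped())
        .spawn()?;
    // The stdin handle is dropped at the end of this statement,
    // which closes the pipe so kubectl sees end-of-input.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(yaml.as_bytes())?;
    Ok(child.wait()?.success())
}

// Hypothetical stand-in for cancellation: the arguments for
// `kubectl delete pytorchjob <name> -n <namespace>`.
fn sketch_cancel_args(name: &str, namespace: &str) -> Vec<String> {
    vec!["delete", "pytorchjob", name, "-n", namespace]
        .into_iter()
        .map(String::from)
        .collect()
}
```

Keeping manifest generation as a pure function that returns a string makes it possible to inspect or test the YAML without a cluster; only `sketch_submit` touches kubectl.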