Module job_k8s

Kubernetes-based distributed training backend.

Generates PyTorchJob YAML manifests and manages them via kubectl.

Functions

cancel_k8s_job
Cancel a Kubernetes job.
generate_pytorchjob_yaml 🔒
Generate a Kubeflow PyTorchJob YAML manifest.
run_k8s_job
Submit a distributed training job to Kubernetes via PyTorchJob CRD.
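
The flow described above (render a PyTorchJob manifest, then hand it to kubectl) could look roughly like the following sketch. It is not this module's actual code: `JobSpec`, `render_pytorchjob_yaml`, `apply_args`, `delete_args`, and `submit` are hypothetical names chosen for illustration, and the manifest fields shown are a minimal assumed subset of the Kubeflow PyTorchJob schema.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Hypothetical job description; field names are illustrative only.
struct JobSpec {
    name: String,
    namespace: String,
    image: String,
    workers: u32,
    gpus_per_replica: u32,
}

/// Render one replica section ("Master" or "Worker") of the manifest.
/// Values are interpolated as-is; this sketch does no YAML escaping.
fn replica_block(kind: &str, replicas: u32, spec: &JobSpec) -> String {
    [
        format!("    {kind}:"),
        format!("      replicas: {replicas}"),
        "      restartPolicy: OnFailure".to_string(),
        "      template:".to_string(),
        "        spec:".to_string(),
        "          containers:".to_string(),
        "            - name: pytorch".to_string(),
        format!("              image: {}", spec.image),
        "              resources:".to_string(),
        "                limits:".to_string(),
        format!("                  nvidia.com/gpu: {}", spec.gpus_per_replica),
    ]
    .join("\n")
}

/// Render a minimal PyTorchJob manifest: one Master plus `spec.workers` Workers.
fn render_pytorchjob_yaml(spec: &JobSpec) -> String {
    let header = [
        "apiVersion: kubeflow.org/v1".to_string(),
        "kind: PyTorchJob".to_string(),
        "metadata:".to_string(),
        format!("  name: {}", spec.name),
        format!("  namespace: {}", spec.namespace),
        "spec:".to_string(),
        "  pytorchReplicaSpecs:".to_string(),
    ]
    .join("\n");
    format!(
        "{header}\n{}\n{}\n",
        replica_block("Master", 1, spec),
        replica_block("Worker", spec.workers, spec)
    )
}

/// Arguments for `kubectl apply`, reading the manifest from stdin.
fn apply_args(namespace: &str) -> Vec<String> {
    ["apply", "-n", namespace, "-f", "-"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Arguments for deleting (cancelling) a PyTorchJob by name.
fn delete_args(name: &str, namespace: &str) -> Vec<String> {
    ["delete", "pytorchjob", name, "-n", namespace]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Pipe the rendered manifest into `kubectl apply`; returns whether kubectl
/// exited successfully. Assumes `kubectl` is on PATH and configured.
fn submit(spec: &JobSpec) -> std::io::Result<bool> {
    let mut child = Command::new("kubectl")
        .args(apply_args(&spec.namespace))
        .stdin(Stdio::piped())
        .spawn()?;
    // Write the manifest, then drop the temporary handle so kubectl sees EOF.
    child
        .stdin
        .take()
        .expect("stdin was requested as piped")
        .write_all(render_pytorchjob_yaml(spec).as_bytes())?;
    Ok(child.wait()?.success())
}
```

In this sketch, cancelling a job would amount to running `kubectl` with `delete_args(name, namespace)`. Keeping the argument lists in separate functions makes them checkable without a cluster; only `submit` actually spawns a process.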