TFJob Common
Reference documentation for TFJob
Out of date
This guide contains outdated information pertaining to Kubeflow 1.0. This guide needs to be updated for Kubeflow 1.1.
Packages:
kubeflow.org
Package v1 is the v1 version of the API.
Resource Types:
CleanPodPolicy (string alias)
CleanPodPolicy describes how to deal with pods when the job is finished. Can be one of: All, Running, or None.
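As a rough illustration, the alias and its three values could be modelled as below. This is a minimal sketch, not the operator's code: the constant names and the `shouldCleanPod` helper are hypothetical, and the reading that `Running` cleans up only pods that are still running is an assumption drawn from the value names.

```go
package main

import "fmt"

// CleanPodPolicy mirrors the string alias described above.
type CleanPodPolicy string

// Constant names are illustrative; only the string values come from the text.
const (
	CleanPodPolicyAll     CleanPodPolicy = "All"
	CleanPodPolicyRunning CleanPodPolicy = "Running"
	CleanPodPolicyNone    CleanPodPolicy = "None"
)

// shouldCleanPod is a hypothetical helper. It assumes "All" cleans up every
// pod, "Running" cleans up only pods that are still running, and "None"
// (or any unrecognised value) leaves all pods in place.
func shouldCleanPod(policy CleanPodPolicy, podRunning bool) bool {
	switch policy {
	case CleanPodPolicyAll:
		return true
	case CleanPodPolicyRunning:
		return podRunning
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldCleanPod(CleanPodPolicyAll, false))     // true
	fmt.Println(shouldCleanPod(CleanPodPolicyRunning, true))  // true
	fmt.Println(shouldCleanPod(CleanPodPolicyRunning, false)) // false
	fmt.Println(shouldCleanPod(CleanPodPolicyNone, true))     // false
}
```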
JobCondition
(Appears on: JobStatus)
JobCondition describes the state of the job at a certain point.
Field | Description |
---|---|
type JobConditionType | Type of job condition. |
status | Status of the condition, one of True, False, or Unknown. |
reason string | The reason for the condition’s last transition. |
message string | A readable message indicating details about the transition. |
lastUpdateTime Kubernetes meta/v1.Time | The last time this condition was updated. |
lastTransitionTime Kubernetes meta/v1.Time | Last time the condition transitioned from one status to another. |
JobConditionType (string alias)
(Appears on: JobCondition)
JobConditionType defines the possible condition types reported in JobStatus. Can be one of: Created, Running, Restarting, Succeeded, or Failed.
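To show how JobCondition and JobConditionType fit together, here is a minimal sketch that checks whether a condition of a given type is currently true. The types are simplified stand-ins: `time.Time` and a plain string replace the Kubernetes types, and the constant names and the `hasCondition` helper are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// JobConditionType mirrors the string alias described above.
type JobConditionType string

// Constant names are illustrative; only the string values come from the text.
const (
	JobCreated    JobConditionType = "Created"
	JobRunning    JobConditionType = "Running"
	JobRestarting JobConditionType = "Restarting"
	JobSucceeded  JobConditionType = "Succeeded"
	JobFailed     JobConditionType = "Failed"
)

// JobCondition is a simplified stand-in for the documented type.
// Assumption: plain string and time.Time replace the Kubernetes types.
type JobCondition struct {
	Type               JobConditionType
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastUpdateTime     time.Time
	LastTransitionTime time.Time
}

// hasCondition is a hypothetical helper: it reports whether a condition of
// the given type is present with status "True".
func hasCondition(conds []JobCondition, t JobConditionType) bool {
	for _, c := range conds {
		if c.Type == t && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	now := time.Now()
	// Reasons below are made up for illustration.
	conds := []JobCondition{
		{Type: JobCreated, Status: "True", Reason: "ExampleCreated", LastUpdateTime: now, LastTransitionTime: now},
		{Type: JobRunning, Status: "False", Reason: "ExampleNotRunning", LastUpdateTime: now, LastTransitionTime: now},
	}
	fmt.Println(hasCondition(conds, JobCreated))   // true
	fmt.Println(hasCondition(conds, JobRunning))   // false
	fmt.Println(hasCondition(conds, JobSucceeded)) // false
}
```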
JobStatus
JobStatus represents the current observed state of the training job.
Field | Description |
---|---|
conditions []JobCondition | An array of current observed job conditions. |
replicaStatuses map[ReplicaType]*ReplicaStatus | A map from ReplicaType (key) to ReplicaStatus (value), specifying the status of each replica. |
startTime Kubernetes meta/v1.Time | Represents the time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. |
completionTime Kubernetes meta/v1.Time | Represents the time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. |
lastReconcileTime Kubernetes meta/v1.Time | Represents the last time when the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. |
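Because the time fields are represented in RFC3339 form and in UTC, a client can parse them with Go's standard `time` package. The following is a minimal sketch that derives a job's duration from `startTime` and `completionTime`. The `jobDuration` helper and the timestamps are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// jobDuration is a hypothetical helper that parses two RFC3339 timestamps,
// as they would appear in startTime and completionTime, and returns the
// elapsed time between them.
func jobDuration(startTime, completionTime string) (time.Duration, error) {
	start, err := time.Parse(time.RFC3339, startTime)
	if err != nil {
		return 0, fmt.Errorf("parsing startTime: %w", err)
	}
	end, err := time.Parse(time.RFC3339, completionTime)
	if err != nil {
		return 0, fmt.Errorf("parsing completionTime: %w", err)
	}
	return end.Sub(start), nil
}

func main() {
	// Made-up timestamps for illustration.
	d, err := jobDuration("2021-05-28T10:00:00Z", "2021-05-28T10:01:30Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d) // 1m30s
}
```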
ReplicaSpec
ReplicaSpec is a description of the job replica.
Field | Description |
---|---|
replicas int32 | The desired number of replicas of the given template. If unspecified, defaults to 1. |
template Kubernetes core/v1.PodTemplateSpec | Describes the pod that will be created for this replica. Note that RestartPolicy in PodTemplateSpec will be overridden by RestartPolicy in ReplicaSpec. |
restartPolicy RestartPolicy | Restart policy for all replicas within the job. One of Always, OnFailure, Never, or ExitCode. Defaults to Never. |
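The defaults stated in the table above (1 replica, restart policy Never) could be applied roughly as follows. This is a minimal sketch under stated assumptions: the `applyDefaults` helper is hypothetical, the `template` field is omitted, and `Replicas` is modelled as a pointer so that "unspecified" can be told apart from an explicit value.

```go
package main

import "fmt"

// RestartPolicy mirrors the string alias used by ReplicaSpec.
type RestartPolicy string

// Constant names are illustrative; only the string values come from the text.
const (
	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
	RestartPolicyExitCode  RestartPolicy = "ExitCode"
)

// ReplicaSpec is a simplified stand-in; the pod template field is omitted.
type ReplicaSpec struct {
	Replicas      *int32 // assumption: nil means "unspecified"
	RestartPolicy RestartPolicy
}

// applyDefaults is a hypothetical helper that fills in the defaults stated
// in the ReplicaSpec table: 1 replica and restart policy Never.
func applyDefaults(spec *ReplicaSpec) {
	if spec.Replicas == nil {
		one := int32(1)
		spec.Replicas = &one
	}
	if spec.RestartPolicy == "" {
		spec.RestartPolicy = RestartPolicyNever
	}
}

func main() {
	spec := ReplicaSpec{}
	applyDefaults(&spec)
	fmt.Println(*spec.Replicas, spec.RestartPolicy) // 1 Never
}
```

Explicitly set values are left untouched, so a spec with `RestartPolicy: "OnFailure"` keeps that policy after defaulting.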
ReplicaStatus
(Appears on: JobStatus)
ReplicaStatus represents the current observed state of the replica.
Field | Description |
---|---|
active int32 | The number of actively running pods. |
succeeded int32 | The number of pods which reached phase Succeeded. |
failed int32 | The number of pods which reached phase Failed. |
ReplicaType (string alias)
ReplicaType represents the type of the job replica. Each operator (e.g. TensorFlow, PyTorch) needs to define its own set of ReplicaTypes.
RestartPolicy (string alias)
(Appears on: ReplicaSpec)
RestartPolicy describes how the replicas should be restarted. Can be one of: Always, OnFailure, Never, or ExitCode. If no policy is specified, the default is RestartPolicyAlways.
Generated with gen-crd-api-reference-docs on git commit fd76deec.
Last modified 28.05.2021: TFJob v1 Common fix links (#2743) (900f3e9f)