Skip to content
This repository was archived by the owner on Sep 12, 2023. It is now read-only.
This repository was archived by the owner on Sep 12, 2023. It is now read-only.

PR 135 makes replicaType updates hard #154

@Jeffwan

Description

@Jeffwan

In this PR, https://github.com/kubeflow/common/pull/135/files, it changes rtype to ReplicaType.

However, it brings some challenges in operator upgrade.

-		expectationPodsKey := expectation.GenExpectationPodsKey(jobKey, rtype)
+		expectationPodsKey := expectation.GenExpectationPodsKey(jobKey, apiv1.ReplicaType(rtype))

Using this line as an example, controller retrieve rtype from pod label. It's lower case at this moment.

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/apis/tensorflow/v1/types.go#L77

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/apis/pytorch/v1/pytorchjob_types.go#L62

apiv1.ReplicaType(rtype) is inaccurate and meaningless because we define Upper case role like Master in the past,
and this conversion give us a non-exist ReplicaType because it's lower case master.

we have some references like below.

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/controller.v1/pytorch/pytorch.go#L46

I feel like all the role should be lower case, we make sure we don't use ReplicaType for comparison. Instead, still use strings.toLower(string(rtype)) which means PR 135 still not helpful?

/cc @MartinForReal @gaocegege @kubeflow/wg-training-leads

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions