Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uninitialized taint should always be dropped #8258

Open
2 of 3 tasks
ykakarap opened this issue Mar 9, 2023 · 13 comments
Open
2 of 3 tasks

uninitialized taint should always be dropped #8258

ykakarap opened this issue Mar 9, 2023 · 13 comments
Labels
area/machinepool Issues or PRs related to machinepools help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ykakarap
Copy link
Contributor

ykakarap commented Mar 9, 2023

Detailed Description

#7993 introduced the node.cluster.x-k8s.io/uninitialized:NoSchedule taint. This taint is applied by default to the nodes when CAPBK is used as the bootstrap provider. CAPI drops this taint from the nodes after the nodes are initialized (labels are synced).

This issue is to audit and ensure that the node is dropped by CAPI when using any of the Machine/MachinePool solutions.

  • Machines
  • MachinePoll
  • MachinePool Machines

[A clear and concise description of what you want to happen.]

Anything else you would like to add:

More context on the taint: The taint was introduced to solve the delay problem when syncing label to nodes to avoid unnecessarily scheduling workloads on wrong nodes.
Part of proposal: Label Sync Between Machines and underlying Kubernetes Nodes

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 9, 2023
@sbueringer
Copy link
Member

sbueringer commented Mar 10, 2023

@ykakarap Sorry I think my comment on the PR was a bit misleading. I think we're probably fine with the taint (as you already implemented that in #7993), although it doesn't hurt to double check.

I was referring to in-place propagation in general. I wonder if we want to have in-place propagation for MachinePool as well, especially once the MachinePool Machines are introduced, but maybe even for the objects "below" MachinePool we have today.

Or maybe the concept doesn't make sense for MachinePools not sure.

@sbueringer
Copy link
Member

^^ @CecileRobertMichon

@fabriziopandini
Copy link
Member

/triage accepted
@sbueringer please check we should rename this issue to make it more clear what we want to achieve

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 20, 2023
@fabriziopandini
Copy link
Member

/help

@k8s-ci-robot
Copy link
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 20, 2023
@sbueringer
Copy link
Member

I would like to get feedback to my comment from @ykakarap before retitling/rewriting the issue

@ykakarap
Copy link
Contributor Author

I am +1 to exploring in-place propagation support in MachinePools and MachinePool Machines (when it is introduced).

I am not aware of the full details of how MachinePool Machines is going to work but I would like to capture that once MachinePool Machines are introduces we ensure that they too drop the uninitialized taint. As long as we capture that after rewriting I am +1 to renaming/rewriting.

@CecileRobertMichon
Copy link
Contributor

@ykakarap PR for MachinePool Machines is ready for review: #7938 and proposal with more detail can be found here: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220209-machinepool-machines.md

@sbueringer
Copy link
Member

sbueringer commented Mar 21, 2023

I am +1 to exploring in-place propagation support in MachinePools and MachinePool Machines (when it is introduced).

I am not aware of the full details of how MachinePool Machines is going to work but I would like to capture that once MachinePool Machines are introduces we ensure that they too drop the uninitialized taint. As long as we capture that after rewriting I am +1 to renaming/rewriting.

Ah got your point that we should keep dropping the taint even with MachinePool Machines, I just assumed it's good enough that we already do it. But you have a good point there.

@ykakarap Then I think it's better to keep this issue restricted to the taint and create a separate issue to think about in-place propagation for MachinePools in general? (independent of the taint topic)
I think for CAPI users it would be great if in-place propagation is done in a similar way in MachineDeployments and MachinePools (if possible). Or maybe that is already how MachinePools work today? (I just don't know)

@ykakarap
Copy link
Contributor Author

@sbueringer A separate issue sounds good. I agree that it would be great if we can (if possible, depending on feasibility) provide consistent in-place propagation behavior for the user.

@Jont828
Copy link
Contributor

Jont828 commented Mar 29, 2023

@ykakarap So MachinePool Machines are really just using the Machine resource with an empty bootstrap ref. As a result, we skip the bootstrapping stage since it's handled by the MachinePool. Do you think this issue would apply to MachinePool Machines or would it already be taken care of?

@sbueringer
Copy link
Member

sbueringer commented Apr 26, 2023

@Jont828 What do you mean with skip the bootstrapping stage?

When e.g. the AzureMachinePool creates new instances/servers in Azure doesn't it use the KubeadmConfig (and the bootstrap data we generate based on that) to bootstrap the new instances/servers?

(I'm specifically talking about how the bootstrapping happens in reality not what we mirro back in the Machine objects)

As long as it uses the KubeadmConfig and the kubeadm bootstrap provider (CABPKG) to bootstrap nodes the nodes should end up with the "node.cluster.x-k8s.io/uninitialized" taint. In #7993 Yuvaraj already implemented it that the CAPI core MachinePool controller is removing the taint.

But this is not a complete solution. For MachineDeployments the CAPI core Machine controller is syncing labels from Machines to nodes (and after that it drops the unitialized taint). So I think as soon as we have Machines for MachinePools the Machine controller will also start syncing labels to nodes (and afterwards drop the taint from the Node).

So if I get it correctly in the PR where we introduce MachinePool Machines we should get the label sync from Machines to Nodes for free (except if we disable it). But I think we should change the MachinePool controller to stop dropping the unitialized taint. Not sure what is happening to the annotations the MachinePool controller is adding as the Machine controller is adding them as well.

This might all be already implemented / resolved in the MachinePool Machines PR, but I'll take a closer look when I review the PR and bring it up there again.

@fabriziopandini
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 11, 2024
@sbueringer sbueringer added the area/machinepool Issues or PRs related to machinepools label Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/machinepool Issues or PRs related to machinepools help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

6 participants