What happened?
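kubelet fails to connect to a device plugin if another device plugin with the same resource name is already running on the node. For a brief period, a kubelet may run two pods that advertise the same resource. For example, consider the following scenario, in which a vendor "upgrades" a hypothetical device plugin:
1. kubelet is running first-device-plugin, which advertises the resource example.com/resource
2. The device vendor upgrades the device plugin to second-device-plugin, and the user deploys it
3. kubelet switches to using second-device-plugin, which advertises the resource example.com/resource
4. The user deletes first-device-plugin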
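Prior to v1.25.0, kubelet handled this case: deleting first-device-plugin in step 4 disconnected first-device-plugin and kept the connection to second-device-plugin (the most recently connected plugin). This regressed as part of #109016. Since then, when first-device-plugin is deleted in step 4, kubelet disconnects the "most recently" connected plugin, i.e. second-device-plugin, and ends up connected to neither the first nor the second device plugin. As a result, kubelet can no longer advertise resources for the device. The cause is that kubelet stores state only for the most recently connected plugin and, on a disconnect, disconnects that most recently connected plugin.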
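The regression described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not kubelet code: the class DeviceManagerModel, its methods, and the disconnect_latest flag are hypothetical names invented for illustration. The only behavior taken from the report is that a single endpoint is stored per resource name and that a disconnect removes whatever is stored.

```python
class DeviceManagerModel:
    """Toy model (hypothetical, not kubelet code).

    Stores one endpoint per resource name: the most recently registered plugin.
    """

    def __init__(self, disconnect_latest):
        # disconnect_latest=True models the regressed behavior from the report;
        # False models the pre-1.25 behavior.
        self.disconnect_latest = disconnect_latest
        self.endpoints = {}

    def register(self, resource, plugin):
        # A new registration overwrites the previously stored plugin.
        self.endpoints[resource] = plugin

    def plugin_removed(self, resource, plugin):
        if self.disconnect_latest:
            # Regressed: drop whatever is stored, even if another plugin went away.
            self.endpoints.pop(resource, None)
        elif self.endpoints.get(resource) == plugin:
            # Pre-1.25: drop the stored endpoint only if it is the removed plugin.
            self.endpoints.pop(resource, None)

    def advertised(self, resource):
        return resource in self.endpoints


def run_upgrade(disconnect_latest):
    """Replay steps 1-4 and report whether the resource is still advertised."""
    manager = DeviceManagerModel(disconnect_latest)
    manager.register("example.com/resource", "first-device-plugin")
    manager.register("example.com/resource", "second-device-plugin")
    manager.plugin_removed("example.com/resource", "first-device-plugin")
    return manager.advertised("example.com/resource")
```

In this model, run_upgrade(True) leaves the resource unadvertised, matching the report, while run_upgrade(False) keeps second-device-plugin connected.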
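What did you expect to happen?
I expect kubelet to keep the pre-1.25 behavior when a device plugin pod is deleted: disconnect the deleted device plugin rather than the most recently connected one.
I expect that when a new device plugin connects and is registered with the device manager while an existing device plugin is already running, kubelet handles the case either by rejecting the registration request or by disconnecting from the existing device plugin and connecting to the new one.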
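The two ways of handling a duplicate registration can be sketched as follows. This is an illustrative model only: Registry, DuplicateResourceError, and the policy values "reject" and "replace" are hypothetical names, and nothing here reflects the real device manager API.

```python
class DuplicateResourceError(Exception):
    """Raised when a resource name is already served and the policy is 'reject'."""


class Registry:
    """Toy registry (hypothetical) with an explicit duplicate-registration policy."""

    def __init__(self, policy):
        if policy not in ("reject", "replace"):
            raise ValueError("policy must be 'reject' or 'replace'")
        self.policy = policy
        self.endpoints = {}      # resource name -> connected plugin
        self.disconnected = []   # plugins dropped in favor of a newer one

    def register(self, resource, plugin):
        existing = self.endpoints.get(resource)
        if existing is not None and existing != plugin:
            if self.policy == "reject":
                # Refuse the new plugin and keep the existing connection.
                raise DuplicateResourceError(
                    f"{resource} is already served by {existing}"
                )
            # 'replace': disconnect the existing plugin, then connect the new one.
            self.disconnected.append(existing)
        self.endpoints[resource] = plugin
```

With "reject", the old plugin has to be stopped before the new one can register. With "replace", the newer plugin takes over immediately, so a later deletion of the old pod has nothing left to disconnect.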
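How can we reproduce it (as minimally and precisely as possible)?
Reproduced in kind, on master and on v1.27.3:
Create a kind cluster (with a single worker node, v1.27.3):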
kind_config="$(cat << EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv4
nodes:
# the control plane node
- role: control-plane
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "6"
        read-only-port: "10255"
EOF
)"
$ kind create cluster --config <(printf '%s\n' "${kind_config}") --image kindest/node:v1.27.3
(A single kubelet, kind-worker, is used for the reproduction.)
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kind-control-plane Ready control-plane 27s v1.27.3
kind-worker Ready <none> 8s v1.27.3
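Verify the initial node capacity and allocatable resources (as expected, no device is advertised, because no device plugin is installed yet):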
$ kubectl get node kind-worker -o json | jq '.status| {allocatable: .allocatable, capacity: .capacity}'
{
"allocatable": {
"cpu": "12",
"ephemeral-storage": "421481144Ki",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "65425280Ki",
"pods": "110"
},
"capacity": {
"cpu": "12",
"ephemeral-storage": "421481144Ki",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "65425280Ki",
"pods": "110"
}
}
Apply the first device plugin pod.
https://gist.github.com/bobbypage/897296ad4726c0a7b09ea1a6bba47062
$ kubectl apply -f first-device-plugin.yaml
pod/first-device-plugin created
Verify that it is running.
$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=kind-worker
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default first-device-plugin 1/1 Running 0 8s 10.244.1.2 kind-worker <none> <none>
kube-system kindnet-42ww9 1/1 Running 0 39s 192.168.8.3 kind-worker <none> <none>
kube-system kube-proxy-k7pgh 1/1 Running 0 39s 192.168.8.3 kind-worker <none> <none>
The new resource example.com/resource is advertised:
$ kubectl get node kind-worker -o json | jq '.status| {allocatable: .allocatable, capacity: .capacity}'
{
"allocatable": {
"cpu": "12",
"ephemeral-storage": "421481144Ki",
"example.com/resource": "2",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "65425280Ki",
"pods": "110"
},
"capacity": {
"cpu": "12",
"ephemeral-storage": "421481144Ki",
"example.com/resource": "2",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "65425280Ki",
"pods": "110"
}
}
Create the second device plugin pod. Consider it a "new version" of the device plugin.
# https://gist.github.com/bobbypage/18d526daf4c1b72bdf659e65ed3890d2
$ kubectl apply -f second-device-plugin.yaml
pod/second-device-plugin created
Verify that the pod is running.
$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=kind-worker
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default first-device-plugin 1/1 Running 0 42s 10.244.1.2 kind-worker <none> <none>
default second-device-plugin 1/1 Running 0 11s 10.244.1.3 kind-worker <none> <none>
kube-system kindnet-42ww9 1/1 Running 0 73s 192.168.8.3 kind-worker <none> <none>
kube-system kube-proxy-k7pgh 1/1 Running 0 73s 192.168.8.3 kind-worker <none> <none>
example.com/resource is still advertised as 2.
$ kubectl get node kind-worker -o json | jq '.status| {allocatable: .allocatable, capacity: .capacity}'
{
"allocatable": {
"cpu": "12",
"ephemeral-storage": "421481144Ki",
"example.com/resource": "2",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "65425280Ki",
"pods": "110"
},
"capacity": {
"cpu": "12",
"ephemeral-storage": "421481144Ki",
"example.com/resource": "2",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "65425280Ki",
"pods": "110"
}
}
Now delete the first device plugin:
$ kubectl delete pod first-device-plugin
pod "first-device-plugin" deleted
The allocatable amount of example.com/resource drops to zero and does not recover!
$ kubectl get node kind-worker -o json | jq '.status| {allocatable: .allocatable, capacity: .capacity}'
{
"allocatable": {
"cpu": "12",
"ephemeral-storage": "421481144Ki",
"example.com/resource": "0",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "65425280Ki",
"pods": "110"
},
"capacity": {
"cpu": "12",
"ephemeral-storage": "421481144Ki",
"example.com/resource": "2",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "65425280Ki",
"pods": "110"
}
}
The second device plugin pod is still running, but the resource is no longer advertised.
$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=kind-worker
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default second-device-plugin 1/1 Running 0 63s 10.244.1.3 kind-worker <none> <none>
kube-system kindnet-42ww9 1/1 Running 0 2m5s 192.168.8.3 kind-worker <none> <none>
kube-system kube-proxy-k7pgh 1/1 Running 0 2m5s 192.168.8.3 kind-worker <none> <none>
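To spot this state without reading the JSON by eye, the node status can be checked with a small helper. This is a minimal sketch: the function name lost_extended_resources is hypothetical, it treats any resource name containing "/" as an extended resource, and it assumes those quantities are plain integer strings, as in the output above.

```python
import json


def lost_extended_resources(status):
    """Return extended resources with non-zero capacity but zero allocatable.

    `status` is the object printed by the jq filter above, i.e. a dict with
    "allocatable" and "capacity" maps. Only names containing "/" are checked
    (assumption: these are extended resources with plain integer quantities).
    """
    allocatable = status.get("allocatable", {})
    capacity = status.get("capacity", {})
    lost = {}
    for name, capacity_value in capacity.items():
        if "/" not in name:
            continue  # skip cpu, memory, hugepages-*, pods, ...
        cap = int(capacity_value)
        alloc = int(allocatable.get(name, "0"))
        if cap > 0 and alloc == 0:
            lost[name] = (alloc, cap)
    return lost


# Values copied from the output after first-device-plugin was deleted.
after_delete = json.loads("""
{
  "allocatable": {"cpu": "12", "example.com/resource": "0", "pods": "110"},
  "capacity": {"cpu": "12", "example.com/resource": "2", "pods": "110"}
}
""")
print(lost_extended_resources(after_delete))
```

For the status shown after the deletion, the helper reports example.com/resource with allocatable 0 against capacity 2; for the earlier outputs, where both values are 2, it returns an empty dict.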
Kubelet logs - https://gist.github.com/bobbypage/04424608bf89bb1a7043cf50dadf28e5
Containerd logs - https://gist.github.com/bobbypage/351c60f690687067a631a6fcb7ba0d77
5 answers
falq053o1#
/sig node
ecfsfe2w2#
/triage accepted
gr8qqesn3#
My 2c: the upgrade flow (and its breakage) looks like a legitimate and important use case, so let's handle this.
pod7payv4#
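Thanks @ffromani! I think there are two solutions we could move forward with:
This is simple, but it may be considered a slight behavior change, since the old device plugin would need to be stopped beforehand.
2. Alternatively, when we detect that a device plugin is registering while a device plugin with the same device name already exists, we disconnect from the existing plugin and connect to the new one. I have a WIP patch for this - bobbypage@343f839 .
This is slightly more complex, but it follows the pre-1.25 behavior. I think I lean toward the second option, but I'm happy to hear your thoughts.
cc @elezar for your thoughts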
cidc1ykv5#
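Thanks @bobbypage. I need to review the git history carefully; I want to understand whether the change back then was intentional or an unintended side effect of a refactoring. I also lean toward your second option. I'll follow up as soon as possible.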