Kubernetes kubelet: device manager fails to connect to a new plugin instance if the previous plugin is terminated


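What happened?
The kubelet fails to connect to a device plugin if a device plugin with the same resource name is already running on the node. For a brief time, the kubelet may be running two pods that advertise the same device plugin. For example, consider the following case of a vendor hypothetically "upgrading" a device plugin: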

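  1. The kubelet is running first-device-plugin, which advertises the resource example.com/resource
  2. The device vendor upgrades the device plugin to second-device-plugin, and the user deploys it
  3. The kubelet switches to second-device-plugin, which advertises the resource example.com/resource
  4. The user deletes first-device-plugin

Prior to v1.25.0, the kubelet handled this case: in step 4, disconnecting first-device-plugin disconnected only first-device-plugin and kept the connection to second-device-plugin (the most recently connected plugin). This regressed as part of #109016: when first-device-plugin is deleted in step 4, the kubelet now disconnects the "most recently" connected plugin, i.e. second-device-plugin, and ends up connected to neither the first nor the second device plugin. As a result, the kubelet can no longer advertise resources for the device. This happens because the kubelet stores state only for the most recently connected plugin, and on a disconnect it disconnects that most recently connected plugin.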
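The failure mode described above can be modeled with a toy registry. This is only a sketch under stated assumptions: the `registry` and `client` types and their methods are hypothetical stand-ins, not the kubelet's actual device manager code. It assumes that state is kept in a single slot per resource name, and it replays steps 1 to 4 with two different disconnect strategies.

```go
package main

import "fmt"

// client stands in for a connection to one device plugin instance.
// The type and its fields are hypothetical, not the kubelet's own.
type client struct {
	resource string // e.g. "example.com/resource"
	socket   string // identifies the plugin instance
}

// registry keeps a single client per resource name, so a second
// registration for the same resource overwrites the first one.
type registry struct {
	clients map[string]*client
}

func newRegistry() *registry {
	return &registry{clients: map[string]*client{}}
}

func (r *registry) register(c *client) {
	r.clients[c.resource] = c
}

// disconnectByName models the regressed behavior: it drops whatever
// client is stored for the resource, no matter which instance went away.
func (r *registry) disconnectByName(gone *client) {
	delete(r.clients, gone.resource)
}

// disconnectByIdentity models the pre-1.25 behavior described above:
// the stored client is dropped only if it is the instance that went away.
func (r *registry) disconnectByIdentity(gone *client) {
	if cur, ok := r.clients[gone.resource]; ok && cur == gone {
		delete(r.clients, gone.resource)
	}
}

func (r *registry) connected(resource string) bool {
	_, ok := r.clients[resource]
	return ok
}

// replay runs steps 1 to 4 and reports whether the resource still has
// a connected plugin afterwards.
func replay(disconnect func(r *registry, gone *client)) bool {
	r := newRegistry()
	first := &client{resource: "example.com/resource", socket: "first.sock"}
	second := &client{resource: "example.com/resource", socket: "second.sock"}
	r.register(first)    // step 1
	r.register(second)   // steps 2 and 3
	disconnect(r, first) // step 4
	return r.connected("example.com/resource")
}

func main() {
	fmt.Println("disconnect by name keeps a plugin:    ", replay((*registry).disconnectByName))
	fmt.Println("disconnect by identity keeps a plugin:", replay((*registry).disconnectByIdentity))
}
```

In this model the name-based disconnect leaves the resource with no plugin at all, while the identity-based disconnect keeps the second plugin connected.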

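What did you expect to happen?
I expect one of the following. Either the kubelet keeps the pre-1.25 behavior when a device plugin pod is deleted: it disconnects the deleted device plugin, not the most recently connected one.
Or, when a new device plugin connects and is registered by the device manager while an existing device plugin is already running, the kubelet handles the situation by either rejecting the registration request or disconnecting from the existing device plugin and connecting to the new one.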

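How can we reproduce it (as minimally and precisely as possible)?
Reproduced in kind on both master and v1.27.3:
Create a kind cluster (with a single worker node running v1.27.3):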

kind_config="$(cat << EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv4
nodes:
# the control plane node
- role: control-plane
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "6"
        read-only-port: "10255"
EOF
)"

$ kind create cluster --config <(printf '%s\n' "${kind_config}") --image kindest/node:v1.27.3

(A single kubelet, kind-worker, is used for the reproduction.)

$ kubectl get nodes
NAME                 STATUS   ROLES           AGE   VERSION
kind-control-plane   Ready    control-plane   27s   v1.27.3
kind-worker          Ready    <none>          8s    v1.27.3

Verify the initial node capacity and allocatable values (as expected, no device is advertised because no device plugin is installed):

$ kubectl get node kind-worker -o json | jq '.status| {allocatable: .allocatable, capacity: .capacity}'
{
  "allocatable": {
    "cpu": "12",
    "ephemeral-storage": "421481144Ki",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "65425280Ki",
    "pods": "110"
  },
  "capacity": {
    "cpu": "12",
    "ephemeral-storage": "421481144Ki",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "65425280Ki",
    "pods": "110"
  }
}

Apply the first device plugin pod.

# https://gist.github.com/bobbypage/897296ad4726c0a7b09ea1a6bba47062
$ kubectl apply -f first-device-plugin.yaml
pod/first-device-plugin created

Verify that it is running.

$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=kind-worker
NAMESPACE     NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
default       first-device-plugin   1/1     Running   0          8s    10.244.1.2    kind-worker   <none>           <none>
kube-system   kindnet-42ww9         1/1     Running   0          39s   192.168.8.3   kind-worker   <none>           <none>
kube-system   kube-proxy-k7pgh      1/1     Running   0          39s   192.168.8.3   kind-worker   <none>           <none>

The new resource example.com/resource is now advertised:

$ kubectl get node kind-worker -o json | jq '.status| {allocatable: .allocatable, capacity: .capacity}'
{
  "allocatable": {
    "cpu": "12",
    "ephemeral-storage": "421481144Ki",
    "example.com/resource": "2",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "65425280Ki",
    "pods": "110"
  },
  "capacity": {
    "cpu": "12",
    "ephemeral-storage": "421481144Ki",
    "example.com/resource": "2",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "65425280Ki",
    "pods": "110"
  }
}

Create the second device plugin pod. Consider this a "new version" of the device plugin.

# https://gist.github.com/bobbypage/18d526daf4c1b72bdf659e65ed3890d2
$ kubectl apply -f second-device-plugin.yaml
pod/second-device-plugin created

Verify that the pod is running.

$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=kind-worker
NAMESPACE     NAME                   READY   STATUS    RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
default       first-device-plugin    1/1     Running   0          42s   10.244.1.2    kind-worker   <none>           <none>
default       second-device-plugin   1/1     Running   0          11s   10.244.1.3    kind-worker   <none>           <none>
kube-system   kindnet-42ww9          1/1     Running   0          73s   192.168.8.3   kind-worker   <none>           <none>
kube-system   kube-proxy-k7pgh       1/1     Running   0          73s   192.168.8.3   kind-worker   <none>           <none>

example.com/resource is still advertised as 2.

$ kubectl get node kind-worker -o json | jq '.status| {allocatable: .allocatable, capacity: .capacity}'
{
  "allocatable": {
    "cpu": "12",
    "ephemeral-storage": "421481144Ki",
    "example.com/resource": "2",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "65425280Ki",
    "pods": "110"
  },
  "capacity": {
    "cpu": "12",
    "ephemeral-storage": "421481144Ki",
    "example.com/resource": "2",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "65425280Ki",
    "pods": "110"
  }
}

Now, delete the first device plugin:

$ kubectl delete pod first-device-plugin
pod "first-device-plugin" deleted

The allocatable value for example.com/resource drops to zero and never recovers!

$ kubectl get node kind-worker -o json | jq '.status| {allocatable: .allocatable, capacity: .capacity}'
{
  "allocatable": {
    "cpu": "12",
    "ephemeral-storage": "421481144Ki",
    "example.com/resource": "0",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "65425280Ki",
    "pods": "110"
  },
  "capacity": {
    "cpu": "12",
    "ephemeral-storage": "421481144Ki",
    "example.com/resource": "2",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "65425280Ki",
    "pods": "110"
  }
}
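The stuck state in this output (non-zero capacity, zero allocatable) can also be detected programmatically. The Go sketch below parses JSON of the shape printed above. The `nodeStatus` type and the `strandedResources` helper are hypothetical names introduced for illustration, the sample is an abbreviated copy of the output above, and the quantities are compared as plain strings, which is a simplification that assumes they are printed as bare integers as they are here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// nodeStatus mirrors the object produced by the jq filter used above.
type nodeStatus struct {
	Allocatable map[string]string `json:"allocatable"`
	Capacity    map[string]string `json:"capacity"`
}

// sample is an abbreviated copy of the output shown above.
const sample = `{
  "allocatable": {"cpu": "12", "example.com/resource": "0", "hugepages-2Mi": "0"},
  "capacity":    {"cpu": "12", "example.com/resource": "2", "hugepages-2Mi": "0"}
}`

// strandedResources returns the names of resources whose capacity is
// non-zero while their allocatable value is "0". Quantities are compared
// as strings (an assumption that holds for the integers shown here).
func strandedResources(raw []byte) ([]string, error) {
	var st nodeStatus
	if err := json.Unmarshal(raw, &st); err != nil {
		return nil, err
	}
	var out []string
	for name, capacity := range st.Capacity {
		alloc, ok := st.Allocatable[name]
		if ok && capacity != "0" && alloc == "0" {
			out = append(out, name)
		}
	}
	sort.Strings(out) // map iteration order is random, so sort for stable output
	return out, nil
}

func main() {
	names, err := strandedResources([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("stranded resources:", names)
}
```

For the sample, only example.com/resource is reported: the hugepages entry has zero capacity as well, so it is not treated as stranded.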

The second device plugin pod is still running, but the resource is no longer advertised.

$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=kind-worker
NAMESPACE     NAME                   READY   STATUS    RESTARTS   AGE    IP            NODE          NOMINATED NODE   READINESS GATES
default       second-device-plugin   1/1     Running   0          63s    10.244.1.3    kind-worker   <none>           <none>
kube-system   kindnet-42ww9          1/1     Running   0          2m5s   192.168.8.3   kind-worker   <none>           <none>
kube-system   kube-proxy-k7pgh       1/1     Running   0          2m5s   192.168.8.3   kind-worker   <none>           <none>

Kubelet logs - https://gist.github.com/bobbypage/04424608bf89bb1a7043cf50dadf28e5
Containerd logs - https://gist.github.com/bobbypage/351c60f690687067a631a6fcb7ba0d77

gr8qqesn3#

My 2c: the upgrade flow (and the way it breaks) looks like a legitimate and important use case, so let's handle it.

pod7payv4#

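Thanks @ffromani! I think we have two possible solutions to move forward with:

  1. Adjust the registration flow so that a plugin cannot register while another plugin already claims the same resource name. The change needed for this is actually quite small, and I have a small PoC patch for it: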
+++ b/pkg/kubelet/cm/devicemanager/plugin/v1beta1/handler.go
@@ -65,6 +65,11 @@ func (s *server) ValidatePlugin(pluginName string, endpoint string, versions []s
 }

 func (s *server) connectClient(name string, socketPath string) error {
+       existingClient := s.getClient(name)
+       if existingClient != nil {
+               return fmt.Errorf("failed to connect to new client because existing client is already connected. The existing client should disconnect first. resource: %v, socketPath: %v", name, socketPath)
+       }
+
        c := NewPluginClient(name, socketPath, s.chandler)

        s.registerClient(name, c)

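This is simple, but it could be considered a slight change in behavior, because the old device plugin would need to be stopped beforehand.
  2. Alternatively, when we detect that a device plugin is registering while a device plugin with the same resource name already exists, we disconnect from the existing plugin and connect to the new one. I have a WIP patch for this - bobbypage@343f839
This is slightly more complex, but it follows the pre-1.25 behavior. I think I lean toward the second option, but I'm happy to hear your thoughts.
cc @elezar for your thoughts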
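The trade-off between the two options can be illustrated with a small model. This is a hedged sketch and not the PoC or WIP patch mentioned above: the `manager`, `plugin`, and `policy` names are hypothetical, and plugin instances are identified by a made-up socket name.

```go
package main

import (
	"errors"
	"fmt"
)

// policy selects how a second registration for the same resource is handled.
type policy int

const (
	rejectIfPresent policy = iota // option 1: refuse the new plugin
	replaceExisting               // option 2: disconnect the old plugin, keep the new one
)

// plugin is a hypothetical stand-in for one connected plugin instance.
type plugin struct {
	socket    string
	connected bool
}

type manager struct {
	policy  policy
	plugins map[string]*plugin // keyed by resource name
}

var errAlreadyRegistered = errors.New("a plugin is already registered for this resource")

func newManager(p policy) *manager {
	return &manager{policy: p, plugins: map[string]*plugin{}}
}

func (m *manager) register(resource, socket string) error {
	if old, ok := m.plugins[resource]; ok {
		switch m.policy {
		case rejectIfPresent:
			return fmt.Errorf("%w: resource %s, socket %s", errAlreadyRegistered, resource, socket)
		case replaceExisting:
			old.connected = false // drop the old instance before taking the new one
		}
	}
	m.plugins[resource] = &plugin{socket: socket, connected: true}
	return nil
}

// unregister ignores instances that are no longer the registered one,
// so removing a replaced plugin cannot disconnect its successor.
func (m *manager) unregister(resource, socket string) {
	if cur, ok := m.plugins[resource]; ok && cur.socket == socket {
		cur.connected = false
		delete(m.plugins, resource)
	}
}

// active returns the socket of the registered plugin, or "" if there is none.
func (m *manager) active(resource string) string {
	if cur, ok := m.plugins[resource]; ok {
		return cur.socket
	}
	return ""
}

func main() {
	const res = "example.com/resource"
	for _, p := range []policy{rejectIfPresent, replaceExisting} {
		m := newManager(p)
		_ = m.register(res, "first.sock")
		err := m.register(res, "second.sock")
		m.unregister(res, "first.sock")
		fmt.Printf("policy=%d registerErr=%v active=%q\n", p, err, m.active(res))
	}
}
```

In this model, option 1 leaves no plugin registered after the first one is removed unless the second plugin retries its registration, which matches the note that the old plugin has to stop first. Option 2 keeps the second plugin active throughout.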

cidc1ykv5#

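Thanks @bobbypage. I need to go through the git history carefully: I'd like to understand whether the change in behavior was intentional or an accidental side effect of the refactoring. I also lean toward your second option. I'll follow up as soon as I can.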
