Kubernetes: in attachDetachController, the asw state is inconsistent with the actual node state

Posted by balp4ylt 6 months ago in Kubernetes


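What happened?
We run a Kubernetes (k8s) cluster with a scheduler extender that schedules the pods of a StatefulSet onto fixed nodes. These pods then mount several volumes. We delete and restart pods frequently, thousands of times per week. Every few weeks, however, we hit a problem: after a pod is deleted and scheduled back to its original node, the associated volumes cannot be mounted, and the pod stays in the ContainerCreating state permanently.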

# kubectl describe pod x
  Warning  FailedMount  105s (x30 over 67m)  kubelet, h34b07425.na61  Unable to mount volumes for pod "*": timeout expired waiting for volumes to attach or mount for pod "*". list of unmounted volumes=[*]. list of unattached volumes=[*]

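We traced the problem to the attach/detach (ad) controller. It appears that the volume was not actually attached to the node, but the actual state of world cache (asw) inside the ad controller incorrectly believes it was. This discrepancy prevents the pod from proceeding with the mount.
After checking the kube-controller-manager logs, we noticed the following: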

I1211 00:42:07.780851       1 node_status_updater.go:106] Updating status "{\"status\":{\"volumesAttached\":[{\"devicePath\":\"\",\"name\":\"kubernetes.io/csi/csi-hostpath^6583bc90-0677-11ee-a5d3-506b4b2b7da6\"}]}}" for node "h34b07425.na61" succeeded. VolumesAttached: [{kubernetes.io/csi/csi-hostpath^6583bc90-0677-11ee-a5d3-506b4b2b7da6 }]
...
I1211 00:42:08.261836       1 node_status_updater.go:106] Updating status "{}" for node "h34b07425.na61" succeeded. VolumesAttached: [{kubernetes.io/csi/csi-hostpath^6583bc90-0677-11ee-a5d3-506b4b2b7da6 } {kubernetes.io/csi/csi-hostpath^3d7384a6-068e-11ee-a5d3-506b4b2b7da6 }]

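The problem seems to stem from the delay the informer needs to sync node information from the API server after DetachVolume is called. During that window the informer does not reflect the latest state of the node and incorrectly believes the volume is still attached to it.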

// pkg/controller/volume/attachdetach/reconciler/reconciler.go
func (rc *reconciler) reconcile(ctx context.Context) {
		err = rc.nodeStatusUpdater.UpdateNodeStatusForNode(logger, attachedVolume.NodeName)
		...
		err = rc.attacherDetacher.DetachVolume(logger, attachedVolume.AttachedVolume, verifySafeToDetach, rc.actualStateOfWorld)
	...
	rc.attachDesiredVolumes(logger)

	// Update Node Status
	err := rc.nodeStatusUpdater.UpdateNodeStatuses(logger)
	...
}

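After the pod is scheduled back to the original node and the AttachVolume operation completes, nodeStatusUpdater.updateNodeStatus() returns nil without retrying, regardless of whether anything was actually updated.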

// pkg/controller/volume/attachdetach/statusupdater/node_status_updater.go
func (nsu *nodeStatusUpdater) processNodeVolumes(logger klog.Logger, nodeName types.NodeName, attachedVolumes []v1.AttachedVolume) error {
	nodeObj, err := nsu.nodeLister.Get(string(nodeName))
	...
	err = nsu.updateNodeStatus(logger, nodeName, nodeObj, attachedVolumes)
	...
}

func (nsu *nodeStatusUpdater) updateNodeStatus(logger klog.Logger, nodeName types.NodeName, nodeObj *v1.Node, attachedVolumes []v1.AttachedVolume) error {
	node := nodeObj.DeepCopy()
	node.Status.VolumesAttached = attachedVolumes
	_, patchBytes, err := nodeutil.PatchNodeStatus(nsu.kubeClient.CoreV1(), nodeName, nodeObj, node)
	if err != nil {
		return err
	}
	logger.V(4).Info("Updating status for node succeeded", "node", klog.KObj(node), "patchBytes", patchBytes, "attachedVolumes", attachedVolumes)
	return nil
}

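As a result, the asw is misled by the informer into believing that every volume that should be attached to the node has been attached successfully. However, the kubectl output makes it clear that status.volumesAttached does not include the volume in question.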
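To make the empty patch in the log above concrete, here is a minimal sketch of why a diff computed against a stale cached node comes out as "{}". The AttachedVolume type and the statusPatch helper are simplified stand-ins introduced for illustration, not the real v1.AttachedVolume or nodeutil.PatchNodeStatus, and the volume names are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// AttachedVolume is a simplified stand-in for v1.AttachedVolume.
type AttachedVolume struct {
	Name       string `json:"name"`
	DevicePath string `json:"devicePath"`
}

// statusPatch is a toy replacement for the real patch computation (an
// assumption for illustration): it compares the base object's
// volumesAttached with the desired list and returns "{}" when they match.
func statusPatch(base, desired []AttachedVolume) string {
	if reflect.DeepEqual(base, desired) {
		return "{}"
	}
	body := map[string]interface{}{
		"status": map[string]interface{}{"volumesAttached": desired},
	}
	b, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	volA := AttachedVolume{Name: "vol-a"}
	volB := AttachedVolume{Name: "vol-b"}

	live := []AttachedVolume{volB}          // API server: vol-a was already removed
	stale := []AttachedVolume{volA, volB}   // informer cache: removal not yet observed
	desired := []AttachedVolume{volA, volB} // asw after vol-a is attached again

	// Diffing against the stale cache yields an empty patch.
	fmt.Println("patch against stale cache:", statusPatch(stale, desired))
	// Diffing against the live object would have restored vol-a.
	fmt.Println("patch against live object:", statusPatch(live, desired))
}
```

The first line printed carries the same "{}" body seen in the second log entry, while the second line shows the patch that would have been needed.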

TL;DR

The steps below show where the problem occurs, in chronological order from top to bottom.
1. kubectl delete pod
2. ad-controller observes the pod deletion and calls dsw.DeletePod()
3. ad-controller reconciler() runs every 100ms:
		3.1 asw.RemoveVolumeFromReportAsAttached()
		3.2 nodeStatusUpdater.UpdateNodeStatusForNode()  // removes the volume from node status.volumesAttached
		3.3 attacherDetacher.DetachVolume()
4. kube-controller-manager recreates the pod, and the pod is scheduled back to the original node
5. ad-controller observes the pod creation and calls dsw.AddPod()
		3.4 attacherDetacher.AttachVolume()
		3.5 nodeStatusUpdater.UpdateNodeStatuses() // nsu.nodeLister.Get() returns a cached node that does not yet reflect the removal from step 3.2. Because the pod is scheduled back to the original node, the desired volumesAttached is the same as before step 3.2, so the update does nothing.
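The timeline above can be replayed with a small model. This is only a sketch under simplifying assumptions: the cluster type, patchStatus, and syncInformer are hypothetical names, volumes are plain strings, and an empty patch is modeled as "no write", since it changes nothing on the server.

```go
package main

import "fmt"

// cluster models two views of one node's status.volumesAttached.
type cluster struct {
	server []string // what the API server stores
	cache  []string // what the informer (nodeLister) returns
}

// equal reports whether two volume lists have the same contents and order.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// patchStatus diffs desired against the cached node, as updateNodeStatus
// does, and only changes the server when the two differ. It reports whether
// the server was changed.
func (c *cluster) patchStatus(desired []string) bool {
	if equal(c.cache, desired) {
		return false // empty patch: nothing changes on the server
	}
	c.server = append([]string(nil), desired...)
	return true
}

// syncInformer delivers the pending watch event to the cache.
func (c *cluster) syncInformer() {
	c.cache = append([]string(nil), c.server...)
}

// runTimeline replays steps 3.2 and 3.5 with a lagging informer.
func runTimeline() (wroteOnReattach bool, c *cluster) {
	c = &cluster{server: []string{"vol-a"}, cache: []string{"vol-a"}}
	c.patchStatus([]string{}) // step 3.2: vol-a removed on the server
	// The informer lags, so the cache still lists vol-a here.
	wroteOnReattach = c.patchStatus([]string{"vol-a"}) // step 3.5
	c.syncInformer()                                   // the event from 3.2 finally arrives
	return wroteOnReattach, c
}

func main() {
	wrote, c := runTimeline()
	fmt.Println("server changed in step 3.5:", wrote)
	fmt.Println("volumes on server:", len(c.server))
}
```

In this model the server ends up with no attached volumes even though the controller has reported vol-a as attached, which is the divergence described in the issue.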

What did you expect to happen?

If nodeStatusUpdater.updateNodeStatus() ends up patching nothing after comparing against the informer's node, it should retry.
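One hypothetical way to decide when an empty patch deserves a retry is for the updater to remember the last list it wrote and to treat a cache that disagrees with that write as stale. The sketch below illustrates the idea only. It is not the upstream fix, and the node and updater types and the update method are names invented for this example.

```go
package main

import "fmt"

// node models one node's status.volumesAttached as seen by the API server
// and by the informer cache.
type node struct {
	server []string
	cache  []string
}

// syncInformer delivers pending watch events to the cache.
func (n *node) syncInformer() { n.cache = append([]string(nil), n.server...) }

// updater is a hypothetical status updater that remembers its last write.
type updater struct {
	lastWritten []string
	hasWritten  bool
}

// equal reports whether two volume lists have the same contents and order.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// update diffs desired against the cached node. It returns retry=true when
// the diff is empty but the cache has not caught up with this updater's own
// last write, which indicates a stale cache.
func (u *updater) update(n *node, desired []string) (retry bool) {
	if !equal(n.cache, desired) {
		n.server = append([]string(nil), desired...)
		u.lastWritten = append([]string(nil), desired...)
		u.hasWritten = true
		return false
	}
	return u.hasWritten && !equal(u.lastWritten, n.cache)
}

func main() {
	n := &node{server: []string{"vol-a"}, cache: []string{"vol-a"}}
	u := &updater{}

	u.update(n, []string{}) // detach path: vol-a removed on the server
	fmt.Println("retry:", u.update(n, []string{"vol-a"})) // stale cache detected
	n.syncInformer()                                      // cache catches up
	fmt.Println("retry:", u.update(n, []string{"vol-a"})) // real patch goes out
	fmt.Println("server:", n.server)
}
```

A caller would keep the node queued for the next reconcile round while update returns true. The heuristic assumes this updater is the only writer of volumesAttached, which would need to be checked against the real controller.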

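How can we reproduce it (as precisely as possible)?

It is hard to reproduce, but you can try the following steps:

1. Use a scheduler extender to schedule pods onto fixed nodes
2. Manually increase the apiserver load (so that informer sync lags behind)
3. Repeatedly delete a large number of StatefulSet pods that mount many volumes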

Anything else we need to know?

Our Kubernetes version is v1.13.3, but the problem still exists in the latest version.

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:08:12Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"archive", BuildDate:"2021-05-10T07:53:22Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

N/A

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

6 answers


zour9fqk1#

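This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
Organization members can add the triage/accepted label by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.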