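What happened?
If the kubelet is configured with cgroupRoot and cpuManagerPolicy: static, and the cpuset cgroup defines a specific vCPU range, the kubelet is unable to start containerd tasks or update container resources: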
E0422 11:37:18.746817 109321 remote_runtime.go:343] "StartContainer from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write \"0-39\": write /sys/fs/cgroup/cpuset/pods.slice/pods-kubepods.slice/pods-kubepods-burstable.slice/pods-kubepods-burstable-pode444cc90_8458_4d84_8319_a443fe6e975a.slice/cri-containerd-8e99221bf7eb3049f7afa35b3719f7870088c5191508f2bdf47959fe2a677385.scope/cpuset.cpus: permission denied: unknown" containerID="8e99221bf7eb3049f7afa35b3719f7870088c5191508f2bdf47959fe2a677385"
What did you expect to happen?
The CPU manager respects the root cgroup's cpuset and uses its value as defaultCpuSet.
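The mismatch behind this expectation can be sketched with a small check. On cgroup v1, a child cpuset has to stay within its parent's cpuset, which would be consistent with the "permission denied" error when 0-39 is written under a pods.slice limited to 0-1,6-39. The helper names expand_cpuset and is_subset are hypothetical and are not part of the kubelet or runc; this is only a sketch using the values from this report.

```shell
#!/bin/sh
# Illustrative helpers only; expand_cpuset and is_subset are hypothetical names.

# Expand a cpuset list such as "0-1,6-39" into one CPU ID per line.
expand_cpuset() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        # A single CPU ("5") has no range end, so reuse the start value.
        seq "$first" "${last:-$first}"
    done
}

# Succeed only if every CPU in list $1 also appears in list $2.
is_subset() {
    parent_cpus=$(expand_cpuset "$2")
    for cpu in $(expand_cpuset "$1"); do
        echo "$parent_cpus" | grep -qx "$cpu" || return 1
    done
    return 0
}

# Values from this report: the kubelet's defaultCpuSet vs. the pods.slice cpuset.
if is_subset "0-39" "0-1,6-39"; then
    echo "0-39 fits inside 0-1,6-39"
else
    echo "0-39 is not a subset of 0-1,6-39"
fi
```

CPUs 2-5 are missing from the parent list, so the check takes the second branch.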
How can we reproduce it (as minimally and precisely as possible)?
Use the following /var/lib/kubelet/config.yaml:
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- fd00:10:245::a
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
cpuManagerPolicy: static
reservedSystemCPUs: 0-1,20-21
cgroupRoot: /pods
Create the cgroup with a cpuset:
for DIR in hugetlb cpuset cpu,cpuacct memory systemd pids; do /bin/mkdir -p /sys/fs/cgroup/$DIR/pods.slice; done
echo 0-1 > /sys/fs/cgroup/cpuset/pods.slice/cpuset.mems
echo 0-1,6-39 > /sys/fs/cgroup/cpuset/pods.slice/cpuset.cpus
Restart the kubelet:
systemctl stop kubelet
rm /var/lib/kubelet/cpu_manager_state
systemctl start kubelet
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-39","checksum":421241391}
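To make the discrepancy visible on a node, the recorded defaultCpuSet can be printed next to the cpuset of the pods.slice cgroup. This is a minimal sketch: state_default_cpuset is a hypothetical helper name, and the sed expression assumes the compact single-line JSON shown above.

```shell
#!/bin/sh
# Illustrative helper only; state_default_cpuset is a hypothetical name.

# Print the defaultCpuSet value from a CPU manager state file ($1).
state_default_cpuset() {
    sed -n 's/.*"defaultCpuSet":"\([^"]*\)".*/\1/p' "$1"
}

# Paths taken from the reproduction steps in this report.
state=/var/lib/kubelet/cpu_manager_state
cgroup=/sys/fs/cgroup/cpuset/pods.slice/cpuset.cpus

# Only print when both files exist, so the script is harmless elsewhere.
if [ -r "$state" ] && [ -r "$cgroup" ]; then
    echo "state file defaultCpuSet: $(state_default_cpuset "$state")"
    echo "pods.slice cpuset.cpus:   $(cat "$cgroup")"
fi
```

With the values in this report, the first line would show 0-39 while the cgroup was limited to 0-1,6-39.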
Check the logs:
journalctl -u kubelet -f
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.1", GitCommit:"bc401b91f2782410b3fb3f9acf43a995c4de90d2", GitTreeState:"clean", BuildDate:"2024-01-17T15:41:12Z", GoVersion:"go1.21.6", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.29) exceeds the supported minor version skew of +/-1
Cloud provider
OS version
No response
Install tools
No response
Container runtime (CRI) and version (if applicable)
$ containerd --version
containerd github.com/containerd/containerd 1.7.2
7 comments
Comment 1
/sig node
Comment 2
/cc
Comment 3
/cc
Comment 4
@t33m, why do you need to define the cpuset? How should CPU allocation work? How much does this matter for assessing the priority of this bug?
Comment 5
/triage accepted
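Comment 6
@AnishShah, to limit the number of cores that pod workloads can use. For example, I have some services of my own that are started by systemd, and I can define a CPU set for them with the CPUAffinity option in /etc/systemd/system.conf. But I also want to make sure that the same cores are never used by pods.
Comment 7
Related: #118021 and #123979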