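What happened?
I am running an AI application and want the best possible performance by using the Topology Manager to bind CPUs and GPUs with NUMA affinity.
So I restarted the kubelet with the following options: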
kubeReserved:
  cpu: "1"
  memory: "2Gi"
topologyManagerPolicy: restricted
topologyManagerPolicyOptions:
  prefer-closest-numa-nodes: "true"
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  align-by-socket: "true"
  distribute-cpus-across-numa: "true"
featureGates:
  CPUManagerPolicyAlphaOptions: true
  TopologyManagerPolicyOptions: true
  TopologyManagerPolicyAlphaOptions: true
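Then I request multiple GPUs.
If the pod requests more GPUs than a single socket holds, the allocated CPUs and GPUs are not aligned by socket.
What did you expect to happen?
The CPU cores requested by the pod should be on the NUMA nodes where the allocated GPUs are located.
How can we reproduce it (as minimally and precisely as possible)?
Create a pod with more than one NVIDIA GPU, for example two GPUs
- Create a pod with 5 NVIDIA GPUs
Each NUMA node has one GPU and a socket consists of 4 NUMA nodes, so a request for 5 GPUs forces the GPUs to span sockets.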
resources:
  limits:
    cpu: "40"
    memory: 4Gi
    nvidia.com/gpu: 5
- Wait for the pod to be running
# kubectl get pod
NAME READY STATUS RESTARTS AGE
guarantee-gpu5-40c-5c665ff85c-f67vr 1/1 Running 0 8m48s
- Check the NUMA topology of the allocated GPUs and CPU cores
{
"policyName": "static",
"defaultCpuSet": "0-7,48-63",
"entries": {
"8a77ef80-6ff1-4ca5-82a8-f45c2d82ae87": {
"nginx-gpu5-40c": "8-47"
}
},
"checksum": 2742015222
}
{
"Data": {
"PodDeviceEntries": [
{
"PodUID": "8a77ef80-6ff1-4ca5-82a8-f45c2d82ae87",
"ContainerName": "nginx-gpu5-40c",
"ResourceName": "nvidia.com/gpu",
"DeviceIDs": {
"1": [
"GPU-92d67a0e-9b17-c7fb-45c0-f9232a956872"
],
"4": [
"GPU-c29f034e-0614-387a-225c-1367a2a67f34"
],
"5": [
"GPU-7c4bf20f-e22d-6221-223d-c15655a5f703"
],
"6": [
"GPU-480ed0ed-c229-844f-7206-837c483eaa27"
],
"7": [
"GPU-02d33038-442c-5562-ef6a-bfb3ca27fb2d"
]
},
"AllocResp": "CucBChZOVklESUFfVklTSUJMRV9ERVZJQ0VTEswBR1BVLTAyZDMzMDM4LTQ0MmMtNTU2Mi1lZjZhLWJmYjNjYTI3ZmIyZCxHUFUtNDgwZWQwZWQtYzIyOS04NDRmLTcyMDYtODM3YzQ4M2VhYTI3LEdQVS03YzRiZjIwZi1lMjJkLTYyMjEtMjIzZC1jMTU2NTVhNWY3MDMsR1BVLWMyOWYwMzRlLTA2MTQtMzg3YS0yMjVjLTEzNjdhMmE2N2YzNCxHUFUtOTJkNjdhMGUtOWIxNy1jN2ZiLTQ1YzAtZjkyMzJhOTU2ODcy"
}
],
"RegisteredDevices": {
"nvidia.com/gpu": [
"GPU-480ed0ed-c229-844f-7206-837c483eaa27",
"GPU-7c4bf20f-e22d-6221-223d-c15655a5f703",
"GPU-c29f034e-0614-387a-225c-1367a2a67f34",
"GPU-903d67d4-ef71-0285-c620-d8cd7dd4b99a",
"GPU-e2a16c50-b350-94c2-4acf-735f38745c20",
"GPU-92d67a0e-9b17-c7fb-45c0-f9232a956872",
"GPU-ba72128f-59d8-4b2f-7968-c0b0029dd19f",
"GPU-02d33038-442c-5562-ef6a-bfb3ca27fb2d"
]
}
},
"Checksum": 2746309433
}
# lscpu | grep 'NUMA node'
NUMA node(s): 8
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
NUMA node2 CPU(s): 16-23
NUMA node3 CPU(s): 24-31
NUMA node4 CPU(s): 32-39
NUMA node5 CPU(s): 40-47
NUMA node6 CPU(s): 48-55
NUMA node7 CPU(s): 56-63
# lscpu | grep -i Socket
Core(s) per socket: 32
Socket(s): 2
# nvidia-smi topo -m
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X SYS SYS SYS SYS SYS SYS SYS 24-31 3 N/A
GPU1 SYS X SYS SYS SYS SYS SYS SYS 16-23 2 N/A
GPU2 SYS SYS X SYS SYS SYS SYS SYS 8-15 1 N/A
GPU3 SYS SYS SYS X SYS SYS SYS SYS 0-7 0 N/A
GPU4 SYS SYS SYS SYS X SYS SYS SYS 56-63 7 N/A
GPU5 SYS SYS SYS SYS SYS X SYS SYS 48-55 6 N/A
GPU6 SYS SYS SYS SYS SYS SYS X SYS 40-47 5 N/A
GPU7 SYS SYS SYS SYS SYS SYS SYS X 32-39 4 N/A
- As we can see, the allocated GPUs are on NUMA nodes 1, 4, 5, 6, and 7, while the CPUs are on NUMA nodes 1, 2, 3, 4, and 5, so they are not aligned.
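The comparison above can be reproduced with a short script. This is a minimal sketch that hard-codes the values from the outputs shown in this issue (the `lscpu` NUMA ranges, the `8-47` cpuset, and the NUMA keys under `DeviceIDs`). The helper names `parse_cpuset`, `numa_nodes_of`, and `per_socket` are made up for illustration and are not kubelet APIs. It also assumes that NUMA nodes 0-3 sit on socket 0 and nodes 4-7 on socket 1, which the outputs suggest but do not state explicitly.

```python
# Values copied from the outputs in this issue.
NUMA_CPUS = {n: set(range(n * 8, n * 8 + 8)) for n in range(8)}  # lscpu: node N owns CPUs 8N..8N+7
ASSIGNED_CPUSET = "8-47"           # cpu_manager_state entry for the container
GPU_NUMA_NODES = {1, 4, 5, 6, 7}   # keys of "DeviceIDs" in the device manager checkpoint
NUMA_PER_SOCKET = 4                # assumption: nodes 0-3 on socket 0, nodes 4-7 on socket 1


def parse_cpuset(spec):
    """Expand a cpuset string such as "0-7,48-63" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def numa_nodes_of(cpus):
    """Return the NUMA nodes that own at least one of the given CPUs."""
    return {node for node, owned in NUMA_CPUS.items() if owned & cpus}


def per_socket(nodes):
    """Count how many of the given NUMA nodes fall on each socket."""
    counts = {}
    for node in nodes:
        socket = node // NUMA_PER_SOCKET
        counts[socket] = counts.get(socket, 0) + 1
    return counts


cpu_numa_nodes = numa_nodes_of(parse_cpuset(ASSIGNED_CPUSET))
print("CPU NUMA nodes:", sorted(cpu_numa_nodes))
print("GPU NUMA nodes:", sorted(GPU_NUMA_NODES))
print("GPU nodes without CPUs:", sorted(GPU_NUMA_NODES - cpu_numa_nodes))
print("CPU nodes without GPUs:", sorted(cpu_numa_nodes - GPU_NUMA_NODES))
print("CPU nodes per socket:", per_socket(cpu_numa_nodes))
print("GPU nodes per socket:", per_socket(GPU_NUMA_NODES))
```

Under the stated socket assumption, the per-socket counts show that the mismatch is not only per NUMA node: most of the CPUs land on one socket while most of the GPUs land on the other.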
Anything else we need to know?
No response
Kubernetes version
# kubectl version
Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.1
Cloud provider
OS version
# On Linux:
# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
# uname -a
Linux k8s-work1 5.4.0-148-generic #165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
8 answers
yftpprvb1#
/sig node
tzxcd3kk2#
@ffromani
du7egjpx3#
/assign @swatisehgal
please help triage this
omvjsjqw4#
/assign @swatisehgal
please help triage this
This is a known drawback, listed in the KEP's drawbacks.
According to my tests, the CPU allocation is seldom NUMA-aligned with the deviceManager allocation, even though my environment apparently has NUMA nodes on which these two resources could be perfectly aligned.
I have submitted PR #122669; please review.
zwghvu4y5#
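Thanks for the additional information. From the current issue description, it is not evident that the align-by-socket CPU manager policy option is enabled in your environment. Please provide that information in the issue as well. I see that you have provided the distribution of CPUs across NUMA nodes; knowing how the CPUs are distributed across sockets would also help. Does your environment have multiple sockets per NUMA node?
align-by-socket is still an alpha feature in the CPU manager, introduced mainly to handle cases with more than one NUMA node per socket and to ensure alignment at the socket boundary. While there may be room for improvement, I think we need more information to understand this issue better.
rm5edbpk6#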
/triage accepted
ni65a41a7#
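Thanks for the additional information. From the current issue description, it is not evident that the align-by-socket CPU manager policy option is enabled in your environment. Please provide that information in the issue as well. I see that you have provided the distribution of CPUs across NUMA nodes; knowing how the CPUs are distributed across sockets would also help. Does your environment have multiple sockets per NUMA node?
align-by-socket is still an alpha feature in the CPU manager, introduced mainly to handle cases with more than one NUMA node per socket and to ensure alignment at the socket boundary. While there may be room for improvement, I think we need more information to understand this issue better.
Sorry for the late reply. I have updated the issue; please take a look.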
rekjcdws8#
Here is the log: the best hint from the hint providers is NUMA nodes 1, 4, 5, 6, 7, while the cpuManager cpuset is 8-47, which falls on NUMA nodes 1, 2, 3, 4, 5.
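To make the expectation concrete, the sketch below computes which CPUs a NUMA-aligned allocation would have covered for this pod, assuming the CPU manager simply took the cores of the NUMA nodes named in the hint. That selection rule is a simplification for illustration, not the kubelet's actual algorithm, and `format_cpuset` is a hypothetical helper.

```python
CPUS_PER_NUMA = 8                   # from lscpu: each NUMA node owns 8 CPUs
HINT_NUMA_NODES = [1, 4, 5, 6, 7]   # best hint reported in the log
REQUESTED_CPUS = 40                 # cpu limit of the pod


def format_cpuset(cpus):
    """Collapse CPU ids into a cpuset string such as "0-2,5"."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(f"{lo}-{hi}" if lo != hi else str(lo) for lo, hi in ranges)


# Every CPU on the hinted NUMA nodes, in node order.
aligned = [cpu
           for node in HINT_NUMA_NODES
           for cpu in range(node * CPUS_PER_NUMA, (node + 1) * CPUS_PER_NUMA)]
assert len(aligned) >= REQUESTED_CPUS    # the hinted nodes hold enough CPUs

expected = format_cpuset(aligned[:REQUESTED_CPUS])
print("NUMA-aligned cpuset:     ", expected)   # 8-15,32-63
print("cpuset actually assigned: 8-47")
```

The five hinted nodes hold exactly 40 CPUs, so the request could be satisfied from those nodes alone; the assigned cpuset 8-47 instead covers nodes 2 and 3 and leaves out nodes 6 and 7.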