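1. I have been trying to use Azure Spot instances on Azure Kubernetes Service (AKS) version 1.19.11. To enable pod scheduling on those nodes, I am trying to use the PodTolerationRestriction admission controller.
2. I can confirm that the PodTolerationRestriction controller is enabled, because I had no problems deploying a replica set to the default namespace. This deployment is in another namespace, but we did not specifically add any tolerations to that namespace when creating it.
3. I gathered from elsewhere that, along with whitelisting the toleration for the specific taint (spot in my case), certain default tolerations also need to be whitelisted. So I added some annotations to the namespace.
4. I have not predefined any additional tolerations for this StatefulSet.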
- The nodes have the following taints (the first two are handled through Helm chart values):
- RabbitMQ=true:NoSchedule
- Allow=true:NoExecute
- kubernetes.azure.com/scalesetpriority=spot:NoSchedule
I would like to know which additional tolerations need to be whitelisted.
The annotations I added -
scheduler.alpha.kubernetes.io/defaultTolerations: '[{"operator": "Equal", "value": "spot", "key": "kubernetes.azure.com/scalesetpriority"}]'
scheduler.alpha.kubernetes.io/tolerationsWhitelist: '[{"operator": "Equal", "value": "spot", "key": "kubernetes.azure.com/scalesetpriority"}, {"operator": "Exists", "effect": "NoSchedule", "key": "node.kubernetes.io/memory-pressure"}, {"operator": "Exists", "tolerationSeconds": 300, "effect": "NoExecute", "key": "node.kubernetes.io/unreachable"}, {"operator": "Exists", "tolerationSeconds": 300, "effect": "NoExecute", "key": "node.kubernetes.io/not-ready"}]'
StatefulSet description -
Name:               <release name>
Namespace:          <namespace>
CreationTimestamp:  Tue, 18 Jan 2022 19:37:38 +0530
Selector:           app.kubernetes.io/instance=<name>,app.kubernetes.io/name=rabbitmq
Labels:             app.kubernetes.io/instance=rabbit
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=rabbitmq
                    helm.sh/chart=rabbitmq-8.6.1
Annotations:        meta.helm.sh/release-name: <release name>
                    meta.helm.sh/release-namespace: <namespace>
Replicas:           3 desired | 0 total
Update Strategy:    RollingUpdate
Pods Status:        0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/instance=rabbit
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=rabbitmq
                    helm.sh/chart=rabbitmq-8.6.1
  Annotations:      checksum/config: 1a138ded5a3ade049cbee9f4f8e2d0fd7253c126d49b790495a492601fd9f280
                    checksum/secret: 05af38634eb4b46c2f8db5770013e1368e78b0d5af057aed5fa4fe7eec4c92de
                    prometheus.io/port: 9419
                    prometheus.io/scrape: true
  Service Account:  sa-rabbitmq
  Containers:
   rabbitmq:
    Image:       docker.io/bitnami/rabbitmq:3.8.9-debian-10-r64
    Ports:       5672/TCP, 25672/TCP, 15672/TCP, 4369/TCP, 9419/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Liveness:    exec [/bin/bash -ec rabbitmq-diagnostics -q ping] delay=120s timeout=200s period=30s #success=1 #failure=6
    Readiness:   exec [/bin/bash -ec rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms] delay=10s timeout=200s period=30s #success=1 #failure=3
    Environment:
      <multiple environment variables>
    Mounts:
      /bitnami/rabbitmq/conf from configuration (rw)
      /bitnami/rabbitmq/mnesia from data (rw)
  Volumes:
   configuration:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbit-rabbitmq-config
    Optional:  false
   data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Volume Claims:  <none>
Events:
  Type     Reason        Age                 From                    Message
  ----     ------        ----                ----                    -------
  Warning  FailedCreate  31s (x14 over 72s)  statefulset-controller  create Pod <pod-name> in StatefulSet <release name> failed error: pod tolerations (possibly merged with namespace default tolerations) conflict with its namespace whitelist
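
For readers trying to reason about the error above, the whitelist check can be approximated offline. The sketch below is a simplified reading of how a pod's merged tolerations are compared with the namespace whitelist, not the admission controller's actual code, and the real behavior may differ between Kubernetes versions. The function names `is_superset` and `rejected_tolerations` are illustrative. The pod toleration list is an assumption: it supposes that the Helm chart values add tolerations for the `RabbitMQ` and `Allow` taints, which the question does not show.

```python
import json

# Whitelist copied from the namespace annotation in the question.
WHITELIST = json.loads(
    '[{"operator": "Equal", "value": "spot", "key": "kubernetes.azure.com/scalesetpriority"},'
    ' {"operator": "Exists", "effect": "NoSchedule", "key": "node.kubernetes.io/memory-pressure"},'
    ' {"operator": "Exists", "tolerationSeconds": 300, "effect": "NoExecute", "key": "node.kubernetes.io/unreachable"},'
    ' {"operator": "Exists", "tolerationSeconds": 300, "effect": "NoExecute", "key": "node.kubernetes.io/not-ready"}]'
)

# ASSUMPTION: a hypothetical merged toleration list for the pod. The last two
# entries stand in for tolerations that the Helm chart values might add.
POD_TOLERATIONS = [
    {"key": "kubernetes.azure.com/scalesetpriority", "operator": "Equal", "value": "spot"},
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300},
    {"key": "RabbitMQ", "operator": "Equal", "value": "true", "effect": "NoSchedule"},
    {"key": "Allow", "operator": "Equal", "value": "true", "effect": "NoExecute"},
]


def is_superset(allowed: dict, tol: dict) -> bool:
    """Simplified test of whether whitelist entry `allowed` covers `tol`."""
    if allowed == tol:
        return True
    # Keys must match, unless the whitelist entry is an empty-key "Exists" wildcard.
    if allowed.get("key", "") != tol.get("key", ""):
        if not (allowed.get("key", "") == "" and allowed.get("operator") == "Exists"):
            return False
    # An empty effect on the whitelist entry matches every effect.
    allowed_effect = allowed.get("effect", "")
    if allowed_effect and allowed_effect != tol.get("effect", ""):
        return False
    # For NoExecute, the pod may not tolerate the taint longer than allowed.
    if allowed_effect == "NoExecute" and "tolerationSeconds" in allowed:
        seconds = tol.get("tolerationSeconds")
        if seconds is None or seconds > allowed["tolerationSeconds"]:
            return False
    # A missing or empty operator is treated as "Equal".
    if (allowed.get("operator") or "Equal") == "Exists":
        return True
    return (tol.get("operator") or "Equal") == "Equal" and tol.get("value", "") == allowed.get("value", "")


def rejected_tolerations(tolerations: list, whitelist: list) -> list:
    """Return the tolerations that no whitelist entry covers."""
    return [t for t in tolerations if not any(is_superset(w, t) for w in whitelist)]


for t in rejected_tolerations(POD_TOLERATIONS, WHITELIST):
    print("not whitelisted:", json.dumps(t))
```

Under these assumptions, the two chart-provided tolerations are the ones reported, which would suggest that every toleration present on the final pod spec has to be covered by the whitelist, not only the spot one.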
1 Answer
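I had the same problem. Here is how I fixed it:
1. Removed the whitelist annotation from the namespace.
2. Deployed the pod.
3. Inspected the tolerations on the running pod:
kubectl get pod <pod name> -o yaml
In my case, a couple of extra tolerations had been injected without my realizing it. Those also have to be added to the whitelist before the annotation is restored.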
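
The inspection step can be scripted. The following is a minimal sketch, assuming the pod is exported as JSON (for example with `kubectl get pod <pod name> -o json`) and that the goal is a value for the `scheduler.alpha.kubernetes.io/tolerationsWhitelist` annotation. The function name `whitelist_from_pod` and the sample pod document are hypothetical and only illustrate the shape of the data.

```python
import json

# ASSUMPTION: a trimmed, hypothetical pod document. Real output from
# `kubectl get pod <pod name> -o json` contains many more fields.
SAMPLE_POD_JSON = """
{
  "spec": {
    "tolerations": [
      {"key": "kubernetes.azure.com/scalesetpriority", "operator": "Equal", "value": "spot"},
      {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300},
      {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300}
    ]
  }
}
"""


def whitelist_from_pod(pod_json: str) -> str:
    """Build a whitelist annotation value from the tolerations on a pod."""
    pod = json.loads(pod_json)
    tolerations = pod.get("spec", {}).get("tolerations", [])
    unique = []
    for toleration in tolerations:
        # Drop exact duplicates while keeping the original order.
        if toleration not in unique:
            unique.append(toleration)
    return json.dumps(unique, sort_keys=True)


print(whitelist_from_pod(SAMPLE_POD_JSON))
```

The printed string lists every toleration actually present on the pod, including any that were injected automatically, so it can be compared with the whitelist annotation before that annotation is put back on the namespace.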