My GKE cluster is displaying "Scale down blocked by pod" note, and clicking it then going to the Logs Explorer it shows a filtered view with log entries for the pods that had the incident: no.scale.down.node.pod.not.enough.pdb
. But that's really strange since the pods on the log entries having that message do have PDB defined for them. So it seems to me that GKE is wrongly reporting the cause of the blocking of the node scale down. These are the manifests for one of the pods with this issue:
apiVersion: v1
kind: Service
metadata:
labels:
app: ms-new-api-beta
name: ms-new-api-beta
namespace: beta
spec:
ports:
- port: 8000
protocol: TCP
targetPort: 8000
selector:
app: ms-new-api-beta
type: NodePort
The Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: ms-new-api-beta
name: ms-new-api-beta
namespace: beta
spec:
selector:
matchLabels:
app: ms-new-api-beta
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
labels:
app: ms-new-api-beta
spec:
containers:
- command:
- /deploy/venv/bin/gunicorn
- '--bind'
- '0.0.0.0:8000'
- 'newapi.app:app'
- '--chdir'
- /deploy/app
- '--timeout'
- '7200'
- '--workers'
- '1'
- '--worker-class'
- uvicorn.workers.UvicornWorker
- '--log-level'
- DEBUG
env:
- name: ENV
value: BETA
image: >-
gcr.io/.../api:${trigger['tag']}
imagePullPolicy: Always
livenessProbe:
failureThreshold: 5
httpGet:
path: /rest
port: 8000
scheme: HTTP
initialDelaySeconds: 120
periodSeconds: 20
timeoutSeconds: 30
name: ms-new-api-beta
ports:
- containerPort: 8000
name: http
protocol: TCP
readinessProbe:
httpGet:
path: /rest
port: 8000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 2
resources:
limits:
cpu: 150m
requests:
cpu: 100m
startupProbe:
failureThreshold: 30
httpGet:
path: /rest
port: 8000
periodSeconds: 120
imagePullSecrets:
- name: gcp-docker-registry
The Horizontal Pod Autoscaler:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: ms-new-api-beta
namespace: beta
spec:
maxReplicas: 5
minReplicas: 2
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: ms-new-api-beta
targetCPUUtilizationPercentage: 100
And finally, the Pod Disruption Budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ms-new-api-beta
namespace: beta
spec:
minAvailable: 0
selector:
matchLabels:
app: ms-new-api-beta
1条答案
按热度按时间jyztefdp1#
no.scale.down.node.pod.not.enough.pdb
is not complaining about the lack of a PDB. It is complaining that, if the pod is scaled down, it will be in violation of the existing PDB(s).The "budget" is how much disruption the Pod can permit. The platform will not take any intentional action which violates that budget.
There may be another PDB in place that would be violated. To check, make sure to review pdbs in the pod's namespace: