kubernetes: GKE logs "no.scale.down.node.pod.not.enough.pdb" even though a PDB exists

ldfqzlk8 asked on 2022-12-11 in Kubernetes

My GKE cluster is displaying a "Scale down blocked by pod" note, and clicking it and going to the Logs Explorer shows a filtered view with log entries for the pods that caused the incident: no.scale.down.node.pod.not.enough.pdb. That is really strange, because the pods in those log entries do have a PDB defined for them. So it seems to me that GKE is wrongly reporting the cause of the blocked node scale-down. These are the manifests for one of the pods with this issue:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: ms-new-api-beta
  name: ms-new-api-beta
  namespace: beta
spec:
  ports:
    - port: 8000
      protocol: TCP
      targetPort: 8000
  selector:
    app: ms-new-api-beta
  type: NodePort

The Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ms-new-api-beta
  name: ms-new-api-beta
  namespace: beta
spec:
  selector:
    matchLabels:
      app: ms-new-api-beta
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
      labels:
        app: ms-new-api-beta
    spec:
      containers:
        - command:
            - /deploy/venv/bin/gunicorn
            - '--bind'
            - '0.0.0.0:8000'
            - 'newapi.app:app'
            - '--chdir'
            - /deploy/app
            - '--timeout'
            - '7200'
            - '--workers'
            - '1'
            - '--worker-class'
            - uvicorn.workers.UvicornWorker
            - '--log-level'
            - DEBUG
          env:
            - name: ENV
              value: BETA
            
          image: >-
            gcr.io/.../api:${trigger['tag']}
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /rest
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 120
            periodSeconds: 20
            timeoutSeconds: 30
          name: ms-new-api-beta
          ports:
            - containerPort: 8000
              name: http
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /rest
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 2
          resources:
            limits:
              cpu: 150m
            requests:
              cpu: 100m
          startupProbe:
            failureThreshold: 30
            httpGet:
              path: /rest
              port: 8000
            periodSeconds: 120
      imagePullSecrets:
        - name: gcp-docker-registry

The Horizontal Pod Autoscaler:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ms-new-api-beta
  namespace: beta
spec:
  maxReplicas: 5
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ms-new-api-beta
  targetCPUUtilizationPercentage: 100

And finally, the Pod Disruption Budget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ms-new-api-beta
  namespace: beta
spec:
  minAvailable: 0
  selector:
    matchLabels:
      app: ms-new-api-beta

jyztefdp1#

no.scale.down.node.pod.not.enough.pdb is not complaining about the lack of a PDB. It is complaining that evicting the pod in order to scale down the node would violate the existing PDB(s).
The "budget" is how much disruption the Pod can tolerate. The platform will not take any intentional action that violates that budget.
There may be another PDB in place that would be violated. To check, review the PDBs in the pod's namespace:

kubectl get pdb
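For example, for the beta namespace used above (the PDB name below is the one from the question; adjust as needed):

kubectl get pdb -n beta
kubectl get pdb --all-namespaces
kubectl describe pdb ms-new-api-beta -n beta

The ALLOWED DISRUPTIONS column in the get output (or the describe output) shows how many pods the budget currently permits to be evicted; a value of 0 on any PDB matching the pod is what blocks the node scale-down and produces this message.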
