Kubernetes: EKS Fargate pods for Airflow keep restarting with an error code

Posted by wd2eg0qa on 2022-11-02 in Kubernetes

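I am trying to deploy Airflow on EKS Fargate using Helm. I have already set up the EKS cluster, the StorageClass (SC), PV, and PVC, along with the namespace and the Fargate profile (dev).
My problem appears when I run the Helm install: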

helm upgrade --install airflow apache-airflow/airflow -n dev --values values.yaml --set volumePermissions.enbled=true --debug

[![Pod list][1]][1]
Above is the list of pods. The last three keep going into CrashLoopBackOff.
Here is the description of the webserver pod:

C:\Users\tanma>kubectl describe pods -n dev airflow-webserver-775d548b98-wd5x8
Name:                 airflow-webserver-775d548b98-wd5x8
Namespace:            dev
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      airflow-webserver
Node:                 fargate-ip-192-168-161-147.us-west-2.compute.internal/192.168.161.147
Start Time:           Thu, 13 Oct 2022 17:12:54 -0400
Labels:               component=webserver
                      eks.amazonaws.com/fargate-profile=dev
                      pod-template-hash=775d548b98
                      release=airflow
                      tier=airflow
Annotations:          CapacityProvisioned: 0.25vCPU 0.5GB
                      Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
                      checksum/airflow-config: 978d20ff42d3de620bee24f2e35b1769f20ebd948890bf474bd940624e39f150
                      checksum/extra-configmaps: 2e44e493035e2f6a255d08f8104087ff10d30aef6f63176f1b18f75f73295598
                      checksum/extra-secrets: bb91ef06ddc31c0c5a29973832163d8b0b597812a793ef911d33b622bc9d1655
                      checksum/metadata-secret: d9bd679df96f2631a8559d02cc528fd78c3d73c06289be9816d83fb332e05b5e
                      checksum/pgbouncer-config-secret: da52bd1edfe820f0ddfacdebb20a4cc6407d296ee45bcb500a6407e2261a5ba2
                      checksum/webserver-config: 4a2281a4e3ed0cc5e89f07aba3c1bb314ea51c17cb5d2b41e9b045054a6b5c72
                      checksum/webserver-secret-key: a1e18ebcc73a51b6bafe52d95eee84dcdf132559cac0248fff6e58e409b4505e
                      kubernetes.io/psp: eks.privileged
Status:               Running
IP:                   192.168.161.147
IPs:
  IP:           192.168.161.147
Controlled By:  ReplicaSet/airflow-webserver-775d548b98
Init Containers:
  wait-for-airflow-migrations:
    Container ID:  containerd://bf4919f7a268bbeaf1a8f8779e4da1551d76f622d9ce970f18a3f2a1f14c24d7
    Image:         apache/airflow:2.4.1
    Image ID:      docker.io/apache/airflow@sha256:e077b68d81d56d773bddbcdc8941b7a2c16a2087a641005dfc5f1b8dcadec90a
    Port:          <none>
    Host Port:     <none>
    Args:
      airflow
      db
      check-migrations
      --migration-wait-timeout=60
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 13 Oct 2022 17:14:40 -0400
      Finished:     Thu, 13 Oct 2022 17:15:12 -0400
    Ready:          True
    Restart Count:  0
    Environment:
      AIRFLOW__CORE__FERNET_KEY:            <set to the key 'fernet-key' in secret 'airflow-fernet-key'>                      Optional: false
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:      <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:              <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__WEBSERVER__SECRET_KEY:       <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'>  Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pntv6 (ro)
Containers:
  webserver:
    Container ID:  containerd://e479b50af8eefc8c99971cc9cc9b6345f826c09d5f770276b33518340298359d
    Image:         apache/airflow:2.4.1
    Image ID:      docker.io/apache/airflow@sha256:e077b68d81d56d773bddbcdc8941b7a2c16a2087a641005dfc5f1b8dcadec90a
    Port:          8080/TCP
    Host Port:     0/TCP
    Args:
      bash
      -c
      exec airflow webserver
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Thu, 13 Oct 2022 17:40:25 -0400
      Finished:     Thu, 13 Oct 2022 17:42:19 -0400
    Ready:          False
    Restart Count:  9
    Liveness:       http-get http://:8080/health delay=15s timeout=30s period=5s #success=1 #failure=20
    Readiness:      http-get http://:8080/health delay=15s timeout=30s period=5s #success=1 #failure=20
    Environment:
      AIRFLOW__CORE__FERNET_KEY:            <set to the key 'fernet-key' in secret 'airflow-fernet-key'>                      Optional: false
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:      <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:  <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW_CONN_AIRFLOW_DB:              <set to the key 'connection' in secret 'airflow-airflow-metadata'>                Optional: false
      AIRFLOW__WEBSERVER__SECRET_KEY:       <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'>  Optional: false
    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
      /opt/airflow/logs from logs (rw)
      /opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pntv6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      airflow-airflow-config
    Optional:  false
  logs:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  af-efs-fargate-1
    ReadOnly:   false
  kube-api-access-pntv6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason           Age                  From               Message
  ----     ------           ----                 ----               -------
  Warning  LoggingDisabled  31m                  fargate-scheduler  Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
  Normal   Scheduled        30m                  fargate-scheduler  Successfully assigned dev/airflow-webserver-775d548b98-wd5x8 to fargate-ip-192-168-161-147.us-west-2.compute.internal
  Normal   Pulling          30m                  kubelet            Pulling image "apache/airflow:2.4.1"
  Normal   Pulled           28m                  kubelet            Successfully pulled image "apache/airflow:2.4.1" in 1m43.155801441s
  Normal   Created          28m                  kubelet            Created container wait-for-airflow-migrations
  Normal   Started          28m                  kubelet            Started container wait-for-airflow-migrations
  Normal   Pulled           28m                  kubelet            Container image "apache/airflow:2.4.1" already present on machine
  Normal   Created          28m                  kubelet            Created container webserver
  Normal   Started          28m                  kubelet            Started container webserver
  Warning  Unhealthy        27m (x9 over 27m)    kubelet            Readiness probe failed: Get "http://192.168.161.147:8080/health": dial tcp 192.168.161.147:8080: connect: connection refused
  Warning  Unhealthy        10m (x156 over 27m)  kubelet            Liveness probe failed: Get "http://192.168.161.147:8080/health": dial tcp 192.168.161.147:8080: connect: connection refused
  Warning  BackOff          10s (x44 over 14m)   kubelet            Back-off restarting failed container

Any thoughts on why the pods keep restarting?
Appreciate your help here. 
Thanks

  [1]: https://i.stack.imgur.com/IPocP.png

Answer #1 (j0pj023g)

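Your host port is 0. My guess is that this may be preventing the webserver from exposing its port. However, you would have to check the logs of the webserver pod itself to confirm that this is the problem.
You need to make sure this endpoint is reachable (it currently is not): http://192.168.161.147:8080/health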
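
For reference, pulling the logs of the crashing container could look like the sketch below. The namespace, pod name, and container name are copied from the `kubectl describe` output in the question, so they will differ once the ReplicaSet replaces the pod. `--previous` asks for the logs of the last terminated instance of the container, which is usually the interesting one when a pod is in CrashLoopBackOff. The `command -v` guard is only an assumption-free way to keep the snippet harmless on a machine without `kubectl`.

```shell
# Namespace and pod name copied from the question's `kubectl describe` output.
NAMESPACE="dev"
POD="airflow-webserver-775d548b98-wd5x8"

# -c webserver : the main container, not the wait-for-airflow-migrations init container.
# --previous   : logs of the last terminated instance of that container.
LOGS_CMD="kubectl logs -n ${NAMESPACE} ${POD} -c webserver --previous"
echo "${LOGS_CMD}"

# Run it only where kubectl is installed and pointed at the cluster.
if command -v kubectl >/dev/null 2>&1; then
  ${LOGS_CMD} || echo "kubectl logs failed (no cluster access, or the pod has been replaced)"
fi
```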

Answer #2 (gz5pxeao)

In the end I increased the resources for the webserver, and that resolved the issue.
Thanks
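
The answer does not say which values were used, so the following is only a sketch. It assumes the official `apache-airflow/airflow` chart from the question, which exposes a `webserver.resources` value, and the sizes (1 vCPU, 2Gi) are hypothetical numbers picked for illustration. Some context from the question's output: exit code 143 is 128 + 15, meaning the container was stopped with SIGTERM, which fits the kubelet restarting it after the repeated liveness probe failures rather than the process crashing by itself. Fargate sizes a pod from its resource requests, and with none set the pod received the smallest size, which matches the `CapacityProvisioned: 0.25vCPU 0.5GB` annotation and the `BestEffort` QoS class shown above.

```shell
# Hypothetical sizes for illustration; the answer does not state the actual values.
WEB_CPU="1"
WEB_MEM="2Gi"

# Same release, chart, namespace, and values file as the command in the question,
# plus resource requests for the webserver container.
HELM_CMD="helm upgrade --install airflow apache-airflow/airflow -n dev --values values.yaml \
--set webserver.resources.requests.cpu=${WEB_CPU} \
--set webserver.resources.requests.memory=${WEB_MEM}"
echo "${HELM_CMD}"

# Run it only where helm is installed and pointed at the cluster.
if command -v helm >/dev/null 2>&1; then
  ${HELM_CMD} || echo "helm upgrade failed (no cluster access, or the chart repo is not added)"
fi
```

After the upgrade, `kubectl describe pod` on the new webserver pod should show a larger `CapacityProvisioned` annotation if the requests were picked up.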
