kubernetes [Flaky Test] Storage GCE flaky e2e tests failing multiple times

1cklez4t posted 9 months ago in Kubernetes

Which jobs are flaking?

  • ci-kubernetes-e2e-gci-gce-flaky

Which tests are flaking?
3 different tests, related to SIG Storage and SIG API Machinery.
See:
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-flaky/1661222001538240512
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-flaky/1660677945099816960

Since when has it been flaking?
05-09 23:58 CST

Testgrid link
https://testgrid.k8s.io/sig-storage-kubernetes#gce-flaky

Reason for failure (if possible)
Storage:

```
May 24 04:35:44.167: INFO: Failed inside E2E framework:
    k8s.io/kubernetes/test/e2e/framework/pod.WaitTimeoutForPodRunningInNamespace({0x7f28c44ff9b8, 0xc004034ba0}, {0x72d13d0?, 0xc0005604e0?}, {0xc002fb8cf0, 0x10}, {0xc0019cf410, 0x12}, 0x0?)
        test/e2e/framework/pod/wait.go:459 +0x1a4
    k8s.io/kubernetes/test/e2e/framework/pod.WaitForPodNameRunningInNamespace(...)
        test/e2e/framework/pod/wait.go:443
    k8s.io/kubernetes/test/e2e/framework/pod.CreatePod({0x7f28c44ff9b8, 0xc004034ba0}, {0x72d13d0?, 0xc0005604e0}, {0xc0019cf410, 0x12}, 0x0?, {0xc003a3fcf0, 0x2, 0x2}, ...)
        test/e2e/framework/pod/create.go:87 +0x1c5
    k8s.io/kubernetes/test/e2e/storage.glob..func19.2.1({0x7f28c44ff9b8, 0xc004034ba0})
        test/e2e/storage/nfs_persistent_volume-disruptive.go:181 +0x8a5
[FAILED] pod "pvc-tester-2hw7x" is not Running: Timed out after 300.000s.
Expected Pod to be in <v1.PodPhase>: "Running"
Got instead:
<*v1.Pod | 0xc003c07b00>:
metadata:
  creationTimestamp: "2023-05-24T04:30:44Z"
  generateName: pvc-tester-
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
      f:spec:
        f:containers:
          k:{"name":"write-pod"}:
            .: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:securityContext:
              .: {}
              f:privileged: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/mnt/volume1"}:
                .: {}
                f:mountPath: {}
                f:name: {}
              k:{"mountPath":"/mnt/volume2"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:terminationGracePeriodSeconds: {}
        f:volumes:
          .: {}
          k:{"name":"volume1"}:
            .: {}
            f:name: {}
            f:persistentVolumeClaim:
              .: {}
              f:claimName: {}
          k:{"name":"volume2"}:
            .: {}
            f:name: {}
            f:persistentVolumeClaim:
              .: {}
              f:claimName: {}
    manager: e2e.test
    operation: Update
    time: "2023-05-24T04:30:44Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"PodScheduled"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
    manager: kube-scheduler
    operation: Update
    subresource: status
    time: "2023-05-24T04:30:44Z"
  name: pvc-tester-2hw7x
  namespace: disruptive-pv-5536
  resourceVersion: "3111"
  uid: 866f2c7f-b379-46ba-8af1-77c49895e96c
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - trap exit TERM; while true; do sleep 1; done
    image: registry.k8s.io/e2e-test-images/busybox:1.29-4
    imagePullPolicy: IfNotPresent
    name: write-pod
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/volume1
      name: volume1
    - mountPath: /mnt/volume2
      name: volume2
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-4tsnz
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: volume1
    persistentVolumeClaim:
      claimName: pvc-prm4b
  - name: volume2
    persistentVolumeClaim:
      claimName: pvc-z8dxl
  - name: kube-api-access-4tsnz
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-05-24T04:30:44Z"
    message: '0/4 nodes are available: 1 node(s) were unschedulable, 3 node(s) had
      volume node affinity conflict. preemption: 0/4 nodes are available: 4 Preemption
      is not helpful for scheduling..'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
In [BeforeEach] at: test/e2e/storage/nfs_persistent_volume-disruptive.go:182 @ 05/24/23 04:35:44.167
```
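
The interesting part of the dump is the PodScheduled condition: the pod never left Pending because every node was either unschedulable or had a volume node affinity conflict, so the framework's 300s wait simply expired. For context, here is a minimal client-go sketch of the kind of poll WaitTimeoutForPodRunningInNamespace performs; this is an illustrative approximation under assumed names and intervals, not the actual e2e framework code:

```go
// Illustrative approximation (assumed package, helper name, and poll
// interval) of the wait the failing test performs: fetch the pod every
// couple of seconds until it reports Running or the budget runs out.
package podwait

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodRunning blocks until the named pod reaches phase Running, a
// terminal phase is seen, or the timeout (300s in the failing test) expires.
func waitForPodRunning(ctx context.Context, cs kubernetes.Interface, namespace, name string, timeout time.Duration) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, timeout, true,
		func(ctx context.Context) (bool, error) {
			pod, err := cs.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
			if err != nil {
				return false, err
			}
			switch pod.Status.Phase {
			case v1.PodRunning:
				return true, nil
			case v1.PodSucceeded, v1.PodFailed:
				// Terminal but not Running: fail fast instead of waiting out the timeout.
				return false, fmt.Errorf("pod %s/%s ended in phase %s", namespace, name, pod.Status.Phase)
			}
			// Still Pending (e.g. Unschedulable, as in the dump above): keep polling.
			return false, nil
		})
}
```

With a scheduling failure like the one above, this kind of poll can only time out; the fix has to come from the cluster side (a schedulable node in the right zone), not from a longer wait.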

API Machinery:

```
[FAILED] failed to explain ksvc-1684903113.spec: error running /workspace/kubernetes/platforms/linux/amd64/kubectl --server=https://34.145.75.255 --kubeconfig=/workspace/.kube/config --namespace=crd-publish-openapi-5175 explain ksvc-1684903113.spec:
Command stdout:

stderr:
the server doesn't have a resource type "ksvc-1684903113"

error:
exit status 1
In [It] at: test/e2e/apimachinery/crd_publish_openapi.go:501 @ 05/24/23 04:38:39.602
```
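
Here `kubectl explain` ran before the freshly created CRD had propagated to the server's discovery data, so the resource type was still unknown. A hedged sketch of one way to avoid that race is to poll discovery until the type appears before invoking explain; the package, helper name, and timeouts below are assumptions for illustration, not the test's actual code:

```go
// Hedged sketch: wait until a CRD's resource shows up in API discovery
// before running commands (like `kubectl explain`) that depend on it.
package crdwait

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/discovery"
)

// waitForResourceType polls discovery until `resource` (e.g. the plural name
// behind "ksvc-1684903113") is listed under the given groupVersion.
func waitForResourceType(ctx context.Context, dc discovery.DiscoveryInterface, groupVersion, resource string) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			list, err := dc.ServerResourcesForGroupVersion(groupVersion)
			if err != nil {
				// Group/version not published yet; retry rather than fail.
				return false, nil
			}
			for _, r := range list.APIResources {
				if r.Name == resource {
					return true, nil
				}
			}
			return false, nil
		})
}
```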

Anything else we need to know?

  • No response

Relevant SIG(s)
/sig storage
/sig api-machinery

mqxuamgl #1

/assign

b5lpy0ml #2

/triage accepted

wfsdck30 #3

/cc

o75abkj4 #4

This issue has not been updated in over a year and should be re-triaged.
You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted

xkftehaa #5

/remove-sig api-machinery
