Kubernetes: [LinuxOnly] is badly overused, at least in the sig-network tests

9rbhqvlz  posted 10 months ago in Kubernetes

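test/e2e/network currently contains 45 tests tagged [LinuxOnly], of which 37 appear to be incorrect or at least questionable:

Tests that appear to be tagged correctly:

  • 5 use SCTP, which, AIUI, the Windows kernel does not implement and is not planned to implement.
  • 3 test partial DNS names (kubernetes.default.svc), which the Windows resolver apparently does not support.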

8hhllhi2 #1

/triage accepted

a8jjtwal #2

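Hello,
Indeed, quite a few tests are tagged [LinuxOnly]. Over time, some of them have been fixed and/or had the tag removed, not all of them network-related [1][2][3][4][5][6] (there may be others). However, of the 45 tests you mention, not all are tagged [Conformance]; as far as I can tell, only 8 are: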

    ubuntu@ubuntu:~/workdir/kubernetes$ yq '.[] | .codename | select(contains("LinuxOnly")) | select(contains("network"))' test/conformance/testdata/conformance.yaml
    [sig-network] DNS should resolve DNS of partial qualified names for services [LinuxOnly] [Conformance]
    [sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]
    [sig-network] Networking Granular Checks: Pods should function for node-pod communication: http [LinuxOnly] [NodeConformance] [Conformance]
    [sig-network] Networking Granular Checks: Pods should function for node-pod communication: udp [LinuxOnly] [NodeConformance] [Conformance]
    [sig-network] Services should be able to switch session affinity for NodePort service [LinuxOnly] [Conformance]
    [sig-network] Services should be able to switch session affinity for service with type clusterIP [LinuxOnly] [Conformance]
    [sig-network] Services should have session affinity work for NodePort service [LinuxOnly] [Conformance]
    [sig-network] Services should have session affinity work for service with type clusterIP [LinuxOnly] [Conformance]
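
For readers without yq at hand, the same filter can be sketched in Python. This is only an illustration of the two substring checks in the command above: the first eight codenames are copied from its output, and the last two entries are made-up names added so that the filter has something to reject.

```python
# Codenames copied from the yq output above, plus two hypothetical entries
# (marked) that must NOT match the filter.
CODENAMES = [
    "[sig-network] DNS should resolve DNS of partial qualified names for services [LinuxOnly] [Conformance]",
    "[sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]",
    "[sig-network] Networking Granular Checks: Pods should function for node-pod communication: http [LinuxOnly] [NodeConformance] [Conformance]",
    "[sig-network] Networking Granular Checks: Pods should function for node-pod communication: udp [LinuxOnly] [NodeConformance] [Conformance]",
    "[sig-network] Services should be able to switch session affinity for NodePort service [LinuxOnly] [Conformance]",
    "[sig-network] Services should be able to switch session affinity for service with type clusterIP [LinuxOnly] [Conformance]",
    "[sig-network] Services should have session affinity work for NodePort service [LinuxOnly] [Conformance]",
    "[sig-network] Services should have session affinity work for service with type clusterIP [LinuxOnly] [Conformance]",
    "[sig-network] Example portable test [Conformance]",        # hypothetical
    "[sig-storage] Example storage test [LinuxOnly] [Conformance]",  # hypothetical
]


def linux_only_network(codenames):
    """Keep codenames containing both 'LinuxOnly' and 'network',
    mirroring select(contains(...)) | select(contains(...))."""
    return [name for name in codenames if "LinuxOnly" in name and "network" in name]


if __name__ == "__main__":
    for name in linux_only_network(CODENAMES):
        print(name)
```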

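It is worth mentioning that more tests may be skipped on Windows via something like SkipIfNodeOSDistroIs("windows"); I am not sure whether those are included in your count. As I recall, we are not allowed to add skips to [Conformance] tests, which is why the tag was needed in the first place. The tag is not usually added to non-conformance tests, but I think it is useful as a tag: it makes it easy to grep for which tests are Linux-specific and which are not. As for the number of Linux-only network tests, a more complete list would be useful, so that we can check whether they are mistagged, or at least know exactly which tests we are talking about.
Also note that non-conformance tests are generally not exercised the way conformance tests are, so I cannot say much about whether they pass. There was an effort in the past to promote more network tests to conformance [7], but it appears to have stalled.
Now, regarding the tests you mentioned: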

  • SessionAffinity: may need a deeper investigation, but from a quick look it does not work [8] (I tagged all the network tests as [Conformance] so that the Windows CI would run them):
    Kubernetes e2e suite: [It] [sig-network] Services should have session affinity work for NodePort service [Conformance] 2m47s
    { failed [FAILED] Affinity should hold but didn't.
    In [It] at: k8s.io/kubernetes/test/e2e/network/service.go:266 @ 04/22/24 19:31:21.999
    }
    Kubernetes e2e suite: [It] [sig-network] Networking Granular Checks: Services should function for client IP based session affinity: udp [Conformance] 1m6s
    { failed [FAILED] Unexpected endpoints return: map[netserver-0:{} netserver-1:{}], expect 1 endpoints
    In [It] at: k8s.io/kubernetes/test/e2e/network/networking.go:447 @ 04/22/24 19:34:02.49
    }
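  • dual-stack: indeed, according to James [9], these tests can be enabled in the WS 2022 jobs but not in the WS 2019 jobs. I assume there should be a config flag dedicated to a dual-stack WS 2022 job. From what I can see, they are tagged [Feature:IPv6DualStack], so we can exclude them using that tag.
  • UDP support in agnhost: not sure which tests these are. should function for node-pod communication: udp?
  • hostNetwork tests: agreed. I think there are more than 2 tests, though.
  • The hostPorts test appears to fail on Windows [7] and may need further investigation: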
    Kubernetes e2e suite: [It] [sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [Conformance] 45s
    { failed [FAILED] Failed to connect to exposed host ports
    In [It] at: k8s.io/kubernetes/test/e2e/network/hostport.go:161 @ 04/22/24 19:29:22.484
    }
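  • IPv6 tests: I guess you mean should provide Internet connection for containers? It may need more investigation. There appears to be one failure [10]: forward host lookup failed: h_errno 11001: HOST_NOT_FOUND
  • SCTP: indeed, it looks like Windows does not have it [11], but there seem to be some third-party implementations. WDYT, should we consider one of them and declare it officially supported for Kubernetes on Windows?
  • Partial DNS names: yes, that is correct. You can use either the FQDN or just the hostname part.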
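
The idea of excluding tests by tag, as suggested for dual-stack above, can be sketched as follows. This assumes the job filters specs with a skip regular expression matched against the full test name, in the way Ginkgo's skip option does. The tag list and two of the test names are hypothetical and only illustrate the matching.

```python
import re

# Tags a hypothetical Windows job would exclude (an assumption, not a real job config).
SKIP_TAGS = ["[LinuxOnly]", "[Feature:IPv6DualStack]"]


def build_skip_regex(tags):
    """Join the tags into one alternation, escaping the literal brackets."""
    return re.compile("|".join(re.escape(tag) for tag in tags))


def select_tests(names, skip):
    """Return the test names that the skip regex does not match."""
    return [name for name in names if not skip.search(name)]


TESTS = [
    "[sig-network] Services should have session affinity work for NodePort service [LinuxOnly] [Conformance]",
    "[sig-network] Example dual-stack test [Feature:IPv6DualStack]",  # hypothetical name
    "[sig-network] Example portable test [Conformance]",              # hypothetical name
]

if __name__ == "__main__":
    for name in select_tests(TESTS, build_skip_regex(SKIP_TAGS)):
        print(name)
```

Escaping matters here: an unescaped `[LinuxOnly]` would be read as a character class and match almost any test name.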

[1] #101063
[2] #72729
[3] #97045
[4] #85453
[5] #78731
[6] #75591
[7] #73425
[8] https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/124447/pull-kubernetes-e2e-capz-windows-master/1782468233807269888
[9] #100870 (comment)
[10] https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/124447/pull-kubernetes-e2e-capz-windows-master/1782725964745150464
[11] https://learn.microsoft.com/en-us/answers/questions/778329/sctp-driver

50pmv0ei #3

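> Of the 45 tests you mention, not all are tagged [Conformance]

True, but they all cover documented features that users may want to use. As you say, we eventually want to move more tests to conformance, so Windows really should aim to pass every test of functionality that is not genuinely Linux-specific.

> It is also worth mentioning that more tests may be skipped on Windows via something like SkipIfNodeOSDistroIs("windows"); I am not sure whether those are included in your count.

No... I only looked at [LinuxOnly].

> • UDP support in agnhost: not sure which tests these are. should function for node-pod communication: udp?

No, sorry, I should have listed them: the NetworkPolicy between server and client using UDP tests in test/e2e/network/netpol/network_policy.go (which, in fact, are both [LinuxOnly] and SkipIfNodeOSDistroIs("windows")!)

> • hostNetwork tests: agreed. I think there are more than 2 tests, though.

Maybe the others are skipped rather than tagged.

> • IPv6 tests: I guess you mean should provide Internet connection for containers?

Yes.

> • SCTP: indeed, it looks like Windows does not have it [11], but there seem to be some third-party implementations. WDYT, should we consider one of them and declare it officially supported for Kubernetes on Windows?

SCTP is not even implemented by most Linux network plugins, and sig-windows seems to have plenty to do already, so saying that it is not supported on Windows seems reasonable. If someone wants it to be officially supported, they can do the work...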

dzjeubhm #4

/assign @sebsoto

2exbekwf #5

I ran the [LinuxOnly] tests in a couple of jobs [1][2]. Here are some of the results:

    I0513 21:12:00.371224 129567 exec_util.go:83] ExecWithOptions: execute(POST https://capz-conf-hk7naa-c91daf83.canadacentral.cloudapp.azure.com:6443/api/v1/namespaces/hostport-7358/pods/e2e-host-exec/exec?command=%2Fbin%2Fsh&command=-c&command=curl+-g+--connect-timeout+5+--interface+10.1.0.5+http%3A%2F%2F127.0.0.1%3A54323%2Fhostname&container=e2e-host-exec&container=e2e-host-exec&stderr=true&stdout=true)
    I0513 21:12:00.689775 129567 hostport.go:129] Can not connect from e2e-host-exec to pod(pod1) to serverIP: 127.0.0.1, port: 54323
  • NetworkPolicy tests (should ensure an IP overlapping both IPBlock.CIDR and IPBlock.Except is allowed [Feature:NetworkPolicy], should allow egress access on one named port [Feature:NetworkPolicy], should allow ingress access on one named port [Feature:NetworkPolicy]):
    I0426 11:41:47.163106 82743 probe.go:104] Expected allowed pod connection was instead BLOCKED --- run 'kubectl exec a -c cont-80-tcp -n netpol-y-4052 -- /agnhost connect 10.96.53.156:80 --timeout=3s --protocol=tcp'
    ...
    I0426 11:53:05.731060 82747 reachability.go:178] reachability: correct:48, incorrect:24, result=false
  • should function for client IP based session affinity: udp, should function for client IP based session affinity: http: Unexpected endpoints return: map[netserver-0:{} netserver-1:{}], expect 1 endpoints:
    I0426 11:10:47.846110 82747 utils.go:372] Tries: 10, in try: 0, stdout: {"responses":["netserver-0"]}, stderr: , command run in Pod { "name: test-container-pod, namespace: nettest-8008, hostIp: 10.1.0.4, podIp: 192.168.9.52, conditions: [{PodReadyToStartContainers True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:44 +0000 UTC } {Initialized True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:40 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:44 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:44 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:40 +0000 UTC }]" }
    I0426 11:11:02.246169 82747 utils.go:372] Tries: 10, in try: 6, stdout: {"responses":["netserver-1"]}, stderr: , command run in Pod { "name: test-container-pod, namespace: nettest-8008, hostIp: 10.1.0.4, podIp: 192.168.9.52, conditions: [{PodReadyToStartContainers True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:44 +0000 UTC } {Initialized True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:40 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:44 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:44 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2024-04-26 11:10:40 +0000 UTC }]" }
  • should have session affinity work for service with type clusterIP, should have session affinity timeout work for NodePort service, should be able to switch session affinity for service with type clusterIP: Affinity should hold but didn't.
  • should fail health check node port if there are only terminating endpoints: curl timed out:
    I0426 12:31:54.822266 82743 service.go:2756] unexpected error trying to connect to nodeport 10.1.0.4:30904 : error running /usr/local/bin/kubectl --kubeconfig=/home/prow/go/src/k8s.io/windows-testing/capz/capz-conf-7w0a3a.kubeconfig --namespace=services-2600 exec pause-pod-0 -- /bin/sh -x -c curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://10.1.0.4:30904/healthz:
    Command stdout:
    000
    stderr:
    + curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://10.1.0.4:30904/healthz
    command terminated with exit code 28
  • internalTrafficPolicy/externalTrafficPolicy tests: curl exit code 7 (failed to connect to host or proxy):
    I0426 11:19:10.429603 82745 util.go:166] got err: error running /usr/local/bin/kubectl --kubeconfig=/home/prow/go/src/k8s.io/windows-testing/capz/capz-conf-7w0a3a.kubeconfig --namespace=services-7252 exec pause-pod-0 -- /bin/sh -x -c curl -q -s --max-time 30 10.107.35.46:80/hostname:
    Command stdout:
    stderr:
    + curl -q -s --max-time 30 10.107.35.46:80/hostname
    command terminated with exit code 7
    error:
    error:
    exit status 7, retry until timeout
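
To make the session affinity failure above concrete: with ClientIP affinity, every request from the same client pod should land on a single backend, which is what the "expect 1 endpoints" message checks. The sketch below is not the e2e framework's code. It only parses stdout payloads of the shape shown in the log lines and applies the same "exactly one endpoint" rule; the function names are made up for illustration.

```python
import json


def endpoints_seen(stdouts):
    """Collect the distinct backend names reported across all tries."""
    seen = set()
    for out in stdouts:
        seen.update(json.loads(out)["responses"])
    return seen


def affinity_holds(stdouts):
    """Affinity holds only if every try was answered by the same single backend."""
    return len(endpoints_seen(stdouts)) == 1


# stdout payloads copied from try 0 and try 6 in the log excerpt above.
STDOUTS = [
    '{"responses":["netserver-0"]}',
    '{"responses":["netserver-1"]}',
]

if __name__ == "__main__":
    print(sorted(endpoints_seen(STDOUTS)))
    print("affinity holds:", affinity_holds(STDOUTS))
```

Applied to the two tries quoted in the log, the set contains both netserver-0 and netserver-1, so the check fails in the same way the test did.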

[1] https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/124447/pull-kubernetes-e2e-capz-windows-master/1783804706359873536
[2] https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/124447/pull-kubernetes-e2e-capz-windows-master/1790109519284539392

vcirk6k6 #6

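> It is also worth mentioning that more tests may be skipped on Windows via something like SkipIfNodeOSDistroIs("windows"); I am not sure whether those are included in your count.

> No... I only looked at [LinuxOnly].

By the way, there are not many tests that use SkipIfNodeOSDistroIs("windows"); most are storage-related (needing RunAsUser or fsGroup) or sysctl-related. Most of those tests are also tagged [LinuxOnly].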
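
A rough way to compare the two mechanisms discussed here is to count both markers in the test sources. The sketch below does that for a single string of Go source. The sample snippet is made up for illustration (including the `e2eskipper` import alias and the test names), and a real audit would read the files under test/e2e instead.

```python
import re

# Literal "[LinuxOnly]" tag inside a test name.
TAG_RE = re.compile(r"\[LinuxOnly\]")
# A call that skips the test on Windows nodes.
SKIP_RE = re.compile(r'SkipIfNodeOSDistroIs\(\s*"windows"')


def count_markers(source):
    """Count tagged tests and explicit Windows skips in one source string."""
    return {
        "tagged": len(TAG_RE.findall(source)),
        "skipped": len(SKIP_RE.findall(source)),
    }


# Hypothetical Go source, not taken from the Kubernetes tree.
SAMPLE_GO = '''
ginkgo.It("should do a Linux-specific thing [LinuxOnly]", func() {
    e2eskipper.SkipIfNodeOSDistroIs("windows")
})
ginkgo.It("should apply a sysctl", func() {
    e2eskipper.SkipIfNodeOSDistroIs("windows")
})
'''

if __name__ == "__main__":
    print(count_markers(SAMPLE_GO))
```

Counting per file only gives totals; it does not tell which individual tests carry both markers, which would need parsing each test body separately.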

6yoyoihd #7

/cc @sbangari

cu6pst1q #8

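> • NetworkPolicy tests (should ensure an IP overlapping both IPBlock.CIDR and IPBlock.Except is allowed [Feature:NetworkPolicy]):

That test was added specifically because multiple implementations got this behavior wrong, so this probably points to a bug in your NetworkPolicy implementation.

> • should allow egress access on one named port [Feature:NetworkPolicy], should allow ingress access on one named port [Feature:NetworkPolicy]:

Named ports are a slightly obscure feature that many people skipped in their initial NetworkPolicy implementation (and then sometimes never came back to).

> • should fail health check node port if there are only terminating endpoints: curl timed out:

The behavior actually being tested here lives entirely in the platform-independent part of kube-proxy, so this is probably a bug/Linuxism in the e2e test. (Maybe a bad assumption about pod-to-node connectivity? I always forget which cases are OK and which are not.)

> • internalTrafficPolicy/externalTrafficPolicy tests: curl exit code 7 (failed to connect to host or proxy):

Possibly the same problem, but winkernel has its own way of deciding which endpoints to use rather than using proxy.CategorizeEndpoints, so it may be breaking in some edge cases.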
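
The two curl failures quoted in this thread differ in kind: exit code 28 means the operation timed out, while exit code 7 means curl could not connect to the host at all. A small sketch for pulling the code out of a log line and naming it is shown below. The helper name and the two-entry table are illustrative only; the table covers just the codes that appear in this thread.

```python
import re

# Meanings of the two curl exit codes seen in this thread.
CURL_EXIT_MEANING = {
    7: "failed to connect to host",
    28: "operation timed out",
}

EXIT_RE = re.compile(r"command terminated with exit code (\d+)")


def explain_exit(log_text):
    """Return (code, meaning) for the first exit code in the log text, or None."""
    match = EXIT_RE.search(log_text)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CURL_EXIT_MEANING.get(code, "unknown")


if __name__ == "__main__":
    print(explain_exit("command terminated with exit code 28"))
    print(explain_exit("command terminated with exit code 7"))
```

The distinction helps when triaging: a timeout suggests packets are being dropped somewhere on the path, whereas a connection failure suggests nothing is accepting connections at that address and port.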
