How to connect two pods in Kubernetes as if they were on the same local network with all ports open

dfddblmv · published 2021-07-13 · Spark

TL;DR

Is it possible to connect two pods in Kubernetes as if they were on the same local network, with all ports open?

Motivation

We currently have Airflow running in a Kubernetes cluster, and in order to use TensorFlow Extended we need Apache Beam. For our use case, Spark is the appropriate runner, and since Airflow and TensorFlow are written in Python, we need to use Apache Beam's portable runner (https://beam.apache.org/documentation/runners/spark/#portability).

The problem

Communication between the Airflow pod and the job-server pod fails with a transmission error (probably because the job server uses some random ports).

Setup

To follow good isolation practices and to mimic the common Spark-on-Kubernetes setup (driver running inside the cluster in its own pod), the job server is deployed as:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: beam-spark-job-server
  labels:
    app: airflow-k8s
spec:
  selector:
    matchLabels:
      app: beam-spark-job-server
  replicas: 1
  template:
    metadata:
      labels:
        app: beam-spark-job-server
    spec:
      restartPolicy: Always
      containers:
        - name: beam-spark-job-server
          image: apache/beam_spark_job_server:2.27.0
          args: ["--spark-master-url=spark://spark-master:7077"]
          resources:
            limits:
              memory: "1Gi"
              cpu: "0.7"
          env:
            - name: SPARK_PUBLIC_DNS
              value: spark-client
          ports:
            - containerPort: 8099
              protocol: TCP
              name: job-server
            - containerPort: 7077
              protocol: TCP
              name: spark-master
            - containerPort: 8098
              protocol: TCP
              name: artifact
            - containerPort: 8097
              protocol: TCP
              name: java-expansion
---
apiVersion: v1
kind: Service
metadata:
  name: beam-spark-job-server
  labels:
    app: airflow-k8s
spec:
  type: ClusterIP
  selector:
    app: beam-spark-job-server
  ports:
    - port: 8099
      protocol: TCP
      targetPort: 8099
      name: job-server
    - port: 7077
      protocol: TCP
      targetPort: 7077
      name: spark-master
    - port: 8098
      protocol: TCP
      targetPort: 8098
      name: artifact
    - port: 8097
      protocol: TCP
      targetPort: 8097
      name: java-expansion
```
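With a ClusterIP Service like this, only the ports listed in the manifest are routable from other pods; anything the job server opens on an unlisted, dynamically chosen port is unreachable. A quick way to check which declared ports actually answer from inside the Airflow pod is a plain TCP probe (a sketch; the service name `beam-spark-job-server` is assumed to resolve via cluster DNS):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # Covers both DNS failures (gaierror) and refused/timed-out connects,
        # since socket.gaierror and socket.timeout are OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Probe the ports declared on the Service from inside the cluster.
    for name, port in [("job-server", 8099), ("artifact", 8098),
                       ("java-expansion", 8097)]:
        print(f"{name:>15} ({port}): {can_connect('beam-spark-job-server', port)}")
```

If all declared ports report True but the submission still fails, the problem is not the Service definition but an endpoint advertised on an address the client cannot use.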

Development / Errors

If I run the command `python -m apache_beam.examples.wordcount --output ./data_test/ --runner=PortableRunner --job_endpoint=beam-spark-job-server:8099 --environment_type=LOOPBACK` inside the Airflow pod, nothing is logged on the job server and the terminal shows the following error:

```
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.client:Timeout attempting to reach GCE metadata service.
WARNING:apache_beam.internal.gcp.auth:Unable to find default credentials to use: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
Connecting anonymously.
INFO:apache_beam.runners.worker.worker_pool_main:Listening for workers at localhost:46569
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:root:Default Python SDK image for environment is apache/beam_python3.7_sdk:2.27.0
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/examples/wordcount.py", line 99, in <module>
    run()
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/examples/wordcount.py", line 94, in run
ERROR:grpc._channel:Exception iterating requests!
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 195, in consume_request_iterator
    request = next(request_iterator)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/artifact_service.py", line 355, in __next__
    raise self._queue.get()
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 561, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 421, in run_pipeline
    job_service_handle.submit(proto_pipeline)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 115, in submit
    prepare_response.staging_session_token)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 214, in stage
    staging_session_token)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/artifact_service.py", line 241, in offer_artifacts
    for request in requests:
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 803, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "Unknown staging token job_b6f49cc2-6732-4ea3-9aef-774e3d22867b"
        debug_error_string = "{"created":"@1613765341.075846957","description":"Error received from peer ipv4:127.0.0.1:8098","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Unknown staging token job_b6f49cc2-6732-4ea3-9aef-774e3d22867b","grpc_status":3}"
>
    output | 'Write' >> WriteToText(known_args.output)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 582, in __exit__
    self.result = self.run()
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 561, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 421, in run_pipeline
    job_service_handle.submit(proto_pipeline)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 115, in submit
    prepare_response.staging_session_token)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 214, in stage
    staging_session_token)
  File "/home/airflow/.local/lib/python3.7/site-packages/apache_beam/runners/portability/artifact_service.py", line 241, in offer_artifacts
    for request in requests:
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/home/airflow/.local/lib/python3.7/site-packages/grpc/_channel.py", line 803, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "Unknown staging token job_b6f49cc2-6732-4ea3-9aef-774e3d22867b"
        debug_error_string = "{"created":"@1613765341.075846957","description":"Error received from peer ipv4:127.0.0.1:8098","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Unknown staging token job_b6f49cc2-6732-4ea3-9aef-774e3d22867b","grpc_status":3}"
```

This indicates an error while submitting the job. If I run the job server in the same pod instead, the two containers communicate perfectly; I would like the same behavior with them in separate pods.
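Note that the `debug_error_string` reports "Error received from peer ipv4:127.0.0.1:8098": the client is staging artifacts against `127.0.0.1`, i.e. its own pod, rather than the job-server pod, so the job server never sees the staged artifacts. One hedged option, assuming the standard Beam job-server flags `--job-port`, `--artifact-port`, and `--expansion-port` are available in this image, is to pin the ports explicitly in the Deployment args so they match the ports exposed by the Service:

```yaml
# Sketch of the container args; flag availability should be verified
# against the apache/beam_spark_job_server:2.27.0 image.
args:
  - "--spark-master-url=spark://spark-master:7077"
  - "--job-port=8099"
  - "--artifact-port=8098"
  - "--expansion-port=8097"
```

Pinning the ports removes one source of randomness, but it does not change which host address the endpoints are advertised on.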

ukdjmx9f:

You need to deploy the two containers in a single pod.
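Containers in the same pod share a network namespace, so everything the job server binds on `localhost` (including any dynamically chosen ports) is directly reachable by the Airflow container, which is why the single-pod setup works. A minimal sketch of such a pod (the Airflow image, tag, and configuration here are placeholders, not the asker's actual setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-with-job-server
spec:
  selector:
    matchLabels:
      app: airflow-with-job-server
  replicas: 1
  template:
    metadata:
      labels:
        app: airflow-with-job-server
    spec:
      containers:
        # Hypothetical Airflow container; submit jobs with
        # --job_endpoint=localhost:8099 from inside it.
        - name: airflow
          image: apache/airflow:2.1.2
        - name: beam-spark-job-server
          image: apache/beam_spark_job_server:2.27.0
          args: ["--spark-master-url=spark://spark-master:7077"]
```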
