PySpark `readStream` "not implemented" error

Posted by fykwrbwg on 2023-10-15 in Spark

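I started a Spark instance in a Docker container. I then tried to connect to it and read a WebSocket stream.
Below is the code snippet I used, which throws the exception: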

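from pyspark.sql import SparkSession

def spark_ws():
    # The "sc://" scheme selects the Spark Connect client
    spark = SparkSession.builder.remote("sc://localhost:7077").getOrCreate()
    # Accessing readStream is what raises NotImplementedError
    lines = spark.readStream.format("socket")\
        .option("host", "<my_ws_stream>")\
        .option("port", 443)\
        .load()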

Note that I have installed **pyspark** in my Poetry environment.
I get the following error when I try to access this attribute of the pyspark.sql.session.SparkSession object:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/debian/.cache/pypoetry/virtualenvs/scripts-CqDm9Ky7-py3.9/lib/python3.9/site-packages/pyspark/sql/connect/session.py", line 466, in newSession
    raise NotImplementedError("readStream() is not implemented.")
NotImplementedError: readStream() is not implemented.
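
The path in the traceback (`pyspark/sql/connect/session.py`) suggests that the session comes from the Spark Connect client, which is what `builder.remote("sc://...")` selects. As far as I know, Spark Connect first shipped in PySpark 3.4 and only gained Structured Streaming support in 3.5, so checking the installed client version is a cheap first step. The following is a minimal sketch under that assumption; the helper name `connect_streaming_supported` and the 3.5 threshold are illustrative and not taken from the original post.

```python
import re
from importlib import metadata


def connect_streaming_supported(version: str) -> bool:
    """Return True if `version` is 3.5 or newer.

    Assumption: Spark Connect exposes readStream starting with 3.5.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (3, 5)


if __name__ == "__main__":
    try:
        installed = metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        print("pyspark is not installed in this environment")
    else:
        print(f"pyspark {installed}: Connect streaming expected = "
              f"{connect_streaming_supported(installed)}")
```

The server side may matter as well: the `bitnami/spark:3.3.3` image used here appears to predate Spark Connect, and 7077 is normally the standalone master port (reached with `.master("spark://...")`), whereas a Spark Connect server listens on 15002 by default.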

The Docker Compose file for the Spark containers:

version: "3.9"

services:

  spark-master:
    image: bitnami/spark:3.3.3
    container_name: spark-master
    environment:
      - SPARK_MODE=master
    ports:
      - 8080:8080
      - 4040:4040
      - 7077:7077
    volumes:
      - ./data:/data
      - ./src:/src

  spark-worker:
    image: bitnami/spark:3.3.3
    container_name: spark-worker
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_EXECUTOR_MEMORY=4G
      - SPARK_WORKER_MEMORY=4G
      - SPARK_WORKER_CORES=4
    volumes:
      - ./data:/data
      - ./src:/src

Answer #1, by o75abkj4

I ran into the same problem a while ago and fixed it by checking whether Java was installed. I installed it locally, and the problem was solved!
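
A minimal sketch of that Java check, assuming a typical setup where the runtime is found either through `JAVA_HOME` or on `PATH`. The function name `find_java` is hypothetical, and the `JAVA_HOME/bin/java` layout is an assumption that may not hold on every platform (on Windows the lookup falls through to `PATH`).

```python
import os
import shutil
import subprocess


def find_java():
    """Return the path of a `java` executable, or None if none is found."""
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        # Assumed layout: $JAVA_HOME/bin/java
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to whatever `java` is on PATH, if any
    return shutil.which("java")


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("No Java runtime found; install a JDK or set JAVA_HOME.")
    else:
        # `java -version` usually writes its report to stderr
        result = subprocess.run([java, "-version"],
                                capture_output=True, text=True)
        print(result.stderr.strip() or result.stdout.strip())
```

Run it inside the same Poetry environment that runs the PySpark script, so that the check sees the same environment variables.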
