I started a Spark instance in Docker containers. I then tried to connect to it and read a WebSocket stream.
Below is the code snippet I am using, which throws an exception:
from pyspark.sql import SparkSession

def spark_ws():
    # Connect to the Spark cluster running in Docker via Spark Connect
    spark = SparkSession.builder.remote("sc://localhost:7077").getOrCreate()
    # Read a line stream from the socket source
    lines = spark.readStream.format("socket") \
        .option("host", "<my_ws_stream>") \
        .option("port", 443) \
        .load()
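For completeness, the rest of the streaming job would look roughly like the sketch below (the console sink is just a placeholder for testing; I never get that far because accessing readStream already fails):

```python
# Hypothetical continuation, assuming `lines` loaded successfully:
# dump the raw lines to the console sink as a quick sanity check.
query = lines.writeStream \
    .format("console") \
    .outputMode("append") \
    .start()
query.awaitTermination()
```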
Note that I have installed **pyspark** in my Poetry environment.
I get the following error when trying to access the attribute on the pyspark.sql.session.SparkSession object:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/debian/.cache/pypoetry/virtualenvs/scripts-CqDm9Ky7-py3.9/lib/python3.9/site-packages/pyspark/sql/connect/session.py", line 466, in newSession
raise NotImplementedError("readStream() is not implemented.")
NotImplementedError: readStream() is not implemented.
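For reference, a quick check like the following (run in the same Poetry environment, with the remote URL from the snippet above) prints the client-side pyspark version and which SparkSession implementation builder.remote() actually returns:

```python
import pyspark
from pyspark.sql import SparkSession

# Client-side library version installed in the Poetry virtualenv
print(pyspark.__version__)

# Show which SparkSession implementation builder.remote() returns
spark = SparkSession.builder.remote("sc://localhost:7077").getOrCreate()
print(type(spark))  # e.g. pyspark.sql.connect.session.SparkSession
```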
version: "3.9"
services:
spark-master:
image: bitnami/spark:3.3.3
container_name: spark-master
environment:
- SPARK_MODE=master
ports:
- 8080:8080
- 4040:4040
- 7077:7077
volumes:
- ./data:/data
- ./src:/src
spark-worker:
image: bitnami/spark:3.3.3
container_name: spark-worker
environment:
- SPARK_MODE=worker
- SPARK_MASTER_URL=spark://spark-master:7077
- SPARK_EXECUTOR_MEMORY=4G
- SPARK_WORKER_MEMORY=4G
- SPARK_WORKER_CORES=4
volumes:
- ./data:/data
- ./src:/src
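For comparison, a classic (non-Connect) session pointed at the standalone master that this compose file publishes on port 7077 would look roughly like the sketch below; whether it works from outside the containers depends on your Docker networking, so treat it as an illustration only:

```python
from pyspark.sql import SparkSession

# Classic driver-side session targeting the standalone master
# that docker-compose exposes on localhost:7077.
spark = SparkSession.builder \
    .master("spark://localhost:7077") \
    .appName("socket-stream-test") \
    .getOrCreate()

print(spark.version)
```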
1 Answer
I ran into the same problem a while ago, and I fixed it by checking whether Java was installed. I installed it locally and the problem was solved!
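A quick way to check this from Python (a minimal sketch) is to look for the java binary on the PATH and print the JAVA_HOME variable:

```python
import os
import shutil
import subprocess

# Look for the java binary on PATH and report JAVA_HOME if it is set.
java_path = shutil.which("java")
print("java on PATH:", java_path)
print("JAVA_HOME:", os.environ.get("JAVA_HOME"))

if java_path:
    # `java -version` prints its output to stderr
    subprocess.run([java_path, "-version"], check=False)
```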