Kafka: creating a direct stream with PySpark and SSL properties

Posted by b09cbbtk on 2021-06-05 in Kafka

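We have multiple brokers, and connections to them are secured with SSL. To create a Kafka direct stream, I tried passing the SSL settings as shown below, but the call throws an error: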

kafkaParams = {"metadata.broker.list": "host1:port,host2:port,host3:port",
"security.protocol":"ssl",
"ssl.key.password":"***",
"ssl.keystore.location":"/path1/file.jks",
"ssl.keystore.password":"***",
"ssl.truststore.location":"/path1/file2.jks",
"ssl.truststore.password":"***"}

directKafkaStream = KafkaUtils.createDirectStream(ssc,["topic"],kafkaParams)

Error:

>>> directKafkaStream = KafkaUtils.createDirectStream(ssc,["topic"],kafkaParams)

20/02/12 11:22:54 WARN utils.VerifiableProperties: Property security.protocol is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.key.password is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.keystore.location is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.keystore.password is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.truststore.location is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.truststore.password is not valid
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/pyspark/streaming/kafka.py", line 146, in createDirectStream
    ssc._jssc, kafkaParams, set(topics), jfromOffsets)
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a,**kw)
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o10805.createDirectStreamWithoutMessageHandler.
: org.apache.spark.SparkException: java.io.EOFException
java.io.EOFException
java.io.EOFException
        at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:387)
        at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:387)
        at scala.util.Either.fold(Either.scala:98)
        at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:386)
        at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:223)
        at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createDirectStream(KafkaUtils.scala:721)
        at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createDirectStreamWithoutMessageHandler(KafkaUtils.scala:689)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
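
A note on the warnings above: every property they reject is SSL-related. The traceback goes through org.apache.spark.streaming.kafka, which is the Kafka 0.8 integration. According to the Spark Streaming + Kafka integration guide, that integration does not support SSL, while the 0.10 integration supports SSL but has no Python API. The sketch below only reproduces which keys get flagged. `find_ssl_keys` is a hypothetical helper written for illustration and is not part of PySpark or Kafka.

```python
# Hypothetical helper (not a PySpark or Kafka API): list the kafkaParams keys
# that match the properties reported as "not valid" in the warnings above.
def find_ssl_keys(kafka_params):
    return sorted(
        key for key in kafka_params
        if key == "security.protocol" or key.startswith("ssl.")
    )

# Same parameters as in the question; passwords stay masked.
kafkaParams = {
    "metadata.broker.list": "host1:port,host2:port,host3:port",
    "security.protocol": "ssl",
    "ssl.key.password": "***",
    "ssl.keystore.location": "/path1/file.jks",
    "ssl.keystore.password": "***",
    "ssl.truststore.location": "/path1/file2.jks",
    "ssl.truststore.password": "***",
}

flagged = find_ssl_keys(kafkaParams)
for key in flagged:
    print("likely ignored by the 0.8 consumer:", key)
```

If that reading is right, the java.io.EOFException is probably a plaintext client talking to an SSL listener, and no choice of kafkaParams would fix it with this API.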

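On the other hand, passing the SSL settings to readStream as shown below works without any problem. I am not sure how to pass the SSL settings to the direct stream, since the main goal is to have a DStream.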

from pyspark import SparkFiles

# Comma-separated list of broker host:port pairs
kafkaParams = "host1:port,host2:port,host3:port"
topic = "topic"

df= spark.readStream.format("kafka")\
.option("kafka.bootstrap.servers",kafkaParams)\
.option("kafka.security.protocol", "SSL")\
.option("kafka.ssl.truststore.location", SparkFiles.get("file.jks")) \
.option("kafka.ssl.truststore.password", "***") \
.option("kafka.ssl.keystore.location", SparkFiles.get("file1.jks")) \
.option("kafka.ssl.keystore.password", "***") \
.option("subscribe",topic)\
.option("startingOffsets","earliest")\
.load()

df1 = df.selectExpr("CAST(value as STRING)","timestamp")
from pyspark.sql.types import StructType, StringType

df_schema = StructType()\
.add("cust_id",StringType())\
.add("name",StringType())\
.add("age",StringType())\
.add("address",StringType())

from pyspark.sql.functions import from_json,col

df2 = df1.select(from_json(col("value"),df_schema).alias("df_a"),"timestamp")

df_console_write = df2\
.writeStream\
.trigger(processingTime='10 seconds')\
.option("truncate","false")\
.format("console")\
.start()

df_console_write.awaitTermination()
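
The two snippets name the same settings differently: the direct stream takes raw consumer properties, while readStream expects them with a `kafka.` prefix and `kafka.bootstrap.servers` in place of `metadata.broker.list`. A minimal sketch of that mapping follows. `to_structured_streaming_options` is a hypothetical helper, and upper-casing the protocol value simply mirrors the working snippet above.

```python
# Hypothetical helper (not a PySpark API): translate DStream-style kafkaParams
# into the option names used by spark.readStream.format("kafka").
def to_structured_streaming_options(kafka_params):
    options = {}
    for key, value in kafka_params.items():
        if key == "metadata.broker.list":
            # Structured Streaming takes the broker list under this name.
            options["kafka.bootstrap.servers"] = value
        elif key == "security.protocol":
            # The working readStream snippet uses the upper-case form "SSL".
            options["kafka.security.protocol"] = value.upper()
        else:
            # Other consumer properties are passed through with a "kafka." prefix.
            options["kafka." + key] = value
    return options

dstream_params = {
    "metadata.broker.list": "host1:port,host2:port,host3:port",
    "security.protocol": "ssl",
    "ssl.keystore.location": "/path1/file.jks",
    "ssl.keystore.password": "***",
    "ssl.truststore.location": "/path1/file2.jks",
    "ssl.truststore.password": "***",
}

options = to_structured_streaming_options(dstream_params)
for name in sorted(options):
    print(name, "=", options[name])

# With a live SparkSession named `spark` (assumed, not created here):
# df = spark.readStream.format("kafka").options(**options) \
#     .option("subscribe", "topic").load()
```

If per-batch processing is the reason for wanting a DStream, `writeStream.foreachBatch` (available in Python since Spark 2.4) may be a workable substitute. It passes each micro-batch to a Python function as a DataFrame. Whether it fits depends on the job.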
