We have multiple brokers and the connection uses the SSL protocol. To create a Kafka direct stream I tried passing the SSL information as shown below, but it throws an error:
kafkaParams = {"metadata.broker.list": "host1:port,host2:port,host3:port",
               "security.protocol": "ssl",
               "ssl.key.password": "***",
               "ssl.keystore.location": "/path1/file.jks",
               "ssl.keystore.password": "***",
               "ssl.truststore.location": "/path1/file2.jks",
               "ssl.truststore.password": "***"}
directKafkaStream = KafkaUtils.createDirectStream(ssc, ["topic"], kafkaParams)
Error:
>>> directKafkaStream = KafkaUtils.createDirectStream(ssc,["topic"],kafkaParams)
**20/02/12 11:22:54 WARN utils.VerifiableProperties: Property security.protocol is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.key.password is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.keystore.location is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.keystore.password is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.truststore.location is not valid
20/02/12 11:22:54 WARN utils.VerifiableProperties: Property ssl.truststore.password is not valid**
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/pyspark/streaming/kafka.py", line 146, in createDirectStream
ssc._jssc, kafkaParams, set(topics), jfromOffsets)
File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/pyspark/sql/utils.py", line 63, in deco
return f(*a,**kw)
File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o10805.createDirectStreamWithoutMessageHandler.
: org.apache.spark.SparkException: java.io.EOFException
java.io.EOFException
java.io.EOFException
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:387)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:387)
at scala.util.Either.fold(Either.scala:98)
at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:386)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:223)
at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createDirectStream(KafkaUtils.scala:721)
at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createDirectStreamWithoutMessageHandler(KafkaUtils.scala:689)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
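The warnings are the real hint here: `pyspark.streaming.kafka.createDirectStream` is backed by Spark's old Kafka 0.8 integration, whose consumer configuration (validated by `kafka.utils.VerifiableProperties`) predates SSL support. It logs "Property ... is not valid" for every key it does not recognize, drops them, and then attempts a plaintext connection to the SSL-only port; the broker closes the connection, which surfaces as `java.io.EOFException`. A minimal sketch of that filtering behavior in plain Python (not Spark API; `OLD_CONSUMER_KEYS` is an illustrative subset of the 0.8 consumer's settings, not the full list):

```python
# Illustrative subset of config keys the Kafka 0.8 consumer recognizes;
# note there are no security.* or ssl.* keys at all.
OLD_CONSUMER_KEYS = {
    "metadata.broker.list",
    "group.id",
    "auto.offset.reset",
    "socket.timeout.ms",
    "fetch.message.max.bytes",
}

def unsupported_keys(params):
    """Return the params the 0.8 consumer would warn about and then drop."""
    return sorted(k for k in params if k not in OLD_CONSUMER_KEYS)

kafkaParams = {
    "metadata.broker.list": "host1:port,host2:port,host3:port",
    "security.protocol": "ssl",
    "ssl.keystore.location": "/path1/file.jks",
}
print(unsupported_keys(kafkaParams))
# → ['security.protocol', 'ssl.keystore.location']
# With these dropped, the client speaks PLAINTEXT to an SSL listener,
# and the resulting closed connection shows up as java.io.EOFException.
```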
On the other hand, passing the SSL information to readStream as shown below works without any issue, so I am not sure how to pass the SSL information to the direct stream, since the main goal is to have a DStream:
kafkaParams = "host1:port,host2:port,host3:port"
topic = "topic"
df = spark.readStream.format("kafka")\
.option("kafka.bootstrap.servers",kafkaParams)\
.option("kafka.security.protocol", "SSL")\
.option("kafka.ssl.truststore.location", SparkFiles.get("file.jks")) \
.option("kafka.ssl.truststore.password", "***") \
.option("kafka.ssl.keystore.location", SparkFiles.get("file1.jks")) \
.option("kafka.ssl.keystore.password", "***") \
.option("subscribe",topic)\
.option("startingOffsets","earliest")\
.load()
df1 = df.selectExpr("CAST(value as STRING)","timestamp")
from pyspark.sql.types import StructType, StringType
df_schema = StructType()\
.add("cust_id",StringType())\
.add("name",StringType())\
.add("age",StringType())\
.add("address",StringType())
from pyspark.sql.functions import from_json,col
df2 = df1.select(from_json(col("value"),df_schema).alias("df_a"),"timestamp")
df_console_write = df2\
.writeStream\
.trigger(processingTime='10 seconds')\
.option("truncate","false")\
.format("console")\
.start()
df_console_write.awaitTermination()
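Since the Structured Streaming Kafka source accepts the `kafka.ssl.*` options, one possible direction (a sketch, assuming per-micro-batch processing is an acceptable stand-in for acting on each RDD of a DStream) is `foreachBatch`, available in Spark 2.4+, which hands a plain DataFrame to a handler on every trigger. `process_batch` here is a hypothetical handler name:

```python
# Sketch: DStream-like per-micro-batch processing via foreachBatch.
def process_batch(batch_df, epoch_id):
    # batch_df is an ordinary (non-streaming) DataFrame for this trigger,
    # so any batch API can be used: write to a sink, aggregate, or inspect.
    print("epoch", epoch_id, "rows:", batch_df.count())

# Wiring it into the stream built above (df2 from the readStream example):
#
#   query = (df2.writeStream
#            .trigger(processingTime="10 seconds")
#            .foreachBatch(process_batch)
#            .start())
#   query.awaitTermination()
```

This keeps the working SSL configuration of the `kafka` source while still giving batch-by-batch control, which is usually what the old `createDirectStream` was used for.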