I'm working on a project and hit an error while processing Avro data from Kafka. To narrow the problem down, I wanted to verify that my Spark installation can read the Avro format at all, so I used the simple program shown below to check whether Spark can read my avro file [link]. It also fails with an error. I've been stuck on this for two days, and I'd really appreciate any help.
from pyspark.sql import SparkSession

# Build a minimal SparkSession for the test
spark = SparkSession.builder.appName("indu").getOrCreate()

# Try to load the file through the fully-qualified Avro source name
df = spark.read.format('org.apache.spark.sql.avro').load("X:\Git_repo\project_red\Divolte\conf\MyEventRecord.avsc")
df.printSchema()
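For context, the end goal is decoding Avro-encoded Kafka messages against this schema. A minimal sketch of the pipeline I'm aiming for, assuming PySpark 3.0's avro functions are on the classpath; the broker address and topic name here are placeholders, not my real setup:

from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro

spark = SparkSession.builder.appName("indu").getOrCreate()

# .avsc is a JSON schema definition, not Avro data, so read it as text
with open(r"X:\Git_repo\project_red\Divolte\conf\MyEventRecord.avsc") as f:
    json_schema = f.read()

# Placeholder broker address and topic name
df = (spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "divolte-events")
      .load())

# Decode the binary Kafka value column with the Avro schema
decoded = df.select(from_avro(df.value, json_schema).alias("event"))
decoded.printSchema()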
This is the error I get:
Ivy Default Cache set to: C:\Users\Sri\.ivy2\cache
The jars for the packages stored in: C:\Users\Sri\.ivy2\jars
:: loading settings :: url = jar:file:/C:/spark/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
org.apache.spark#spark-avro_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-f7ec3a79-4224-4faa-843c-6cae7c32af25;1.0
confs: [default]
found org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.2 in central
found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.2 in central
found org.apache.kafka#kafka-clients;2.4.1 in central
found com.github.luben#zstd-jni;1.4.4-3 in central
found org.lz4#lz4-java;1.7.1 in central
found org.xerial.snappy#snappy-java;1.1.8.2 in central
found org.slf4j#slf4j-api;1.7.30 in central
found org.spark-project.spark#unused;1.0.0 in central
found org.apache.commons#commons-pool2;2.6.2 in central
found org.apache.spark#spark-avro_2.12;3.0.2 in central
:: resolution report :: resolve 848ms :: artifacts dl 33ms
:: modules in use:
com.github.luben#zstd-jni;1.4.4-3 from central in [default]
org.apache.commons#commons-pool2;2.6.2 from central in [default]
org.apache.kafka#kafka-clients;2.4.1 from central in [default]
org.apache.spark#spark-avro_2.12;3.0.2 from central in [default]
org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.2 from central in [default]
org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.2 from central in [default]
org.lz4#lz4-java;1.7.1 from central in [default]
org.slf4j#slf4j-api;1.7.30 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
org.xerial.snappy#snappy-java;1.1.8.2 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 10 | 0 | 0 | 0 || 10 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-f7ec3a79-4224-4faa-843c-6cae7c32af25
confs: [default]
0 artifacts copied, 10 already retrieved (0kB/19ms)
21/03/03 08:07:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "X:\Git_repo\project_red\spark_streaming\spark_scripting.py", line 47, in <module>
df = spark.read.format('org.apache.spark.sql.avro').load("X:\Git_repo\project_red\Divolte\conf\MyEventRecord.avsc")
File "C:\spark\spark\python\pyspark\sql\readwriter.py", line 178, in load
return self._df(self._jreader.load(path))
File "C:\spark\spark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1304, in __call__
File "C:\spark\spark\python\pyspark\sql\utils.py", line 134, in deco
raise_from(converted)
File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: Failed to find data source: org.apache.spark.sql.avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;
A link to the avro file can be found here.
Note:
My Spark version: 3.0.2
Scala version: 2.12.10
In the spark-defaults.conf file, I added:
spark.jars.packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.2,org.apache.spark:spark-avro_2.12:3.0.2
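For reference, the deployment the error message points to usually means supplying the avro package at launch time and using the short format name "avro" on an actual .avro data file (not the .avsc schema). A minimal sketch of that, with a placeholder data path:

spark-submit --packages org.apache.spark:spark-avro_2.12:3.0.2 spark_scripting.py

and in the script:

# "avro" is the short name of the built-in-but-external source since Spark 2.4
df = spark.read.format("avro").load("X:\Git_repo\project_red\data\some_events.avro")  # placeholder path
df.printSchema()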