Spark Avro error in PyCharm: [pyspark.sql.utils.AnalysisException]

baubqpgj · posted 2021-07-12 in Spark

I'm working on a project and ran into an error while processing Avro data from Kafka. To narrow things down, I wanted to check whether my Spark installation can read the Avro format at all, so I used the simple program shown below to read my Avro file [link]. It fails with an error too. I've been stuck on these errors for two days now, and I'd really appreciate any help.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("indu").getOrCreate()
df = spark.read.format('org.apache.spark.sql.avro').load("X:\Git_repo\project_red\Divolte\conf\MyEventRecord.avsc")
df.printSchema()

This is the error I get:

Ivy Default Cache set to: C:\Users\Sri\.ivy2\cache
The jars for the packages stored in: C:\Users\Sri\.ivy2\jars
:: loading settings :: url = jar:file:/C:/spark/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
org.apache.spark#spark-avro_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-f7ec3a79-4224-4faa-843c-6cae7c32af25;1.0
    confs: [default]
    found org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.2 in central
    found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.2 in central
    found org.apache.kafka#kafka-clients;2.4.1 in central
    found com.github.luben#zstd-jni;1.4.4-3 in central
    found org.lz4#lz4-java;1.7.1 in central
    found org.xerial.snappy#snappy-java;1.1.8.2 in central
    found org.slf4j#slf4j-api;1.7.30 in central
    found org.spark-project.spark#unused;1.0.0 in central
    found org.apache.commons#commons-pool2;2.6.2 in central
    found org.apache.spark#spark-avro_2.12;3.0.2 in central
:: resolution report :: resolve 848ms :: artifacts dl 33ms
    :: modules in use:
    com.github.luben#zstd-jni;1.4.4-3 from central in [default]
    org.apache.commons#commons-pool2;2.6.2 from central in [default]
    org.apache.kafka#kafka-clients;2.4.1 from central in [default]
    org.apache.spark#spark-avro_2.12;3.0.2 from central in [default]
    org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.2 from central in [default]
    org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.2 from central in [default]
    org.lz4#lz4-java;1.7.1 from central in [default]
    org.slf4j#slf4j-api;1.7.30 from central in [default]
    org.spark-project.spark#unused;1.0.0 from central in [default]
    org.xerial.snappy#snappy-java;1.1.8.2 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   10  |   0   |   0   |   0   ||   10  |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-f7ec3a79-4224-4faa-843c-6cae7c32af25
    confs: [default]
    0 artifacts copied, 10 already retrieved (0kB/19ms)
21/03/03 08:07:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "X:\Git_repo\project_red\spark_streaming\spark_scripting.py", line 47, in <module>
    df = spark.read.format('org.apache.spark.sql.avro').load("X:\Git_repo\project_red\Divolte\conf\MyEventRecord.avsc")
  File "C:\spark\spark\python\pyspark\sql\readwriter.py", line 178, in load
    return self._df(self._jreader.load(path))
  File "C:\spark\spark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1304, in __call__
  File "C:\spark\spark\python\pyspark\sql\utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: Failed to find data source: org.apache.spark.sql.avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;

The link to the Avro file can be found here.
Note:
My Spark version: 3.0.2
Scala version: 2.12.10
In the spark-defaults.conf file, I added:

spark.jars.packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.2,org.apache.spark:spark-avro_2.12:3.0.2
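The question received no answers, but the AnalysisException itself points at the documented usage: since Spark 2.4 the Avro reader is an external module that registers under the short name "avro", so format('avro') is the name to pass, and the deployment section of the Apache Avro Data Source Guide covers getting the spark-avro package onto the classpath. Below is a minimal sketch of that usage, assuming the package resolves as in the log above; the app name and file path are hypothetical, and note that an .avsc file is an Avro schema (plain JSON), not Avro data, so the reader needs an actual .avro data file:

from pyspark.sql import SparkSession

# Sketch only. Setting spark.jars.packages in the builder takes effect
# only if no SparkContext is running yet; alternatively pass
# --packages org.apache.spark:spark-avro_2.12:3.0.2 to spark-submit.
spark = (
    SparkSession.builder
    .appName("avro-check")  # hypothetical app name
    .config("spark.jars.packages",
            "org.apache.spark:spark-avro_2.12:3.0.2")
    .getOrCreate()
)

# Since Spark 2.4 the Avro source is registered under the short name
# "avro". The path below is hypothetical and must point at an Avro
# *data* file (.avro), not an .avsc schema.
df = spark.read.format("avro").load(r"X:\data\MyEventRecord.avro")
df.printSchema()

A raw string (r"...") is used for the Windows path so backslashes are not interpreted as escape sequences.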

