我的pom.xml
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.specs</groupId>
<artifactId>specs</artifactId>
<version>1.2.5</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.0.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.0.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.databricks/spark-xml -->
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.12</artifactId>
<version>0.10.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.29</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3 -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-core -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-core</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-dynamodb -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-dynamodb</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-cloudwatch -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-cloudwatch</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-kinesis -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-kinesis</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-avro -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>3.1.1</version>
</dependency>
我想读avro文件
val conf = new SparkConf().setAppName("Nightmare").setMaster("local")
val sc = new SparkContext(conf)
sc.setLogLevel("Error")
val spark= SparkSession.builder().getOrCreate()
import spark.implicits._
//Step 1-2 read avro file
println("Step 1-2")
val df1 = spark.read
.format("com.databricks.spark.avro")
.option("multiline","true")
.load("file:///D:/bigdata_tasks/nightmare.avro")
//step 3-4 Hit the url --- Convert it to dataframe -- https://randomuser.me/api/0.8/?results=1000 - df2
println("Step 3-5")
val html =Source.fromURL(" https://randomuser.me/api/0.8/?results=1000")
val rdddata=html.mkString
//convert string to rdd
val paralleldata=sc.parallelize(List(rdddata))
val df2= spark.read.json(paralleldata)
df2.printSchema()
df2.show()
运行后出现异常:
线程“main”java.lang.noclassdeffounderror中出现异常:org/apache/spark/sql/catalyst/structfilters
我还尝试了以下代码:
val df1 = spark.read
.format("avro")
.option("multiline","true")
.load("file:///D:/bigdata_tasks/nightmare.avro")
但同样的例外是给予。我的spark版本是2.12。我应该更新spark版本吗?
1条答案
按热度按时间flvtvl501#
最可能的原因是混合了spark版本-您的avro库来自spark 3.1.1,而spark的核心来自spark 3.0.1(通常最好将版本声明为属性,这样所有组件都可以有一个版本)。另外,删除不必要的依赖项,如spark xml、aws sdk等。
还要检查scala版本