Are the latest Hudi releases (0.7.0, 0.6.0) compatible with Spark 2.3.0 when reading ORC files?

lmyy7pcs posted on 2021-07-13 in Spark
Follow (0) | Answers (1) | Views (486)

The documentation says Hudi works with Spark 2.x and Spark 3.x (https://hudi.apache.org/docs/quick-start-guide.html), but I have not been able to get hudi-spark-bundle_2.11 version 0.7.0 to work with Spark 2.3.0 and Scala 2.11.12. Is there a specific spark-avro package that must be used?
The job fails with the following error: java.lang.NoSuchMethodError: org.apache.spark.sql.types.Decimal$.minBytesForPrecision()[I. Any input would be greatly appreciated.
On the cluster I work with we run Spark 2.3.0 and there are no immediate plans to upgrade. Is there a way to get Hudi 0.7.0 working with Spark 2.3.0?
Note: I am able to use Spark 2.3.0 with hudi-spark-bundle-0.5.0-incubating.jar.
In the spark-shell I get the following error:

  scala> transformedDF.write.format("org.apache.hudi").
  | options(getQuickstartWriteConfigs).
  | option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col1").
  | //option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col2").
  | option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col3,col4,col5").
  | option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  | option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.ComplexKeyGenerator").
  | option("hoodie.upsert.shuffle.parallelism","20").
  | option("hoodie.insert.shuffle.parallelism","20").
  | option(HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES, 128 * 1024 * 1024).
  | option(HoodieStorageConfig.PARQUET_FILE_MAX_BYTES, 128 * 1024 * 1024).
  | option(HoodieWriteConfig.TABLE_NAME, "targetTableHudi").
  | mode(SaveMode.Append).
  | save(targetPath)
  21/02/22 07:14:03 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
  java.lang.NoSuchMethodError: org.apache.spark.sql.types.Decimal$.minBytesForPrecision()[I
    at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:156)
    at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:176)
    at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:174)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
    at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:174)
    at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:52)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:139)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
    ... 62 elided
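The missing symbol in the trace (Decimal$.minBytesForPrecision, an accessor returning an int array) is called by the spark-avro schema converters shaded into the Hudi 0.7.0 bundle; the quick-start guide linked above pairs that bundle, via the --packages flag, with an org.apache.spark:spark-avro_2.11 artifact from the Spark 2.4 line. As a diagnostic only (not a fix), the symbol can be looked up by reflection from the same spark-shell session; a minimal sketch:

  // Diagnostic sketch: list any accessor named minBytesForPrecision on
  // Spark's Decimal companion object. The shaded spark-avro converters in
  // the Hudi 0.7.0 bundle call this accessor (see the trace above); if
  // nothing is printed here, the running Spark 2.3.0 build does not expose
  // it, which would match the NoSuchMethodError.
  val decimalCompanion = Class.forName("org.apache.spark.sql.types.Decimal$")
  decimalCompanion.getMethods
    .filter(_.getName.contains("minBytesForPrecision"))
    .foreach(println)

An empty result on Spark 2.3.0, together with a non-empty one on a Spark 2.4.x installation, would confirm that the 0.7.0 bundle is linked against newer Spark internals than the cluster provides.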

uplii1fm 1#

Could you open a GitHub issue (https://github.com/apache/hudi/issues) so the community can follow up with you promptly?
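When filing that issue, it may help to attach the exact Spark/Scala/Hudi combination that produced the trace. A minimal sketch, run from the same spark-shell session (it assumes only that the Hudi bundle is on the classpath, so that org.apache.hudi.DefaultSource can be loaded):

  // Capture environment details for the bug report: Spark version,
  // Scala version, and the location of the jar that provides the Hudi
  // data source seen in the stack trace.
  println(s"Spark: ${spark.version}")
  println(s"Scala: ${scala.util.Properties.versionNumberString}")
  println("Hudi bundle: " + Class.forName("org.apache.hudi.DefaultSource")
    .getProtectionDomain.getCodeSource.getLocation)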
