java - Cannot write data to a table with Apache Iceberg

mm5n2pyu  asked on 2021-07-11  in Java

I am trying to write a simple row of data into a table with Apache Iceberg 0.9.1, but the write fails with the error message below. I want to create, read, and write the data directly through Hadoop, without a catalog. I created a Hadoop table and was able to read data from it. After that I tried to write data into the table: I prepared a JSON file containing one row, and my code reads the JSON object and arranges the columns in order, but the final step of writing the data always fails. I have changed the versions of some dependency packages, but then a different error message appears. Is there a problem with the package versions? Please help.
Here is my source code:

import static org.apache.iceberg.types.Types.NestedField.optional;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IcebergTest {
    public static void main(String[] args) {
        testWithoutCatalog();
        readDataWithouCatalog();
        writeDataWithoutCatalog();
    }

    public static void testWithoutCatalog() {
        Schema bookSchema = new Schema(
                optional(1, "title", Types.StringType.get()),
                optional(2, "price", Types.LongType.get()),
                optional(3, "author", Types.StringType.get()),
                optional(4, "genre", Types.StringType.get()));
        PartitionSpec bookspec = PartitionSpec.builderFor(bookSchema).identity("title").build();
        Configuration conf = new Configuration();
        String warehousePath = "hdfs://hadoop01:9000/warehouse_path/xgfying/books3";
        HadoopTables tables = new HadoopTables(conf);
        Table table = tables.create(bookSchema, bookspec, warehousePath);
    }

    public static void readDataWithouCatalog() {
        .......
    }

    public static void writeDataWithoutCatalog() {
        SparkSession spark = SparkSession.builder().master("local[2]").getOrCreate();
        Dataset<Row> df = spark.read().json("src/test/data/books3.json");
        System.out.println(" this is the writing data : " + df.select("title", "price", "author", "genre")
                .first().toString());
        df.select("title", "price", "author", "genre")
                .write().format("iceberg").mode("append")
                .save("hdfs://hadoop01:9000/warehouse_path/xgfying/books3");
        // System.out.println(df.write().format("iceberg").mode("append").toString());
    }
}

Here is the error message:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/11/18 15:51:36 INFO SparkContext: Running Spark version 2.4.5
.......
file:///C:/tmp/icebergtest1/src/test/data/books3.json, range: 0-75, partition values: [empty row]
20/11/18 15:51:52 ERROR Utils: Aborting task
java.lang.ExceptionInInitializerError
    at org.apache.iceberg.parquet.Parquet$WriteBuilder.build(Parquet.java:232)
    at org.apache.iceberg.spark.source.SparkAppenderFactory.newAppender(SparkAppenderFactory.java:61)
    at org.apache.iceberg.spark.source.BaseWriter.openCurrent(BaseWriter.java:105)
    at org.apache.iceberg.spark.source.PartitionedWriter.write(PartitionedWriter.java:63)
    at org.apache.iceberg.spark.source.Writer$Partitioned24Writer.write(Writer.java:271)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:118)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Cannot find constructor for interface org.apache.parquet.column.page.PageWriteStore
Missing org.apache.parquet.hadoop.ColumnChunkPageWriteStore(org.apache.parquet.hadoop.CodecFactory$BytesCompressor,org.apache.parquet.schema.MessageType,org.apache.parquet.bytes.ByteBufferAllocator,int) [java.lang.NoSuchMethodException: org.apache.parquet.hadoop.ColumnChunkPageWriteStore.<init>(org.apache.parquet.hadoop.CodecFactory$BytesCompressor, org.apache.parquet.schema.MessageType, org.apache.parquet.bytes.ByteBufferAllocator, int)]
    at org.apache.iceberg.common.DynConstructors$Builder.build(DynConstructors.java:235)
    at org.apache.iceberg.parquet.ParquetWriter.<clinit>(ParquetWriter.java:55)
    ... 19 more
20/11/18 15:51:52 ERROR DataWritingSparkTask: Aborting commit for partition 0 (task 2, attempt 0, stage 2.0)
20/11/18 15:51:52 ERROR DataWritingSparkTask: Aborted commit for partition 0 (task 2, attempt 0, stage 2.0)

Here is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>icebergtest</groupId>
  <artifactId>icebergtest1</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>icebergtest1</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <iceberg.version>0.9.1</iceberg.version>
    <hadoop.version>2.7.0</hadoop.version>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <!-- org.apache.hadoop BEGIN -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
      <!-- exclude the netty package -->
      <exclusions>
        <exclusion>
          <groupId>io.netty</groupId>
          <artifactId>netty</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- works around the io.netty.buffer.PooledByteBufAllocator.defaultNumHeapArena()I error -->
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
      <version>4.1.18.Final</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-auth</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <!-- org.apache.hadoop END -->
    <!-- org.apache.iceberg BEGIN -->
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-core</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-api</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-parquet</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-common</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-orc</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-data</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-hive</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-arrow</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-spark</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-bundled-guava</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-spark-runtime</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-spark2</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-flink</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-pig</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-mr</artifactId>
      <version>${iceberg.version}</version>
    </dependency>
    <!-- org.apache.iceberg END -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.11</artifactId>
      <version>2.4.5</version>
      <exclusions>
        <exclusion>
          <groupId>org.codehaus.janino</groupId>
          <artifactId>commons-compiler</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.11.0</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-compiler</artifactId>
      <version>2.11.0</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-reflect</artifactId>
      <version>2.11.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <!--<version>2.7.9</version>-->
      <version>2.6.6</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <!--<version>2.7.9.4</version>-->
      <version>2.6.5</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
      <!--<version>2.7.9</version>-->
      <version>2.6.5</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>1.2.56</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-avro</artifactId>
      <version>1.11.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.10.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-column</artifactId>
      <version>1.11.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.4.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>

bfrts1fy1#

Missing org.apache.parquet.hadoop.ColumnChunkPageWriteStore(org.apache.parquet.hadoop.CodecFactory$BytesCompressor, org.apache.parquet.schema.MessageType, org.apache.parquet.bytes.ByteBufferAllocator, int) [java.lang.NoSuchMethodException: org.apache.parquet.hadoop.ColumnChunkPageWriteStore.<init>(org.apache.parquet.hadoop.CodecFactory$BytesCompressor, org.apache.parquet.schema.MessageType, org.apache.parquet.bytes.ByteBufferAllocator, int)]
means that Iceberg is trying to call a ColumnChunkPageWriteStore constructor that takes four arguments of those types (org.apache.parquet.hadoop.CodecFactory$BytesCompressor, org.apache.parquet.schema.MessageType, org.apache.parquet.bytes.ByteBufferAllocator, int).
It cannot find that constructor in the parquet-hadoop jar on your classpath. That is why you get the NoSuchMethodException.
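
As a minimal sketch of that failure mode (not Iceberg's actual code, which uses its own DynConstructors helper as the stack trace shows), a plain reflective lookup of the same signature fails in the same way when the parquet-hadoop jar on the classpath does not declare it:

public class ParquetConstructorCheck {
    public static void main(String[] args) throws Exception {
        // The (package-private) class that Iceberg instantiates reflectively.
        Class<?> pageWriteStore =
                Class.forName("org.apache.parquet.hadoop.ColumnChunkPageWriteStore");
        try {
            // The 4-argument signature named in the error message above.
            pageWriteStore.getDeclaredConstructor(
                    Class.forName("org.apache.parquet.hadoop.CodecFactory$BytesCompressor"),
                    Class.forName("org.apache.parquet.schema.MessageType"),
                    Class.forName("org.apache.parquet.bytes.ByteBufferAllocator"),
                    int.class);
            System.out.println("Constructor found: this parquet-hadoop build matches what Iceberg expects.");
        } catch (NoSuchMethodException e) {
            // A parquet-hadoop release with a different constructor signature ends up here,
            // exactly like the NoSuchMethodException wrapped in the RuntimeException above.
            System.out.println("Constructor missing: the parquet-hadoop version on the classpath is incompatible.");
        }
    }
}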
According to https://jar-download.com/artifacts/org.apache.parquet/parquet-hadoop/1.8.1/source-code/org/apache/parquet/hadoop/columnchunkpagewritestore.java , you need parquet-hadoop 1.8.1.
Change your Maven import to that older version. I looked at the 1.8.1 source code and it has the constructor you need, as sketched below.
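
A minimal sketch of that change in the pom above, assuming the 1.8.1 version suggested here (worth double-checking against the dependency tree that iceberg-parquet 0.9.1 actually pulls in before committing to it):

<!-- Pin parquet-hadoop explicitly to the suggested older release.
     Keeping the other parquet artifacts already declared above (parquet-avro, parquet-column)
     on the same release avoids mixing two parquet versions on the classpath. -->
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>1.8.1</version>
</dependency>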
