Converting JSON to Parquet in Java

Posted by lsmepo6l on 2021-05-29 in Hadoop

I am trying to convert JSON to Parquet format in Java, but I get an exception.
Input JSON:

    {
      "list": [
        {
          "mainBearingX": 0.178334,
          "gearBoxZ": 0.03885,
          "_t": 1560305236290000,
          "mainBearingZ": 0.034438,
          "gearBoxX": 0.035738,
          "mainBearingY": 0.029445,
          "gearBoxY": 0.040929,
          "generatorX": 0.776837,
          "generatorY": 0.124234,
          "ts_id": "t1"
        },
        {
          "mainBearingX": 0.169478,
          "gearBoxZ": 0.008242,
          "_t": 1560305236311000,
          "mainBearingZ": 0.007531,
          "gearBoxX": 0.025647,
          "mainBearingY": 0.029445,
          "gearBoxY": 0.026282,
          "generatorX": 0.770189,
          "generatorY": 0.117464,
          "ts_id": "t1"
        }
      ]
    }

Code:

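    import java.io.IOException;
    import java.util.List;
    import java.util.Map;
    import org.apache.avro.Schema;
    import org.apache.avro.reflect.ReflectData;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetFileWriter.Mode;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;

    public static void toConvert(OutPut output) {
        String inputFile = "test.parquetFile";
        Path dataFile = new Path(inputFile);
        Schema schema = ReflectData.AllowNull.get().getSchema(OutPut.class);
        try (ParquetWriter<OutPut> writer = AvroParquetWriter.<OutPut>builder(dataFile)
                .withSchema(schema)
                .withDataModel(ReflectData.get())
                .withConf(new Configuration())
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .withWriteMode(Mode.OVERWRITE)
                .build()) {
            // Never reached: per the stack trace, the exception is thrown inside build().
            writer.write(output);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public class OutPut {
        List<Map<String, Object>> list;
    }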

Exception:

    Exception in thread "main" org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: required group value {}
        at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
        at org.apache.parquet.schema.GroupType.accept(GroupType.java:226)
        at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
        at org.apache.parquet.schema.GroupType.accept(GroupType.java:226)
        at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
        at org.apache.parquet.schema.GroupType.accept(GroupType.java:226)
        at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
        at org.apache.parquet.schema.GroupType.accept(GroupType.java:226)
        at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
        at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
        at org.apache.parquet.schema.MessageType.accept(MessageType.java:55)
        at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
        at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:228)
        at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:273)
        at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:494)
Answer #1 (6ss1mwsb)

The problem is that your OutPut type uses Object as the value type of the Map:

    public class OutPut {
        List<Map<String, Object>> list;
    }

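You are using ReflectData to infer the Avro schema for your type through reflection. However, it cannot infer anything useful from the Object type, so the map value ends up as a record with no fields, which Parquet rejects as an empty group (the "required group value {}" in the exception message).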
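
To see why reflection has nothing to work with here, the following sketch uses only java.lang.reflect (not Avro itself) to look at the declared value type of the map and count its instance fields. ReflectionProbe and its methods are hypothetical names made up for this illustration, and the assumption that ReflectData derives record fields from a class's instance fields is a simplification of what the library does internally.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;

// Same shape as the OutPut class from the question.
class OutPut {
    List<Map<String, Object>> list;
}

// Hypothetical helper, for illustration only.
class ReflectionProbe {
    // Counts the non-static fields a reflection-based tool could turn into schema fields.
    static int countInstanceFields(Class<?> type) {
        int count = 0;
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                count++;
            }
        }
        return count;
    }

    // Returns the declared value type V of OutPut.list, i.e. List<Map<String, V>>.
    static Type mapValueType() {
        try {
            Type listType = OutPut.class.getDeclaredField("list").getGenericType();
            Type mapType = ((ParameterizedType) listType).getActualTypeArguments()[0];
            return ((ParameterizedType) mapType).getActualTypeArguments()[1];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Type valueType = mapValueType();
        System.out.println("map value type: " + valueType.getTypeName());
        // java.lang.Object declares no fields, so there is nothing to map to columns.
        System.out.println("instance fields: " + countInstanceFields(Object.class));
    }
}
```

The declared value type is java.lang.Object, which has no instance fields, which is consistent with the empty group reported in the exception.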
If you change OutPut to use a concrete type, for example:

    public class OutPut {
        List<Map<String, Double>> list;
    }

then it will work.
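
One caveat when moving existing data to Map<String, Double>: the sample input also carries a string field (ts_id) and an integer timestamp (_t), so rows parsed as Map<String, Object> need converting first. The sketch below is one possible way to do that with plain JDK collections. NumericRows and toNumericRow are hypothetical names, and dropping non-numeric entries is only an assumption made for the illustration; a dedicated class with typed fields would be another option.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class NumericRows {
    // Keeps only entries whose value is a Number and converts them to Double.
    // Non-numeric entries such as "ts_id" are dropped here.
    static Map<String, Double> toNumericRow(Map<String, Object> row) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() instanceof Number) {
                out.put(e.getKey(), ((Number) e.getValue()).doubleValue());
            }
        }
        return out;
    }

    // Applies toNumericRow to every row of the parsed "list" array.
    static List<Map<String, Double>> toNumericRows(List<Map<String, Object>> rows) {
        List<Map<String, Double>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            out.add(toNumericRow(row));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("mainBearingX", 0.178334);
        row.put("_t", 1560305236290000L); // below 2^53, so exactly representable as a double
        row.put("ts_id", "t1");

        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        System.out.println(toNumericRows(rows));
    }
}
```

Long values are converted exactly only while they stay below 2^53; larger values would lose precision as doubles.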
