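I am working on an Azure HDInsight cluster, version 3.6. It uses Hortonworks HDP 2.6, which ships with Hive 2.1.0 (on Tez 0.8.4).
I have some internal Hive tables with nested struct fields, stored in Avro format. Here is an example CREATE statement: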
CREATE TABLE my_example_table(
some_field STRING,
some_other_field STRING,
some_struct struct<field1: BIGINT, inner_struct struct<field2: STRING, field3: STRING>>)
PARTITIONED BY (year INT, month INT)
STORED AS AVRO;
I populate these tables from an external table, which is also stored as Avro, like this:
INSERT INTO TABLE my_example_table
PARTITION (year, month)
SELECT ....
FROM my_external_table;
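When I query the internal table, I get the following error:
Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found core.record_0, expecting union
I extracted the Avro schema from one of the internal tables with avro-tools and noticed that Hive creates union types from the structs I defined: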
{
"type" : "record",
"name" : "my_example_table",
"namespace" : "my_namespace",
"fields" : [ {
"name" : "some_field",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "some_other_field",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "my_struct",
"type" : [ "null", {
"type" : "record",
"name" : "record_0",
"namespace" : "",
"doc" : "struct<field1: BIGINT, struct<field2: STRING, field3: STRING>>",
"fields" : [ {
"name" : "field1",
"type" : [ "null", "long" ],
"doc" : "bigint",
"default" : null
}, {
"name" : "inner_struct",
"type" : [ "null", {
"type" : "record",
"name" : "record_2",
"namespace" : "",
"doc" : "struct<field2: STRING, field3: STRING>",
"fields" : [ {
"name" : "field2",
"type" : [ "null", "string" ],
"doc" : "bigint",
"default" : null
}, {
"name" : "field2",
"type" : [ "null", "long" ],
"doc" : "bigint",
"default" : null
}]
}
]}
]}
]}
] }
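The observation above, that Hive wraps every field (including the nested structs) in a ["null", ...] union, can be checked mechanically. Below is a minimal sketch in Python, assuming the schema extracted with avro-tools is available as JSON text. The function name find_unions and the trimmed sample schema are hypothetical and only for illustration. The sketch lists which fields are unions; it does not by itself explain the exception.

```python
import json


def find_unions(schema, path=""):
    """Yield (field_path, branch_types) for every record field whose type is a union.

    In Avro's JSON encoding a union is a JSON list, e.g. ["null", "string"].
    Only record types are walked here; arrays and maps are ignored for brevity.
    """
    if isinstance(schema, list):
        # A union: look inside each branch for nested records.
        for branch in schema:
            yield from find_unions(branch, path)
    elif isinstance(schema, dict) and schema.get("type") == "record":
        for field in schema.get("fields", []):
            field_path = f"{path}.{field['name']}" if path else field["name"]
            field_type = field["type"]
            if isinstance(field_type, list):
                # Report primitive branches by name and complex branches by their "type".
                branches = [b if isinstance(b, str) else b.get("type") for b in field_type]
                yield field_path, branches
            yield from find_unions(field_type, field_path)


# Hypothetical, trimmed-down schema in the same shape as the one shown above.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "my_example_table",
  "fields": [
    {"name": "some_field", "type": ["null", "string"], "default": null},
    {"name": "some_struct", "type": ["null", {
      "type": "record",
      "name": "record_0",
      "fields": [
        {"name": "field1", "type": ["null", "long"], "default": null}
      ]
    }], "default": null}
  ]
}
"""

unions = list(find_unions(json.loads(SCHEMA_JSON)))
for field_path, branches in unions:
    print(field_path, branches)
```

Running the same function over the schema of the external table and comparing the two lists would show whether the writer side and the reader side disagree about which fields are unions.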
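What is going wrong here? I am fairly sure this worked a few days ago, so I suspect Microsoft rolled out a different HDP patch version for HDInsight clusters that ships another Avro or Hive version, but I have found no indication of that.
I found this: https://issues.apache.org/jira/browse/hive-15316 which seems to be a very similar problem (on the same Hive version).
Does anyone know what is going wrong here, and what I can do to fix it or work around it?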