I am trying to query a table in Hive 2.3.3 on EMR 5.19, and the output is all NULLs:
hive> select * from ip_sandbox_dev.master_schedule limit 5 ;
OK
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 2.067 seconds, Fetched: 5 row(s)
However, when I query the same table from EMR 5.4 with Hive 2.1.1, I get the expected results:
OK
THURSDAY ABQ ABC 3 4 ABQABC3 MIDWEST TRUCK & AUTO PARTS 18 14 Penny Mayfield N
TUESDAY ABQ ABC 0 4 ABQABC0 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ABQ ABC 1 4 ABQABC1 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ABQ ABC 2 4 ABQABC2 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ANC ABC 0 8 ANCABC0 RANGER BRAKE PRODUCTS 27 14 Penny Mayfield N
Time taken: 2.022 seconds, Fetched: 5 row(s)
Output of SHOW CREATE TABLE:
CREATE EXTERNAL TABLE `ip_sandbox_dev.master_schedule`(
`schedule_day` string,
`dc` string,
`mfg` string,
`subline` int,
`weeks` int,
`con` string,
`supplier` string,
`leadtime` int,
`buyer` int,
`buyer_name` string,
`optimize_flag` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
's3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='2',
'numRows'='59329',
'rawDataSize'='38922302',
'totalSize'='658865',
'transient_lastDdlTime'='1569395007')
I don't know why the results differ between the two versions. I tried dropping and recreating the table, but I get the same result.
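Since the data itself reads fine from Hive 2.1.1, I suspect the problem is in how Hive 2.3.3 maps the table columns onto the schema stored inside the ORC files themselves. A sketch of how I think that embedded schema could be inspected, using Hive's ORC file dump utility against one of the part files under the table LOCATION (the full path shows up in the hive.log below; this assumes the hive CLI on the EMR 5.19 cluster can read the s3a path):

hive --orcfiledump 's3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule/part-00000-8c46f760-a26b-4fe6-ba3b-fc4d2d0ef228-c000.orc'

The column names and types printed in the dump's "Type:" line could then be compared against the CREATE TABLE statement above.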
Here is my hive.log:
2019-10-11T08:25:55,404 ERROR [ORC_GET_SPLITS #0([])]: io.AcidUtils (AcidUtils.java:getAcidState(791)) - Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.s3a.S3AFileSystem
2019-10-11T08:25:55,411 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:getDesiredRowTypeDescr(2463)) - Using schema evolution configuration variables schema.evolution.columns [schedule_day, dc, mfg, subline, weeks, con, supplier, leadtime, buyer, buyer_name, optimize_flag] / schema.evolution.columns.types [string, string, string, int, int, string, string, int, int, string, string] (isAcidRead false)
2019-10-11T08:25:55,487 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:generateSplitsInfo(1735)) - FooterCacheHitRatio: 0/2
2019-10-11T08:25:55,672 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:getDesiredRowTypeDescr(2463)) - Using schema evolution configuration variables schema.evolution.columns [schedule_day, dc, mfg, subline, weeks, con, supplier, leadtime, buyer, buyer_name, optimize_flag] / schema.evolution.columns.types [string, string, string, int, int, string, string, int, int, string, string] (isAcidRead false)
2019-10-11T08:25:55,673 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC rows from s3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule/part-00000-8c46f760-a26b-4fe6-ba3b-fc4d2d0ef228-c000.orc with {include: [true, true, true, true, true, true, true, true, true, true, true, true], offset: 0, length: 648566, schema: struct<schedule_day:string,dc:string,mfg:string,subline:int,weeks:int,con:string,supplier:string,leadtime:int,buyer:int,buyer_name:string,optimize_flag:string>}
2019-10-11T08:25:55,786 WARN [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: internal.S3AbortableInputStream (S3AbortableInputStream.java:close(178)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
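One thing that stands out to me is the "Using schema evolution configuration variables" line above. If the column names stored in the ORC files do not match the names in the table DDL (for example because the files were written by another engine), I understand Hive's name-based schema evolution can fail to map any column and return all NULLs. I have not verified this, but as a test one could force positional column matching for the session, assuming the orc.force.positional.evolution property is honored by this Hive/ORC build:

set orc.force.positional.evolution=true;
select * from ip_sandbox_dev.master_schedule limit 5;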
Can someone help me get past this?