I am trying to query a table in Hive 2.3.3 on EMR 5.19, and the output is all NULLs:
hive> select * from ip_sandbox_dev.master_schedule limit 5 ;
OK
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 2.067 seconds, Fetched: 5 row(s)
However, when I query the same table from EMR 5.4 with Hive 2.1.1, I get the expected results:
OK
THURSDAY ABQ ABC 3 4 ABQABC3 MIDWEST TRUCK & AUTO PARTS 18 14 Penny Mayfield N
TUESDAY ABQ ABC 0 4 ABQABC0 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ABQ ABC 1 4 ABQABC1 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ABQ ABC 2 4 ABQABC2 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ANC ABC 0 8 ANCABC0 RANGER BRAKE PRODUCTS 27 14 Penny Mayfield N
Time taken: 2.022 seconds, Fetched: 5 row(s)
Output of SHOW CREATE TABLE:
CREATE EXTERNAL TABLE `ip_sandbox_dev.master_schedule`(
`schedule_day` string,
`dc` string,
`mfg` string,
`subline` int,
`weeks` int,
`con` string,
`supplier` string,
`leadtime` int,
`buyer` int,
`buyer_name` string,
`optimize_flag` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
's3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='2',
'numRows'='59329',
'rawDataSize'='38922302',
'totalSize'='658865',
'transient_lastDdlTime'='1569395007')
I don't know why the results differ between the two versions. I tried dropping and recreating the table, but I get the same result.
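Since the data itself reads fine from Hive 2.1.1, I suspect the problem is in how Hive 2.3.3 maps the table columns onto the schema stored inside the ORC files themselves. A sketch of how I think that embedded schema could be inspected, using Hive's ORC file dump utility against one of the part files under the table LOCATION (the full path shows up in the hive.log below; this assumes the hive CLI on the EMR 5.19 cluster can read the s3a path):

hive --orcfiledump 's3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule/part-00000-8c46f760-a26b-4fe6-ba3b-fc4d2d0ef228-c000.orc'

The column names and types printed in the dump's "Type:" line could then be compared against the CREATE TABLE statement above.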
Here is my hive.log:
2019-10-11T08:25:55,404 ERROR [ORC_GET_SPLITS #0([])]: io.AcidUtils (AcidUtils.java:getAcidState(791)) - Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.s3a.S3AFileSystem
2019-10-11T08:25:55,411 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:getDesiredRowTypeDescr(2463)) - Using schema evolution configuration variables schema.evolution.columns [schedule_day, dc, mfg, subline, weeks, con, supplier, leadtime, buyer, buyer_name, optimize_flag] / schema.evolution.columns.types [string, string, string, int, int, string, string, int, int, string, string] (isAcidRead false)
2019-10-11T08:25:55,487 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:generateSplitsInfo(1735)) - FooterCacheHitRatio: 0/2
2019-10-11T08:25:55,672 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:getDesiredRowTypeDescr(2463)) - Using schema evolution configuration variables schema.evolution.columns [schedule_day, dc, mfg, subline, weeks, con, supplier, leadtime, buyer, buyer_name, optimize_flag] / schema.evolution.columns.types [string, string, string, int, int, string, string, int, int, string, string] (isAcidRead false)
2019-10-11T08:25:55,673 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC rows from s3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule/part-00000-8c46f760-a26b-4fe6-ba3b-fc4d2d0ef228-c000.orc with {include: [true, true, true, true, true, true, true, true, true, true, true, true], offset: 0, length: 648566, schema: struct<schedule_day:string,dc:string,mfg:string,subline:int,weeks:int,con:string,supplier:string,leadtime:int,buyer:int,buyer_name:string,optimize_flag:string>}
2019-10-11T08:25:55,786 WARN [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: internal.S3AbortableInputStream (S3AbortableInputStream.java:close(178)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
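One thing that stands out to me is the "Using schema evolution configuration variables" line above. If the column names stored in the ORC files do not match the names in the table DDL (for example because the files were written by another engine), I understand Hive's name-based schema evolution can fail to map any column and return all NULLs. I have not verified this, but as a test one could force positional column matching for the session, assuming the orc.force.positional.evolution property is honored by this Hive/ORC build:

set orc.force.positional.evolution=true;
select * from ip_sandbox_dev.master_schedule limit 5;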
Can someone help me get past this?