Hive query returns no data

sycxhyv7  posted on 2021-06-26  in  Hive
CREATE EXTERNAL TABLE invoiceitems (
  InvoiceNo INT,
  StockCode INT,
  Description STRING,
  Quantity INT,
  InvoiceDate BIGINT,
  UnitPrice DOUBLE,
  CustomerID INT,
  Country STRING,
  LineNo INT,
  InvoiceTime STRING,
  StoreID INT,
  TransactionID STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3a://streamingdata/data/*';

The data files are created by a Spark Structured Streaming job:

...
data/part-00000-006fc42a-c6a1-42a2-af03-ae0c326b40bd-c000.json  7.1 KB  29/08/2018 10:27:32 PM  
data/part-00000-0075634b-8513-47b3-b5f8-19df8269cf9d-c000.json  1.3 KB  30/08/2018 10:47:32 AM  
data/part-00000-00b6b230-8bb3-49d1-a42e-ad768c1f9a94-c000.json  2.3 KB  30/08/2018 1:25:02 AM
...

Here are the first few lines of the first file:

{"InvoiceNo":5421462,"StockCode":22426,"Description":"ENAMEL WASH BOWL CREAM","Quantity":8,"InvoiceDate":1535578020000,"UnitPrice":3.75,"CustomerID":13405,"Country":"United Kingdom","LineNo":6,"InvoiceTime":"21:27:00","StoreID":0,"TransactionID":"542146260180829"}
{"InvoiceNo":5501932,"StockCode":22170,"Description":"PICTURE FRAME WOOD TRIPLE PORTRAIT","Quantity":4,"InvoiceDate":1535578020000,"UnitPrice":6.75,"CustomerID":13952,"Country":"United Kingdom","LineNo":26,"InvoiceTime":"21:27:00","StoreID":0,"TransactionID":"5501932260180829"}

However, when I run a query, no data is returned:

hive> select * from invoiceitems limit 5;
OK
Time taken: 24.127 seconds

The Hive log directories are empty:

$ ls /var/log/hive*
/var/log/hive:

/var/log/hive-hcatalog:

/var/log/hive2:

How can I debug this further?

dzjeubhm  #1

I got more of a hint about the error when I ran:

select count(*) from invoiceitems;

This returned the following error:
...
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1535521291031_0011_00, diagnostics=[Vertex vertex_1535521291031_0011_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: invoiceitems initializer failed, vertex=vertex_1535521291031_0011_00 [Map 1], java.io.IOException: cannot find dir = s3a://streamingdata/data/part-00000-006fc42a-c6a1-42a2 in pathToPartitionInfo: [s3a://streamingdata/data/part-00000-006fc42a-c6a1-42a2-af03-ae0c326b40bd-c000.json]

I decided to change the CREATE TABLE definition from:

LOCATION 's3a://streamingdata/data/*';

to:

LOCATION 's3a://streamingdata/data/';

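This fixed the problem. Hive treats LOCATION as a directory and reads every file under it, so the trailing wildcard is not needed.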
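
For anyone who would rather not retype the whole table definition, the same change can probably be applied to the existing table in place. The following is a minimal sketch, untested against this cluster, that assumes the table already exists with the wildcard location. ALTER TABLE ... SET LOCATION and DESCRIBE FORMATTED are standard HiveQL statements; the table name and bucket path are taken from the question.

```sql
-- Point the existing table at the directory itself (no trailing wildcard).
ALTER TABLE invoiceitems SET LOCATION 's3a://streamingdata/data/';

-- Inspect the "Location:" row of the output to confirm the new path is stored.
DESCRIBE FORMATTED invoiceitems;

-- Re-run the original query to check whether rows are now returned.
SELECT * FROM invoiceitems LIMIT 5;
```

If the ALTER statement is rejected, dropping and recreating the table is a reasonable fallback: because the table is EXTERNAL, DROP TABLE removes only the metastore entry and leaves the JSON files in S3 untouched.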
