I am running Hive 0.14 on an HDP2 cluster. My dataset was built with the Kite SDK and is registered in Hive as an external table.
My table layout is shown below:
hive> describe hivetweets;
OK
created_at              bigint    from deserializer
id                      bigint    from deserializer
in_reply_to_user_id     bigint    from deserializer
in_reply_to_status_id   bigint    from deserializer
lang                    string    from deserializer
text                    string    from deserializer
retweet_count           int       from deserializer
year                    int       Partition column derived from 'created_at' column, generated by Kite.
month                   int       Partition column derived from 'created_at' column, generated by Kite.
day                     int       Partition column derived from 'created_at' column, generated by Kite.
hour                    int       Partition column derived from 'created_at' column, generated by Kite.

# Partition Information
# col_name              data_type comment

year                    int       Partition column derived from 'created_at' column, generated by Kite.
month                   int       Partition column derived from 'created_at' column, generated by Kite.
day                     int       Partition column derived from 'created_at' column, generated by Kite.
hour                    int       Partition column derived from 'created_at' column, generated by Kite.
Time taken: 0.15 seconds, Fetched: 19 row(s)
My first test query against this setup fetches just a single row of the dataset (I have removed the actual output from the example):
hive> select * from hivetweets limit 1;
OK
Time taken: 103.726 seconds, Fetched: 1 row(s)
104 seconds is far too long for this query.
I suspected the work was not being distributed, so I tried a test over more data:
hive> select count(*) from hivetweets limit 100000;
Query ID = root_20150715132222_81e386ef-2990-4251-a61f-82ca8da4c48d
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1436910684121_0006)
--------------------------------------------------------------------------------
VERTICES         STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED     19         19        0        0       0       0
Reducer 2 ...... SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 567.52 s
--------------------------------------------------------------------------------
OK
197371741
Taking about 10 minutes to count 100,000 records is a very long time.
I would appreciate any suggestions on how to debug this.