我正在开发一个hadoop数据库,使用hive作为首选接口。我希望能够将几个select语句组合到一个查询中(有点像union,但每个查询填充一个不同的列)。下面的查询将在一个列中返回我需要的所有结果,但我希望能够使用每个查询填充一个单独的列。任何关于如何做到这一点的帮助都是非常棒的——某种等同于价值观的Hive可能会做到这一点。干杯。
INSERT OVERWRITE TABLE tstr_tmp SELECT * FROM
(SELECT time_stamp FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' ORDER BY time_stamp asc limit 1) as last_visit_of_day
UNION ALL
SELECT * FROM (SELECT CAST(COUNT(hr) as string) FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' group by ext_url) as n_hour_bins
UNION ALL
SELECT * FROM (SELECT time_stamp FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' ORDER BY time_stamp desc limit 1) as first_visit_of_day
UNION ALL
SELECT * FROM (SELECT ext_url FROM http WHERE ext_url = 'http://lucy.info' group by ext_url) as domain_name
UNION ALL
SELECT * FROM (SELECT CAST(count(*) as string) FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' group by ext_url) as n_http_requests
UNION ALL
SELECT * FROM (SELECT int_ip FROM http WHERE ext_hostname = 'exotichorse' group by int_ip) as internal_ip;
按照下面的请求,每个查询将返回一个字符串形式的值。对于这个特定的查询集,将返回以下结果:;
00:08:00
2
07:00:00
http://lucy.info
2
192.168.0.22
我正在开发一个数据库,它将告诉我有关用户流量的信息,所以这个子集将填充下表;
CREATE TABLE metric_http_domain_time_summary( last_visit_of_day string, n_hour_bins string, first_visit_of_day string, domain_name string, n_http_requests string, internal_ip string) PARTITIONED BY (dt string, hr string, origin string, cl string, st string);
我知道我需要对输入的数据进行分区,但我对此相当有信心,一旦我成功地运行了未分区的查询,就会对其进行编辑。我能力上的差距是将子查询串在一起以填充表。
1条答案
按热度按时间owfi6suc1#
在离开并思考了很长一段时间之后,我找到了答案。工会是没有必要的,实际上是在妨碍。此查询将根据需要返回上述输出。如果其他人也有同样的问题,我会把这个留着。由于堆栈溢出信誉的限制,我不得不去掉ext\uurl,但是这个概念可以用。