我试图通过计算分区中的行数、每天看到的“使用”数量以及每天看到的值数量来抽查数据。
我以前可以使用以下查询的早期版本,但我一定在没有意识到的情况下更改了某些内容:
src as
(
select partition_date_column, count(*) as src_row_count
from database.table
where partition_date_column > '2016-01-01'
group by partition_date_column
)
,
pst as
(
select timestamp_pst as datevalue, count(*) as timestamp_row_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
group by timestamp_pst
),
users as
(
select timestamp_pst as user_datevalue, count(*) as user_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
and filter_column in ('filterA', 'filterB')
group by timestamp_pst
)
select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;
我不清楚是什么格式错误,我做了导致Hive不承认这一点。我也觉得有更好的方法来计算这三个项目,即使其中一个是在不同的列分组。
2条答案
按热度按时间h9vpoimq1#
我想出来了。我在代码开头缺少了“with”,它允许像这样的多个select语句。
yk9xbfzb2#