sql—运行多个计数并合并结果

daolsyd0  于 2021-06-26  发布在  Hive
关注(0)|答案(2)|浏览(320)

我试图通过计算分区中的行数、每天看到的“使用”数量以及每天看到的值数量来抽查数据。
我以前可以使用以下查询的早期版本,但我一定在没有意识到的情况下更改了某些内容:

src as
(
   select partition_date_column, count(*) as src_row_count
   from database.table
   where partition_date_column > '2016-01-01' 
   group by partition_date_column
)

,
pst as
(
  select timestamp_pst as datevalue, count(*) as timestamp_row_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  group by timestamp_pst
),

users as
(
  select timestamp_pst as user_datevalue, count(*) as user_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  and filter_column in ('filterA', 'filterB')
  group by timestamp_pst
)

select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;

我不清楚是什么格式错误,我做了导致Hive不承认这一点。我也觉得有更好的方法来计算这三个项目,即使其中一个是在不同的列分组。

h9vpoimq

h9vpoimq1#

我想出来了。我在代码开头缺少了“with”,它允许像这样的多个select语句。

With src as
(
   select partition_date_column, count(*) as src_row_count
   from database.table
   where partition_date_column > '2016-01-01' 
   group by partition_date_column
)

,
pst as
(
  select timestamp_pst as datevalue, count(*) as timestamp_row_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  group by timestamp_pst
),

users as
(
  select timestamp_pst as user_datevalue, count(*) as user_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  and filter_column in ('filterA', 'filterB')
  group by timestamp_pst
)

select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;
yk9xbfzb

yk9xbfzb2#

select      pe.val  as dt

           ,count(case when pe.pos = 0 then 1 end)  as src_row_count

           ,count
            (
                case  
                    when    pe.pos = 1 
                        and pe.val between date '2016-01-01' and date '2017-07-01' 
                    then    1 
                end
            ) as    timestamp_row_count 

           ,count
            (
                case  
                    when    pe.pos = 1 
                        and pe.val between date '2016-01-01' and date '2017-07-01' 
                        and filter_column in ('filterA', 'filterB')
                    then    1 
                end
            ) as    user_count

from        database.table  t
            lateral view posexplode (array(partition_date_column,timestamp_pst)) pe

where       partition_date_column > date '2016-01-01' 

group by    pe.val

相关问题