Hive SQL: set missing rows to 0 after aggregation

Posted by gblwokeq on 2021-04-02 in Hive

I have a table of products and sub-products, `products`, which looks like this:

| backend | sub_product | product |
|---------|-------------|---------|
| conn_go | go_update   | prod_go |
| conn_go | go_delete   | prod_go |
| conn_go | go_edit     | prod_go |
| conn_rv | rv_update   | prod_rv |
| conn_mb | mb_update   | prod_mb |
| conn_mb | mb_delete   | prod_mb |
| conn_mb | mb_edit     | prod_mb |
| conn_pr | pr_update   | prod_pr |
| conn_pr | pr_edit     | prod_pr |
| conn_ct | pol_edit    | null    |
....

Then I have this usage table, which records how many times a user used each sub-product:

| backend | yyyy_mm_dd | sub_product | x_id | user_id | count |
|---------|------------|-------------|------|---------|-------|
| conn_go | 2020-12-15 | go_update   | 10   | 3422    | 1     |
| conn_go | 2020-12-15 | go_delete   | 10   | 23445   | 2     |
| conn_go | 2020-12-15 | go_edit     | 10   | 2243    | 2     |
| conn_rv | 2020-12-15 | rv_update   | 10   | 245342  | 1     |
| conn_mb | 2020-12-15 | mb_update   | 11   | 5464    | 3     |
| conn_mb | 2020-12-15 | mb_delete   | 11   | 1424    | 2     |
| conn_mb | 2020-12-15 | mb_edit     | 11   | 21454   | 2     |
| conn_pr | 2020-12-15 | pr_update   | 12   | 224525  | 1     |
| conn_pr | 2020-12-15 | pr_edit     | 12   | 22424   | 1     |

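I want to count the distinct user_ids that used each product. In addition, for each x_id, I want every product that has no record in the usage table to appear with a usage of 0: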

| x_id | product | usage |
|------|---------|-------|
| 10   | prod_go | 3     |
| 10   | prod_rv | 1     |
| 10   | prod_mb | 0     |
| 10   | prod_pr | 0     |
| 10   | null    | 0     |
| 11   | prod_go | 0     |
| 11   | prod_rv | 0     |
| 11   | prod_mb | 3     |
| 11   | prod_pr | 0     |
| 11   | null    | 0     |
| 12   | prod_go | 0     |
| 12   | prod_rv | 0     |
| 12   | prod_mb | 0     |
| 12   | prod_pr | 2     |
| 12   | null    | 0     |
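The expected output pairs every x_id with every product and reports the number of distinct user_ids, defaulting to 0. A plain-Python sketch of that rule over the sample rows above can serve as a reference result to check a query against. It keeps only the columns it needs, the products table is abbreviated to the rows shown, and the function name `expected_usage` is hypothetical:

```python
from collections import defaultdict

# (backend, sub_product) -> product, copied from the products table above
PRODUCT_OF = {
    ("conn_go", "go_update"): "prod_go",
    ("conn_go", "go_delete"): "prod_go",
    ("conn_go", "go_edit"): "prod_go",
    ("conn_rv", "rv_update"): "prod_rv",
    ("conn_mb", "mb_update"): "prod_mb",
    ("conn_mb", "mb_delete"): "prod_mb",
    ("conn_mb", "mb_edit"): "prod_mb",
    ("conn_pr", "pr_update"): "prod_pr",
    ("conn_pr", "pr_edit"): "prod_pr",
    ("conn_ct", "pol_edit"): None,
}

# (backend, sub_product, x_id, user_id), copied from the usage table above
USAGE = [
    ("conn_go", "go_update", 10, 3422),
    ("conn_go", "go_delete", 10, 23445),
    ("conn_go", "go_edit", 10, 2243),
    ("conn_rv", "rv_update", 10, 245342),
    ("conn_mb", "mb_update", 11, 5464),
    ("conn_mb", "mb_delete", 11, 1424),
    ("conn_mb", "mb_edit", 11, 21454),
    ("conn_pr", "pr_update", 12, 224525),
    ("conn_pr", "pr_edit", 12, 22424),
]


def expected_usage(product_of, usage):
    """Distinct user count for every (x_id, product) pair, 0 when unused."""
    users = defaultdict(set)  # (x_id, product) -> distinct user_ids
    for backend, sub_product, x_id, user_id in usage:
        key = (backend, sub_product)
        if key not in product_of:  # like the join: skip unknown sub-products
            continue
        users[(x_id, product_of[key])].add(user_id)
    x_ids = {row[2] for row in usage}
    products = set(product_of.values())  # includes None for conn_ct/pol_edit
    # Pair every x_id with every product; pairs with no users count as 0.
    return {(x, p): len(users.get((x, p), set())) for x in x_ids for p in products}


result = expected_usage(PRODUCT_OF, USAGE)
print(result[(12, "prod_pr")], result[(12, None)])  # prints: 2 0
```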

I know I can count the distinct users per x_id and product, like this:

select
    usage.x_id,
    p.product,
    count(distinct(usage.user_id)) as usage

from(
    select
        yyyy_mm_dd,
        x_id,
        backend,
        sub_product,
        user_id,
        count
    from
        my_schema.usage_table
    where
        yyyy_mm_dd between '2018-11-06' and '2020-01-20'
        and x_id is not null
) usage

inner join
    my_schema.products p
    on p.backend = usage.backend and p.sub_product = usage.sub_product

group by
    usage.x_id,
    p.product

But I don't know how to include the unused products.


Answer #1 by yrdbyhpb

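Use a LEFT JOIN. To get a row for every x_id and product, first pair each x_id with every product using a CROSS JOIN, then LEFT JOIN the usage rows onto those pairs. count(distinct ...) ignores the NULLs produced by unmatched pairs, so unused products come out as 0:

select x.x_id, p.product,
       count(distinct ut.user_id) as usage
from (select distinct x_id
      from my_schema.usage_table
      where yyyy_mm_dd between '2018-11-06' and '2020-01-20' and
            x_id is not null
     ) x cross join
     my_schema.products p left join
     my_schema.usage_table ut
     on ut.x_id = x.x_id and
        ut.backend = p.backend and
        ut.sub_product = p.sub_product and
        ut.yyyy_mm_dd between '2018-11-06' and '2020-01-20'
group by x.x_id, p.product;

Keep the date filter on ut in the ON clause rather than in a WHERE clause: a WHERE condition on ut columns would discard the unmatched pairs and turn the LEFT JOIN back into an inner join.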
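A LEFT JOIN from products by itself returns each unused product only once, with a NULL x_id, rather than once per x_id. The pairing approach (distinct x_ids cross-joined with products, then the usage rows left-joined) can be checked on the sample rows from the question with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive. SQLite is an assumption here, not the asker's engine; the function `product_usage` and its date-range arguments are hypothetical, and the range is a parameter because the sample rows are dated 2020-12-15, outside the question's '2018-11-06' to '2020-01-20' window:

```python
import sqlite3

# Sample rows from the question (products abbreviated to the rows shown).
PRODUCTS = [
    ("conn_go", "go_update", "prod_go"), ("conn_go", "go_delete", "prod_go"),
    ("conn_go", "go_edit", "prod_go"), ("conn_rv", "rv_update", "prod_rv"),
    ("conn_mb", "mb_update", "prod_mb"), ("conn_mb", "mb_delete", "prod_mb"),
    ("conn_mb", "mb_edit", "prod_mb"), ("conn_pr", "pr_update", "prod_pr"),
    ("conn_pr", "pr_edit", "prod_pr"), ("conn_ct", "pol_edit", None),
]
USAGE = [
    ("conn_go", "2020-12-15", "go_update", 10, 3422, 1),
    ("conn_go", "2020-12-15", "go_delete", 10, 23445, 2),
    ("conn_go", "2020-12-15", "go_edit", 10, 2243, 2),
    ("conn_rv", "2020-12-15", "rv_update", 10, 245342, 1),
    ("conn_mb", "2020-12-15", "mb_update", 11, 5464, 3),
    ("conn_mb", "2020-12-15", "mb_delete", 11, 1424, 2),
    ("conn_mb", "2020-12-15", "mb_edit", 11, 21454, 2),
    ("conn_pr", "2020-12-15", "pr_update", 12, 224525, 1),
    ("conn_pr", "2020-12-15", "pr_edit", 12, 22424, 1),
]

# Every x_id active in the range is paired with every product, then usage
# rows are left-joined; the date filter sits in ON so unmatched pairs survive.
QUERY = """
select x.x_id, p.product, count(distinct ut.user_id) as usage
from (select distinct x_id from usage_table
      where yyyy_mm_dd between :start_day and :end_day
        and x_id is not null) x
cross join products p
left join usage_table ut
  on ut.x_id = x.x_id
 and ut.backend = p.backend
 and ut.sub_product = p.sub_product
 and ut.yyyy_mm_dd between :start_day and :end_day
group by x.x_id, p.product
order by x.x_id, p.product
"""


def product_usage(start_day, end_day):
    """Return (x_id, product, usage) tuples for the given date range."""
    con = sqlite3.connect(":memory:")
    con.execute("create table products (backend text, sub_product text, product text)")
    con.execute("create table usage_table (backend text, yyyy_mm_dd text, "
                "sub_product text, x_id integer, user_id integer, count integer)")
    con.executemany("insert into products values (?, ?, ?)", PRODUCTS)
    con.executemany("insert into usage_table values (?, ?, ?, ?, ?, ?)", USAGE)
    rows = con.execute(QUERY, {"start_day": start_day, "end_day": end_day}).fetchall()
    con.close()
    return rows


for row in product_usage("2020-12-01", "2020-12-31"):
    print(row)
```

With the question's own range the sketch returns no rows at all, because no x_id has usage in that window: pairs only appear for x_ids with at least one usage row in the range. If every x_id must appear regardless of activity, the distinct x_ids would have to come from some other source, which the question does not name.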
