SQL: computing multiple averages over different parts of a table?

t0ybt7op  posted on 2021-06-24  in  Hive

I have the following transactions table:

customer_id purchase_date   product         category        department      quantity    store_id
    1       2020-10-01      Kit Kat         Candy           Food                2       store_A
    1       2020-10-01      Snickers        Candy           Food                1       store_A
    1       2020-10-01      Snickers        Candy           Food                1       store_A
    2       2020-10-01      Snickers        Candy           Food                2       store_A
    2       2020-10-01      Baguette        Bread           Food                5       store_A
    2       2020-10-01      iPhone          Cell phones     Electronics         2       store_A
    3       2020-10-01      Sony PS5        Games           Electronics         1       store_A

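I would like to calculate the average quantity purchased for each product in the table. I would also like to calculate the average for each category and each department, taking into account all products within the same category or department respectively. Care should be taken to divide by the number of unique customers and to count only rows where quantity is greater than 0 (0 indicates a refund and should not be counted).
Basically, the output table would look like this: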

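…where store_id and average_level_type are partition columns.
Is there a way to achieve this in a single pass over the transactions table, or do I need to break my approach into multiple steps?
Thanks!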

yxyvkwin  #1

How about using UNION ALL, as below:

-- One SELECT per aggregation level; rows with quantity = 0 (refunds) are excluded.
SELECT store_id,
       'product' AS average_level_type,
       product AS id,
       SUM(quantity) AS total_quantity,
       COUNT(DISTINCT customer_id) AS unique_customer_count,
       SUM(quantity) / COUNT(DISTINCT customer_id) AS average
FROM transactions
WHERE quantity > 0
GROUP BY store_id, product
UNION ALL
SELECT store_id,
       'category' AS average_level_type,
       category AS id,
       SUM(quantity) AS total_quantity,
       COUNT(DISTINCT customer_id) AS unique_customer_count,
       SUM(quantity) / COUNT(DISTINCT customer_id) AS average
FROM transactions
WHERE quantity > 0
GROUP BY store_id, category
UNION ALL
SELECT store_id,
       'department' AS average_level_type,
       department AS id,
       SUM(quantity) AS total_quantity,
       COUNT(DISTINCT customer_id) AS unique_customer_count,
       SUM(quantity) / COUNT(DISTINCT customer_id) AS average
FROM transactions
WHERE quantity > 0
GROUP BY store_id, department;
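
As a sanity check, the UNION ALL query can be run against the sample rows from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive; that substitution is an assumption made only to keep the example self-contained, and the in-memory table is not partitioned. The helper names `level_query` and `compute_averages` are hypothetical. The `* 1.0` factor is added because SQLite performs integer division on integer operands.

```python
import sqlite3

# Sample rows from the question:
# (customer_id, purchase_date, product, category, department, quantity, store_id)
ROWS = [
    (1, "2020-10-01", "Kit Kat", "Candy", "Food", 2, "store_A"),
    (1, "2020-10-01", "Snickers", "Candy", "Food", 1, "store_A"),
    (1, "2020-10-01", "Snickers", "Candy", "Food", 1, "store_A"),
    (2, "2020-10-01", "Snickers", "Candy", "Food", 2, "store_A"),
    (2, "2020-10-01", "Baguette", "Bread", "Food", 5, "store_A"),
    (2, "2020-10-01", "iPhone", "Cell phones", "Electronics", 2, "store_A"),
    (3, "2020-10-01", "Sony PS5", "Games", "Electronics", 1, "store_A"),
]


def level_query(level):
    """Build the per-level SELECT; `level` is a fixed column name, not user input."""
    return f"""
        SELECT store_id, '{level}' AS average_level_type, {level} AS id,
               SUM(quantity) AS total_quantity,
               COUNT(DISTINCT customer_id) AS unique_customer_count,
               SUM(quantity) * 1.0 / COUNT(DISTINCT customer_id) AS average
        FROM transactions
        WHERE quantity > 0
        GROUP BY store_id, {level}
    """


QUERY = " UNION ALL ".join(
    level_query(level) for level in ("product", "category", "department")
)


def compute_averages():
    """Return {(average_level_type, id): (total_quantity, unique_customers, average)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (customer_id INTEGER, purchase_date TEXT, "
        "product TEXT, category TEXT, department TEXT, quantity INTEGER, store_id TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return {(r[1], r[2]): r[3:] for r in rows}


if __name__ == "__main__":
    for key, values in sorted(compute_averages().items()):
        print(key, values)
```

On the sample data, Snickers comes out at 4 units over 2 unique customers (average 2.0), and the Food department at 11 units over 2 unique customers (average 5.5).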

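If you want to avoid UNION ALL here, you can use something like ROLLUP or GROUP BY ... GROUPING SETS to achieve the same result, but the query becomes slightly more complicated if the output has to match the format shown in the question.
Edit: here is how to get the same output using grouping sets: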

-- The G_ID values 3, 5 and 6 assume the GROUPING__ID numbering of Hive 2.3.0 or later.
SELECT store_id,
       CASE WHEN G_ID = 3 THEN 'product'
            WHEN G_ID = 5 THEN 'category'
            WHEN G_ID = 6 THEN 'department' END AS average_level_type,
       CASE WHEN G_ID = 3 THEN product
            WHEN G_ID = 5 THEN category
            WHEN G_ID = 6 THEN department END AS id,
       total_quantity,
       unique_customer_count,
       average
FROM
    (SELECT store_id, product, category, department,
            SUM(quantity) AS total_quantity,
            COUNT(DISTINCT customer_id) AS unique_customer_count,
            SUM(quantity) / COUNT(DISTINCT customer_id) AS average,
            GROUPING__ID AS G_ID
     FROM transactions
     WHERE quantity > 0  -- exclude refunds, as in the UNION ALL version
     GROUP BY store_id, product, category, department
     GROUPING SETS ((store_id, product), (store_id, category), (store_id, department))
    ) Tab
ORDER BY average_level_type
;
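
The constants 3, 5 and 6 in the CASE expressions are GROUPING__ID bitmasks. Here is a minimal Python sketch of how they arise, assuming the convention used by Hive 2.3.0 and later: one bit per GROUP BY column, the first column as the most significant bit, and a bit set to 1 when the column is aggregated away (that is, not part of the grouping set). Earlier Hive versions number the sets differently, so the constants are worth checking on the target cluster. The helper name `grouping_id` is hypothetical.

```python
# Columns in the same order as the GROUP BY clause of the query.
GROUP_BY_COLUMNS = ["store_id", "product", "category", "department"]


def grouping_id(grouping_set, columns=GROUP_BY_COLUMNS):
    """Bitmask with one bit per GROUP BY column (first column is the most
    significant bit); a bit is 1 when the column is NOT in the grouping set."""
    gid = 0
    for col in columns:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


if __name__ == "__main__":
    for grouping_set in (
        ("store_id", "product"),
        ("store_id", "category"),
        ("store_id", "department"),
    ):
        print(grouping_set, grouping_id(grouping_set))
```

Under that assumption the three grouping sets map to 3, 5 and 6, matching the values tested in the query.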
