Group by one combination of columns, then group the result again by a different one

dvtswwa3  posted on 2021-06-02  in Hadoop

Hi all,
We have the data below and need the output described after it.

CUSTOMER_NAME, PRODUCT_NAME, PRICE, OCCURANCE_ID
customer1,     product1,     20,    1
customer1,     product2,     30,    2
customer1,     product1,     25,    3
customer1,     product1,     20,    1
customer1,     product2,     20,    2
customer1,     product2,     30,    2

First, we need to average the price per occurrence ID.

customer1, product1, 20 (AVG is 20 for occurrence 1), 1
customer1, product1, 25 (AVG is 25 for occurrence 3), 3

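Next, we have to average those results again by customer name and product name (the occurrence ID is left out of this GROUP BY).
The final output is customer1, product1, and the average price across all occurrences.

customer1, product1, (20 + 25) / 2 = 22.5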
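To make the arithmetic concrete, here is a minimal Python sketch (not Hive) of the two-level average on the sample rows. The function names `average_of_averages` and `flat_average` are made up for illustration. It also shows why the order matters: a plain average over all product1 rows gives a different number than the average of the per-occurrence averages.

```python
from collections import defaultdict
from statistics import mean

# Sample rows from the question: (customer_name, product_name, price, occurance_id)
ROWS = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]


def average_of_averages(rows):
    """Average per (customer, product, occurrence), then average those per (customer, product)."""
    per_occurrence = defaultdict(list)
    for customer, product, price, occurrence in rows:
        per_occurrence[(customer, product, occurrence)].append(price)

    per_product = defaultdict(list)
    for (customer, product, _occurrence), prices in per_occurrence.items():
        per_product[(customer, product)].append(mean(prices))

    return {key: mean(averages) for key, averages in per_product.items()}


def flat_average(rows):
    """Single-level average per (customer, product), ignoring occurrences entirely."""
    per_product = defaultdict(list)
    for customer, product, price, _occurrence in rows:
        per_product[(customer, product)].append(price)
    return {key: mean(prices) for key, prices in per_product.items()}


key = ("customer1", "product1")
print(average_of_averages(ROWS)[key])  # 22.5, i.e. (20 + 25) / 2
print(flat_average(ROWS)[key])         # about 21.67, i.e. (20 + 25 + 20) / 3
```

Occurrence 1 appears twice for product1, so the flat average weights it twice, while the two-level average counts each occurrence once.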

Basically, how do we compute an average of averages in Hive? We have not managed to write a query for this.

xytpbqjk  #1

Hi, this can be done with a nested query, as follows:
Step 1: compute the initial average price for each occurrence ID

SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
FROM customer_info
GROUP BY customer_name,product_name,occurance_id ;

Step 2: compute the average of the averages returned by step 1

hive (default)>
              > SELECT customer_name, product_name,avg(avg_of_current_occurance) as final_avg
              > FROM(
              > SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
              > FROM customer_info
              > GROUP BY customer_name,product_name,occurance_id
              > ) W
              > GROUP BY customer_name,product_name;

Total MapReduce jobs = 1
Launching Job 1 out of 1

Execution completed successfully

customer_name   product_name    final_avg
customer1       product1        22.5
customer1       product2        26.666666666666668
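
If you want to sanity-check the query without a Hive cluster, the sketch below runs the same nested query against an in-memory SQLite database from Python. Note the assumptions: SQLite is only a stand-in here, not Hive; the column types (`TEXT`, `REAL`, `INTEGER`) are guesses, since the question does not show the table definition; the `ORDER BY` is added only to make the row order predictable; and `run_nested_query` is a made-up helper name.

```python
import sqlite3

# Sample rows from the question: (customer_name, product_name, price, occurance_id)
ROWS = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]

# Nested query from the answer. The ORDER BY is an addition for deterministic output.
NESTED_QUERY = """
SELECT customer_name, product_name, avg(avg_of_current_occurance) AS final_avg
FROM (
    SELECT customer_name, product_name, occurance_id,
           avg(price) AS avg_of_current_occurance
    FROM customer_info
    GROUP BY customer_name, product_name, occurance_id
) W
GROUP BY customer_name, product_name
ORDER BY customer_name, product_name
"""


def run_nested_query(rows):
    """Load rows into a throwaway SQLite table and return the final averages."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are assumed; the original table definition is not shown.
        conn.execute(
            "CREATE TABLE customer_info ("
            "customer_name TEXT, product_name TEXT, price REAL, occurance_id INTEGER)"
        )
        conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?, ?)", rows)
        return conn.execute(NESTED_QUERY).fetchall()
    finally:
        conn.close()


for row in run_nested_query(ROWS):
    print(row)
```

On the sample data this should give 22.5 for product1 and roughly 26.67 for product2, the same values as the Hive output above.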
