规范化配置单元中的横向爆炸

4uqofj5v  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(298)

我有一个在字段中有数组列表的表。我在这张table上做横向爆炸来获取元素。但这样做,数值也会成倍增加。

Table:

Sitedomain      Keyword                             Clicks

msn.com         sports,cricket,accessories           100
yahoo.com       fashion,accessories                   50

一旦执行横向爆炸,我的输出就像

Sitedomain     Keyword       Clicks

 msn.com        sports        100
 msn.com        cricket       100
 msn.com        accessories   100
 yahoo.com      fashion        50
 yahoo.com      accessories    50

如你所见,地铁也在爆炸。是否有任何方法来规范化这些数据,以便将度量除以数组中的元素数?所以输出看起来像

Sitedomain     Keyword       Clicks

  msn.com        sports        33.3
  msn.com        cricket       33.3
  msn.com        accessories   33.3
  yahoo.com      fashion        25
  yahoo.com      accessories    25
3yhwsihp

3yhwsihp1#

按关键字数组大小除以单击次数:

with your_table as(
select stack(2,
'msn.com',         'sports,cricket,accessories',           100,
'yahoo.com',       'fashion,accessories',                   50
) as (Sitedomain,Keyword,Clicks)
)

select Sitedomain,k.Keyword,round(s.Clicks/size(Keyword_aray),1) as Clicks
from
(
select Sitedomain,
       split(Keyword,',')  Keyword_aray, 
       Clicks
  from your_table
)s  lateral view explode(Keyword_aray) k as keyword
;

退货:

msn.com         sports          33.3
msn.com         cricket         33.3
msn.com         accessories     33.3
yahoo.com       fashion         25.0
yahoo.com       accessories     25.0

我补充道 round() 要获得像您的示例中那样的精度,请在不需要时删除它。

相关问题