SQL query to calculate frequency in Hive

7uzetpgm  posted on 2021-05-29  in  Hadoop

I have a table tab in Hive like this:

word | occurrences  
---- | -----------  
by   | 10
hi   | 1
same | 3
love | 6

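I want to use a Hive query to compute and display the frequency of each word (its occurrences divided by the sum of the whole column). For example, the frequency of the word "by" is 10/(10+1+3+6) = 0.5.
I tried this: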

SELECT word, occurrences, occurrences/SUM(occurrences) AS frequency
FROM tab
GROUP BY word, occurrences
ORDER BY frequency;

But it gives:

word | occurrences | frequency
---- | ----------- | ---------
by   | 10          | 1
hi   | 1           | 1
same | 3           | 1
love | 6           | 1

I'm not sure what I'm doing wrong. My SQL isn't very good. Thanks in advance.

ct2axkht  #1

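You don't need to GROUP BY any column, because the denominator should be the sum of occurrences over all rows. Grouping by word and occurrences puts each row in its own group, so SUM(occurrences) equals that row's own occurrences and every ratio comes out as 1.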
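One way to check this for yourself, as a minimal sketch against the tab table from the question (group_sum is just an illustrative alias, not something from the original query), is to select the per-group sum on its own:

```sql
-- Same grouping as the attempt in the question, but showing the
-- per-group sum as its own column instead of dividing by it.
SELECT word,
       occurrences,
       SUM(occurrences) AS group_sum  -- summed within each (word, occurrences) group
FROM tab
GROUP BY word, occurrences;
```

Assuming each word appears only once in tab, group_sum should match occurrences on every row, so the division can only give 1. Computing the total once over the whole table avoids this.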

SELECT a.word, a.occurrences, a.occurrences/b.total_freq AS frequency
FROM 
tab a CROSS JOIN (SELECT SUM(occurrences) AS total_freq FROM tab) b
ORDER BY frequency;

With the cross join, total_freq is available on every row of the tab table, and the outer query then uses it as the denominator.
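If you want to try the cross join without touching a real table, here is a rough sketch that builds the four sample rows inline. It assumes a Hive version that allows SELECT without a FROM clause (0.13 or later, as far as I know); the CTE is named tab only to mirror the question and would shadow a real table of that name inside this query.

```sql
-- Inline copy of the sample data from the question (illustration only).
WITH tab AS (
  SELECT 'by' AS word, 10 AS occurrences
  UNION ALL SELECT 'hi', 1
  UNION ALL SELECT 'same', 3
  UNION ALL SELECT 'love', 6
)
SELECT a.word,
       a.occurrences,
       a.occurrences / b.total_freq AS frequency
FROM tab a
-- The subquery returns exactly one row (the column total, 20 here),
-- so the cross join attaches that total to each of the four rows.
CROSS JOIN (SELECT SUM(occurrences) AS total_freq FROM tab) b
ORDER BY frequency;
```

By the arithmetic in the question, the frequencies should come out as 1/20, 3/20, 6/20 and 10/20 for hi, same, love and by respectively.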

chhqkbe1  #2

WITH a1 AS (
  SELECT word, occurrences, occurrences/SUM(occurrences) OVER() AS frequency
  FROM tab
  ORDER BY frequency
)
SELECT * FROM a1;
3ks5zfa0  #3

Try the SQL below, which uses SUM() OVER():

SELECT word, occurrences, occurrences/SUM(occurrences) OVER() AS frequency
FROM tab
ORDER BY frequency;
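To make it clearer what the empty OVER() does, here is a small sketch that also exposes the window sum as its own column (total is a hypothetical alias added for illustration). With no PARTITION BY, the window covers the whole result set, so every row sees the same column total and no GROUP BY or join is needed.

```sql
SELECT word,
       occurrences,
       -- Empty OVER(): the window is all rows, so this is the column total
       -- repeated on every row.
       SUM(occurrences) OVER () AS total,
       occurrences / SUM(occurrences) OVER () AS frequency
FROM tab
ORDER BY frequency;
```

For the sample data in the question, total should be 20 on every row, which gives the same frequencies as dividing by hand.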
