Select distinct on specific columns, but also select other columns in Hive

Posted by oewdyzsn on 2021-06-02 in Hadoop

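I have a table in Hive with many columns, around 80. I need to apply a distinct clause to some of the columns and take the first value from the other columns. Below is what I am trying to achieve.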

select distinct(col1,col2,col3),col5,col6,col7
from abc where col1 = 'something';

All of the columns mentioned above are text columns, so I can't apply GROUP BY and aggregate functions.

hadoop Hive Distinct hql

Source: https://stackoverflow.com/questions/46733514/select-distinct-on-specific-columns-but-select-other-columns-also-in-hive

2 Answers

kulphzqa1#

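You can solve this with the row_number function: number the rows within each (col1, col2, col3) group, then keep only the first row of each group.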

create table temp as
select *, row_number() over (partition by col1,col2,col3) as rn
from abc 
where col1 = 'something';

select *
from temp
where rn=1

You can also order the rows within each partition, so that the row that gets rn = 1 is well defined: row_number() over (partition by col1,col2,col3 order by col4 asc) as rn
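The two statements above can also be combined into a single query, which avoids creating the temp table. The following is a minimal sketch, not a tested solution for the asker's data: the subquery alias `ranked` is made up for illustration, and ordering by col4 is only an assumption carried over from the example above. Without an ORDER BY in the window, the row that receives rn = 1 in each group is arbitrary.

```sql
-- Keep one row per (col1, col2, col3) group and return the other
-- columns from that same row.
SELECT col1, col2, col3, col5, col6, col7
FROM (
  SELECT abc.*,
         -- Number the rows inside each group; col4 as the sort key is an assumption.
         row_number() OVER (PARTITION BY col1, col2, col3
                            ORDER BY col4 ASC) AS rn
  FROM abc
  WHERE col1 = 'something'
) ranked  -- hypothetical alias
WHERE rn = 1;
```

Because col5, col6 and col7 are all read from the row numbered 1, the values in each output row come from one and the same source row.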

2021-06-02

pxyaymoc2#

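DISTINCT is the most used and least understood feature of SQL. It is the last operation performed on the entire result set, and it removes duplicates using all of the columns in the select list. You can GROUP BY a string column; in fact, that is the answer here: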

SELECT col1,col2,col3,COLLECT_SET(col4),COLLECT_SET(col5),COLLECT_SET(col6)
FROM abc WHERE col1 = 'something'
GROUP BY col1,col2,col3;

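However, now that I have re-read your question, I am not really sure what you are looking for. You may need to join the table to an aggregate of itself.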
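Note that COLLECT_SET returns an array of the distinct values in each group, not a single value. If one value per column is enough, a possible variation is sketched below. It assumes that any value from the group is acceptable; the output aliases are made up for illustration. Hive's min() accepts string columns, and an array can be indexed with [0], but the order of elements in a COLLECT_SET result is not guaranteed.

```sql
-- One row per (col1, col2, col3), with a single value for each other column.
SELECT col1, col2, col3,
       min(col5)            AS col5_min,  -- lexicographically smallest value
       min(col6)            AS col6_min,
       collect_set(col7)[0] AS col7_any   -- arbitrary element of the set
FROM abc
WHERE col1 = 'something'
GROUP BY col1, col2, col3;
```

Each aggregate is evaluated independently, so the values in one output row may come from different source rows. If they all have to come from the same row, the row_number approach in the other answer is the safer choice.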

2021-06-02
