消除行sql配置单元中的重复项

5w9g7ksd  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(215)

我有这张table

+------------+---------------------+--+
|  country   |      commodity      |
+------------+---------------------+--+
| Argentina  | Copper, molybdenum  |
| Argentina  | Silver, lead        |
| Argentina  | Copper, gold        |
| Argentina  | Copper, gold        |
| Argentina  | Copper              |
| Spain      | Rhodochrosite       |
| Spain      | Copper              |
| Spain      | Limestone           |
| Spain      | Gold                |
| Spain      | Limestone           |
+------------+---------------------+--+

我想展示这个

+------------+-----------------------------------------+--+
|  country   |                   minerals              |
+------------+-----------------------------------------+--+
| Argentina  | copper, molybdenum, silver, lead, gold  |
| Spain      | rhodochrosite, copper, limestone, gold  |
+------------+-----------------------------------------+--+

所以我想把每个国家的所有商品都放在一个“矿物”栏中,并消除重复项,但在原来的“商品”栏中,可以有1种以上的矿物,如你在第一个表格中看到的,也可以有小写或大写的黄金、黄金等。
我试过了

SELECT country, CONCAT_WS(', ' ,COLLECT_SET(LOWER(commodity))) as minerals 
FROM depositOPT 
GROUP BY country;

但它没有消除重复项,因为输出如下所示

+------------+------------------------------------------------------------------------
    |  country   |                   minerals                  
    +------------+------------------------------------------------------------------------
    | Argentina  | copper, molybdenum, silver, lead, copper, gold, copper, gold, copper  
    | Spain      | rhodochrosite, copper, limestone, gold, limestone    
    +------------+------------------------------------------------------------------------

谢谢你的建议。

e37o9pze

e37o9pze1#

我将把商品栏分成单独的矿物,去掉重复的部分,然后按国家分组。希望这有帮助。谢谢。

select t.country, 
        collect_set(lower(t.minerals)) as minerals 
 from (select country, trim(minerals) as minerals 
       from depositOPT lateral view explode(split(commodity,',')) s as  minerals) t
group by t.country;

**Sample result:**

Country         Minerals
Argentina       ["copper","molybdenum","silver","lead","gold"]

相关问题