如何在分解后从数组中删除一个元素

rmbxnbpk  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(651)

我有一个列id为\u list的表,其中有一组字符串,例如。 ["1","2","4","6"] .
原来的table是

+-------------------+
|   id_list         |
+-------------------+
| ["1","2","4","6"] |
|-------------------+
| ["1","2","4","6"] |
|-------------------+
| ["1","2","4","6"] |
|-------------------+
| ["1","2","4","6"] |
|-------------------+
| ["7"]             |
|-------------------+

我通过分解id\u list列创建了下表:

SELECT id, id_list FROM data LATERAL VIEW explode(id_list) dummy AS id;
+-------------+-------------------+
|     id      |   id_list         |
+-------------+-------------------+
|      1      | ["1","2","4","6"] |
+-------------|-------------------+
|      2      | ["1","2","4","6"] |
+-------------|-------------------+
|      4      | ["1","2","4","6"] |
+-------------|-------------------+
|      6      | ["1","2","4","6"] |
+-------------|-------------------+
|      7      | ["7"]             |
+-------------|-------------------+

我想从id\u列表中删除该id,得到如下表:

+-------------+-------------------+
|     id      |   id_list         |
+-------------+-------------------+
|      1      | ["2","4","6"]     |
+-------------|-------------------+
|      2      | ["1","4","6"]     |
+-------------|-------------------+
|      4      | ["1","2","6"]     |
+-------------|-------------------+
|      6      | ["1","2","4"]     |
+-------------|-------------------+
|      7      | []                |
+-------------|-------------------+

不使用自定义项怎么做?

a0zr77ik

a0zr77ik1#

您只能分解和收集那些id不相等的元素。
演示:

with initial_data as (
select 1 id   ,array("1","2","4","6") list union all
select 2   ,array("1","2","4","6") list union all    
select 3   ,array("1","2","4","6")   list union all 
select 4  ,array("1","2","4","6")   list union all 
select 6   ,array("1","2","4","6")   list union all
select 7   ,array("7")  
)    

SELECT d.id, collect_list(case when e.id!= d.id then e.id end) id_list
  FROM initial_data d 
       LATERAL VIEW explode(list) e AS id
 GROUP BY d.id;

结果:

OK
id      id_list
1       ["2","4","6"]
2       ["1","4","6"]
3       ["1","2","4","6"]
4       ["1","2","6"]
6       ["1","2","4"]
7       []
Time taken: 38.645 seconds, Fetched: 6 row(s)

相关问题