如何减去配置单元中的项

ibrsph3r  于 2021-06-26  发布在  Hive
关注(0)|答案(4)|浏览(185)

我为每个客户提供了两个项目列表,itemlista和itemlistb

Customer_id   ItemListA   ItemListB
   24            2,3         3,4,5
   26            6,7         8,9,10
   25            4,5          5,8

我希望itemlistb中的项不在itemlista中,因此输出应该是:

Customer_id   ItemListB_A
     24          4,5
     26          8,9,10
     25          4,8

我不知道如何从Hive中的两个字符串中减去项目。我知道collect\u set,但它可以删除重复项,但不能删除项目交集。

niwlg2el

niwlg2el1#

select      customer_id

           ,split
            (
                regexp_replace
                (
                    concat(ItemListB,',:,',ItemListA)
                   ,'(?<=^|,)(?<item>.*?),(?=.*(?<=,)\\k<item>(?=,|$))'
                   ,''
                )
               ,',?:'
            )[0]        as ItemListB_A

from        mytable
+-------------+-------------+
| customer_id | itemlistb_a |
+-------------+-------------+
|          24 | 4,5         |
|          26 | 8,9,10      |
|          25 | 8           |
+-------------+-------------+
1rhkuytd

1rhkuytd2#

这将得到预期的结果。

select Customer_id, collect_list(y) from
    (select Customer_id, y , count(*) as cnt from 
    (select Customer_id,y from (select Customer_id, split(concat_ws(',',ItemListA,ItemListB),',') as x from table_name) temp lateral view explode(x) temp as y) temp1 
    group by Customer_id,y  ) temp2 where cnt =1 group by Customer_id;
lmvvr0a8

lmvvr0a83#

select a.Customer_id,concat_ws (',',collect_list (a.item)) as ItemListB_A from (            
select      t.Customer_id
                       ,pe.item

            from        test_hive     t
                        lateral view posexplode (split (concat_ws(',',ItemListB),',')) pe as pos,item
                        lateral view posexplode (split (concat_ws(',',ItemListA),',')) pe1 as pos1,item1
            group by t.Customer_id,pe.item
            having count(case when item=item1 then 1 end)=0
)a
group by a.Customer_id
hrysbysz

hrysbysz4#

select      customer_id
           ,concat_ws (',',collect_list (item)) as ItemListB_A

from       (select      t.customer_id
                       ,pe.item

            from        mytable     t
                        lateral view posexplode (split (concat_ws(',',ItemListA,ItemListB),',')) pe as pos,item

            group by    t.customer_id
                       ,pe.item

            having      count (case when pos < size(split(ItemListA,',')) then 1 end) = 0  
            ) t

group by    customer_id
+-------------+-------------+
| customer_id | itemlistb_a |
+-------------+-------------+
|          24 | 4,5         |
|          25 | 8           |
|          26 | 10,8,9      |
+-------------+-------------+

相关问题