How to extract the string after the last occurrence of a word in Hive

vpfxa7rd  posted on 2021-06-27  in  Hive

I have a string column in one of my Hive tables:

select * from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all 
select "edition_xx/news/news/television_1.3" as my_column
) A

I want to extract the part of the string that comes after news/ , so that my output column looks like this:

radio_today_news_xx
television_1.3

How can I extract this in Hive using a regex? Note that news/ can occur any number of times, and I want the string after its last occurrence.


km0tfn4u 1#

Use split() :

select  split(my_column,'(news/)+')[1] 
from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all 
select "edition_xx/news/news/television_1.3" as my_column
) A;

This regexp matches news/ one or more times in a row, so consecutive occurrences are treated as a single delimiter.
Result:

radio_today_news_xx
television_1.3
Time taken: 37.218 seconds, Fetched: 2 row(s)
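
A caveat on the split() approach above: the pattern '(news/)+' only merges occurrences of news/ that are directly adjacent, and index [1] picks the piece after the first delimiter run. If news/ can reappear later in the path, a greedy regexp_extract is one possible alternative. A minimal sketch, assuming a third sample row that is made up for illustration and does not come from the question:

```sql
-- Greedy '.*' consumes as much as possible, so 'news/' matches its last
-- occurrence and group 1 captures everything after it.
select regexp_extract(my_column, '.*news/(.*)$', 1) as after_last_news
from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all
select "edition_xx/news/news/television_1.3" as my_column
union all
-- hypothetical row: news/ occurs twice, but not adjacently
select "edition_xx/news/sports/news/radio_today" as my_column
) A;
```

For the hypothetical third row, split(my_column,'(news/)+')[1] would give sports/ , whereas the pattern above should give radio_today .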

50few1ms 2#

Use split and take the last element of the resulting array:

-- size(...)-1 is the index of the last element, i.e. the text after the last news/
select split(A.my_column,'news/')[size(split(A.my_column,'news/'))-1] lt
from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all 
select "edition_xx/news/news/television_1.3" as my_column
union all
select "edition_xx/news/radio_today/news_xx" as my_column
) A;

Output:

radio_today_news_xx
television_1.3
radio_today/news_xx
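
The query above calls split() twice on the same column. If that feels repetitive, the array can be computed once in an inner query and indexed in the outer one. This is only a sketch; the aliases parts and B are illustrative names, not part of the answer:

```sql
-- Inner query builds the array once; outer query picks its last element.
select parts[size(parts) - 1] as lt
from
(
select split(A.my_column, 'news/') as parts
from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all
select "edition_xx/news/news/television_1.3" as my_column
union all
select "edition_xx/news/radio_today/news_xx" as my_column
) A
) B;
```

The result should match the output listed above, since only the query structure changes and not the split pattern.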
