Spark SQL REGEXP_EXTRACT function throws a Java error

tuwxkamq  posted on 2022-11-16 in Apache

I am trying to extract the id that starts with srsa from the table structure below:

id      reason_text_field 
34394   {"initial_customer":"sda_WWyfr4AXY1fIAS", "customer_result":"srsa_CAkAaAvNKL2OSD"}

to get the following output:

id      srsa_id 
34394   srsa_CAkAaAvNKL2OSD

But when I use the Spark SQL function below

REGEXP_EXTRACT(reason_text_field, 'srsa[^"]*') as srsa_id

I get this error:
Exception: No group


kmbjn2e3  #1

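You need to specify which group to capture. When the third argument is omitted, REGEXP_EXTRACT uses group index 1, and the pattern 'srsa[^"]*' contains no capture group, hence the error. Try the following: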

SELECT  id, 
        REGEXP_EXTRACT(reason_text_field, '\"(srsa[^"]*)\"', 1) as srsa_id
        -- or keep the original pattern and ask for group 0 (the whole match):
        -- REGEXP_EXTRACT(reason_text_field, 'srsa[^"]*', 0) as srsa_id
FROM    tb
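
To see why the group index matters, here is a minimal sketch outside Spark, using Python's standard re module rather than the Java regex engine that Spark uses. The helper name extract is hypothetical and only mimics the shape of REGEXP_EXTRACT(text, pattern, idx); the group numbering rule (0 is the whole match, 1 is the first parenthesised group) is the same in both engines.

```python
import re

# Sample value from the question (with the missing quote before customer_result restored).
reason_text_field = (
    '{"initial_customer":"sda_WWyfr4AXY1fIAS", '
    '"customer_result":"srsa_CAkAaAvNKL2OSD"}'
)


def extract(pattern: str, text: str, idx: int) -> str:
    """Hypothetical stand-in for REGEXP_EXTRACT(text, pattern, idx)."""
    match = re.search(pattern, text)
    if match is None:
        # Spark's regexp_extract returns an empty string when nothing matches.
        return ""
    # Raises IndexError when the pattern has no group number idx,
    # which is the Python analogue of the "No group" error above.
    return match.group(idx)


if __name__ == "__main__":
    # Group 0 is the whole match, so no parentheses are needed.
    print(extract(r'srsa[^"]*', reason_text_field, 0))
    # Group 1 needs a parenthesised capture group in the pattern.
    print(extract(r'"(srsa[^"]*)"', reason_text_field, 1))
    # Asking for group 1 without a capture group fails.
    try:
        extract(r'srsa[^"]*', reason_text_field, 1)
    except IndexError as err:
        print("error:", err)
```

Both successful calls return the same id; only the combination of a group-less pattern with index 1 fails.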

Note, however, that you can also use from_json to convert the text column reason_text_field into a map or struct and then extract the customer_result field:

SELECT  id, 
        from_json(reason_text_field, 'map<string,string>')['customer_result'] as srsa_id
FROM    tb
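
The JSON route only works if the column holds well-formed JSON. The sketch below illustrates that with Python's standard json module instead of Spark; the function name customer_result is hypothetical. It assumes the usual from_json behaviour of yielding NULL for text it cannot parse, which is modelled here by returning None.

```python
import json
from typing import Optional


def customer_result(reason_text: str) -> Optional[str]:
    """Hypothetical analogue of from_json(...)['customer_result']."""
    try:
        parsed = json.loads(reason_text)
    except json.JSONDecodeError:
        # Assumption: mirrors from_json returning NULL for unparsable input.
        return None
    if not isinstance(parsed, dict):
        return None
    return parsed.get("customer_result")


if __name__ == "__main__":
    well_formed = (
        '{"initial_customer":"sda_WWyfr4AXY1fIAS", '
        '"customer_result":"srsa_CAkAaAvNKL2OSD"}'
    )
    # Same text but with the opening quote of the second key missing.
    malformed = (
        '{"initial_customer":"sda_WWyfr4AXY1fIAS", '
        'customer_result":"srsa_CAkAaAvNKL2OSD"}'
    )
    print(customer_result(well_formed))
    print(customer_result(malformed))
```

If some rows may contain malformed JSON, the regex approach is more forgiving, because it does not depend on the surrounding text being parsable.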
