Oracle SQL REGEXP_SUBSTR: non-capturing / optional group

Posted by 7jmck4yq on 2021-08-09 in Java

The expression:

Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\.(?: Target definition = (\d+))?.*

correctly produces the following matches:

Group 1.    24-30   494801
Group 2.    38-45   8280955
Group 3.    52-59   8336297
Group 4.    103-109 494767

for the input string:

Reassigning definition: 494801 from: [8280955] to: [8336297], advancing due dates. Target definition = 494767.

and the first three of those matches for the input string:

Reassigning definition: 494801 from: [8280955] to: [8336297], advancing due dates.

when using the JavaScript, Python, PHP, and Golang flavors (see https://regex101.com/r/br66wm/3), but not when using Oracle SQL regexp_substr:

with
  input_string as
  (
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
    union all
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
   ),
   pattern_string as
   (
     select 'Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\.(?: Target definition = (\d+))?.*$' as pattern_string from dual
   )
select
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 1) as group_1,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 2) as group_2,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 3) as group_3,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 4) as group_4
from
  input_string i, pattern_string p;

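The fourth group is always null. What is wrong with my use of the non-capturing group? Essentially, the following sentence is optional in my input test strings: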

Target definition = 494767.
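
For comparison, the behavior described above can be reproduced outside Oracle. The following is a minimal sketch using Python's `re` module, one of the flavors named in the question. It only illustrates how an engine that supports `(?:...)` treats the pattern and makes no claim about how Oracle parses it. The helper name `extract_groups` is hypothetical.

```python
import re

# Same pattern as in the question, split across two raw strings for readability.
PATTERN = re.compile(
    r"Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\]"
    r".+?\.(?: Target definition = (\d+))?.*"
)

def extract_groups(text):
    """Return the four capture groups as a tuple, or None if nothing matches."""
    m = PATTERN.match(text)
    return m.groups() if m else None

with_target = ("Reassigning definition: 494801 from: [8280955] to: [8336297], "
               "advancing due dates. Target definition = 494767.")
without_target = ("Reassigning definition: 494801 from: [8280955] to: [8336297], "
                  "advancing due dates.")

print(extract_groups(with_target))     # group 4 is '494767'
print(extract_groups(without_target))  # group 4 is None
```

In this engine the non-capturing group does not consume a group number, so the optional number is still group 4, and it is simply `None` when the optional sentence is missing.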

c0vxltue #1

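This is a bit too long for a comment, so I'll write it here. If it doesn't make sense, I'll remove it.
If you are always looking for the numbers in those strings (regardless of what surrounds them), the query can be simplified to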

with
  input_string as
  (
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
    union all
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
  )
select regexp_substr(test_string, '\d+', 1, 1) grp1,
       regexp_substr(test_string, '\d+', 1, 2) grp2,
       regexp_substr(test_string, '\d+', 1, 3) grp3,
       regexp_substr(test_string, '\d+', 1, 4) grp4
from input_string;

GRP1       GRP2       GRP3       GRP4
---------- ---------- ---------- ----------
494801     8280955    8336297    494767
494801     8280955    8336297
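
The positional idea above (take the n-th run of digits, ignoring the surrounding text) can be sketched in Python as follows. This is an illustration only, not part of the Oracle solution. The helper `nth_number` and the third input string are hypothetical and are there to show the caveat that any extra number in the free text shifts the positions.

```python
import re

def nth_number(text, n):
    """Return the n-th run of digits in text (1-based), or None if there is none."""
    numbers = re.findall(r"\d+", text)
    return numbers[n - 1] if 1 <= n <= len(numbers) else None

rows = [
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.",
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.",
    # Hypothetical input: a number inside the free text takes over position 4.
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing 2 dates.",
]

for row in rows:
    print([nth_number(row, n) for n in range(1, 5)])
```

The approach is only safe when the text between the fixed parts can never contain digits of its own.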

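Alternatively, here is an option that does not assume a fixed number of groups (although the layout differs from what you want):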

with
  input_string as
  (
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
    union all
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
  )
select column_value grp_rn,
       regexp_substr(test_string, '\d+', 1, column_value) grp
from input_string cross join
  table(cast(multiset(select level from dual
                      connect by level <= regexp_count(test_string, '\d+')
                     ) as sys.odcinumberlist));

 GRP_RN GRP
------- ----------
      1 494801
      2 8280955
      3 8336297
      4 494767
      1 494801
      2 8280955
      3 8336297

7 rows selected.
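
The row-per-match layout above can be sketched in Python like this, again as an illustration rather than an Oracle solution. The function name `numbered_matches` is hypothetical.

```python
import re

def numbered_matches(text):
    """Return (position, digits) pairs, one per run of digits, numbered from 1."""
    return [(i, m.group()) for i, m in enumerate(re.finditer(r"\d+", text), start=1)]

rows = [
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.",
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.",
]

for row in rows:
    for position, digits in numbered_matches(row):
        print(position, digits)
```

As in the SQL version, each input contributes as many rows as it has numbers, so the first string yields four rows and the second yields three.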

bcs8qyzn #2

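Because the POSIX-based regex implementation does not seem to support non-capturing groups, and the groups captured by regexp_substr are not easily available as separate columns, I used the following, which essentially applies a separate regex to the optional group.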

with
  input_string as
  (
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
    union all
    select 'Reassigning definition: 494767 from: [8336297] to: [8369944], advancing dates.' as test_string from dual
   ),
   pattern_string as
   (
     select 'Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\]' as pattern_string from dual
   )
select
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 1) as group_1,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 2) as group_2,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 3) as group_3,
  regexp_substr(i.test_string, 'Target definition = (\d+)', 1, 1, null, 1) as group_4
from
  input_string i, pattern_string p;
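
The same split (one pattern for the mandatory part, a second pattern for the optional sentence) can be sketched in Python. This is only an illustration of the design, under the assumption that the optional sentence appears at most once per string. The names `REQUIRED`, `OPTIONAL`, and `parse` are hypothetical.

```python
import re

# Mandatory part: always present, three capture groups.
REQUIRED = re.compile(r"Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\]")
# Optional sentence: searched for independently, one capture group.
OPTIONAL = re.compile(r"Target definition = (\d+)")

def parse(text):
    """Return (g1, g2, g3, g4); g4 is None when the optional sentence is absent."""
    required = REQUIRED.search(text)
    if required is None:
        return None
    optional = OPTIONAL.search(text)
    return required.groups() + (optional.group(1) if optional else None,)

rows = [
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.",
    "Reassigning definition: 494767 from: [8336297] to: [8369944], advancing dates.",
]

for row in rows:
    print(parse(row))
```

Keeping the two patterns separate avoids any need for an optional or non-capturing group, at the cost of no longer checking that the optional sentence follows the mandatory part.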
