Oracle查询,列出CLOB字段中出现的所有字符串及其后续值

bnl4lu3b  于 2023-10-16  发布在  Oracle
关注(0)|答案(1)|浏览(102)

我在Oracle表中有一个CLOB字段,希望提取所有匹配模式“RES_GetResData_Public_ScreenPrint”的字符串及其后续值(由"|’)。

样本数据-

|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|

对于上面提供的示例,我使用下面的查询来获取字符串“RES_GetResData_Public_ScreenPrint”及其后续值(用|)的CLOB数据。
该查询能够获取所有出现的“*RES_GetResData_Public_ScreenPrint”,但仅返回所有行中的第一个后续值。

查询-

select objectid, 
    REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^\|]+', 1,column_value) column1, 
    REGEXP_SUBSTR(DATA, '([^\|]+)', 1, 3) AS column2
from test_table cross join table(cast(multiset(select level 
from dual connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') ) as sys.odcinumberlist))

实际结果-

| objectID| column1| column2|
| --|--|--|
| 12345 |RES_GetResData_Public_ScreenPrint010| F28028079|
| 12345 |RES_GetResData_Public_ScreenPrint011| F28028079|
| 12345 |RES_GetResData_Public_ScreenPrint013| F28028079|

预期结果-

| objectID| column1| column2|
| --|--|--|
| 12345 |RES_GetResData_Public_ScreenPrint010| F28028079|
| 12345 |RES_GetResData_Public_ScreenPrint011| F28028081|
| 12345 |RES_GetResData_Public_ScreenPrint013| F28028007|
我是新来写查询和任何建议将是有益的。
感谢您的评分!

8cdiaqws

8cdiaqws1#

你不需要正则表达式;虽然它是更多的类型,你可能会发现,简单的字符串函数比正则表达式更快:

WITH line_bounds (objectid, data, spos, epos) AS (
  SELECT objectid,
         data,
         1,
         INSTR(data, CHR(10), 1)
  FROM   test_table
UNION ALL
  SELECT objectid,
         data,
         epos + 1,
         INSTR(data, CHR(10), epos + 1)
  FROM   line_bounds
  WHERE  epos > 0
)
SEARCH DEPTH FIRST BY objectid SET orderid,
lines (objectid, line) AS (
  SELECT objectid,
         CASE epos
         WHEN 0
         THEN SUBSTR(data, spos)
         ELSE SUBSTR(data, spos, epos -spos)
         END
  FROM   line_bounds
),
match_bounds (objectid, line, res_spos, res_epos, next_epos) AS (
  SELECT objectid,
         line,
         INSTR(line, '|RES_GetResData_Public_ScreenPrint'),
         INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 1),
         INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 2)
  FROM   lines
)
SELECT objectid,
       SUBSTR(line, res_spos + 1, res_epos - res_spos - 1) AS column1,
       SUBSTR(line, res_epos + 1, next_epos - res_epos - 1) AS column2
FROM   match_bounds
WHERE  res_spos > 0;

其中,对于样本数据:

CREATE TABLE test_table (objectid, data) AS
SELECT 12345,
       EMPTY_CLOB() || '|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|
|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|
|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|'
FROM   DUAL;

输出:
| objectID| COLUMN1| COLUMN2|
| --|--|--|
| 12345 |RES_GetResData_Public_ScreenPrint010| F28028079|
| 12345 |RES_GetResData_Public_ScreenPrint011| F28028081|
| 12345 |RES_GetResData_Public_ScreenPrint013| F28028007|
如果你确实想使用正则表达式(请在你的数据上比较两种解决方案的性能),那么你可以匹配整个模式并提取捕获组的值:

select objectid, 
       REGEXP_SUBSTR(
         DATA,
         '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
         1,
         column_value,
         NULL,
         1
       ) AS column1, 
       REGEXP_SUBSTR(
         DATA,
         '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
         1,
         column_value,
         NULL,
         2
       ) AS column2
from   test_table
       cross join table(
         cast(
           multiset(
             select level 
             from   dual
             connect by level <= regexp_count(DATA, '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|')
           ) as sys.odcinumberlist
         )
       )

它具有相同的输出。
fiddle

相关问题