Oracle SQL Developer regular expression

ct2axkht  posted on 2023-04-11 in Oracle

I'm no regex expert and am looking for help. Thanks in advance.
I want to extract a formatted substring from the description column, turning

my testing on 456897 - Carol M. Smith, Ph.D. 
my testing on 435670 - Ms. Paulina M. Hall
my testing on 980765 - Mr. John Smith
my testing on 14567 - Mrs. Lena C. Callum
my testing on 555777 - Dr. Paul F. Fairlake
234567 - Mr. Ryan M. Palmer, Sr.
123456 - Joyce R. Hilton, Ph.D.

TO

my testing on 456897 - C. Smith
my testing on 435670 - Ms. P. Hall
my testing on 980765 - Mr. J. Smith
my testing on 14567 - Mrs. L. Callum
my testing on 555777 - Dr. P. Fairlake
234567 - Mr. R. Palmer
123456 - J. Hilton

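My query works for the first and last records. However, the records with a title are a bit more complicated.
For records with a title, I need to keep the title, the initial of the first name, and the last name.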

SELECT description,
       CASE
           WHEN REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN REGEXP_REPLACE(description, '(Ms\.|Mr\.|Mrs\.|Dr\.[A-Z][a-z]+ [A-Z]\.)')
           WHEN NOT REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)')  THEN REGEXP_REPLACE(description, '(\w)\w*\W+(\w)\w*\W+(\w+),.*', '\1. \3')
           ELSE 'some other validation needed'
       END AS order_regex
       from mytable;

Thanks again for any suggestions.

dphi5xsq #1

I would do it like this:

select
  t1.*
 ,regexp_replace(
     t1.description
    ,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
    ,'\1- \2\4. \6'
    ) subs
from t1

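A short explanation of this regex:

  1. ([^-]+)- : the first part of the string, up to the first '-' (subexpression #1); the '-' itself is matched literally.
  2. \s* : any number of whitespace characters.
  3. ((Mr|Ms|Mrs|Dr)[.]\s*)? : an optional title (Mr., Ms., Mrs. or Dr.) with its trailing whitespace, returned as subexpression #2; the inner alternation is subexpression #3.
  4. (\w)\w* : the first name; its first letter is returned as subexpression #4.
  5. (\s[a-zA-Z.]*)* : any number of words between the first and last name (subexpression #5).
  6. \s(\w+)(,.*|$) : the last name, i.e. the last word before a ',' or the end of the string, returned as subexpression #6; the suffix after the comma, if any, is subexpression #7.

Full test case: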
with t1 as (
select 'my testing on 456897 - Carol M. Smith, Ph.D. ' description from dual union all
select 'my testing on 435670 - Ms. Paulina M. Hall' from dual union all
select 'my testing on 980765 - Mr. John Smith' from dual union all
select 'my testing on 14567 - Mrs. Lena C. Callum' from dual union all
select 'my testing on 555777 - Dr. Paul F. Fairlake' from dual union all
select '234567 - Mr. Ryan M. Palmer, Sr.' from dual union all
select '123456 - Joyce R. Hilton, Ph.D.' from dual  
)
select
  t1.*
 ,regexp_replace(
     t1.description
    ,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
    ,'\1- \2\4. \6'
    ) subs
from t1;

DBFiddle: https://dbfiddle.uk/HNHHzGR4
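
To check which capture group holds which part of the name, each group can be pulled out on its own. This is only a minimal sketch: it assumes Oracle 11g or later, where REGEXP_SUBSTR accepts a sixth argument selecting a subexpression, and the names pat, title_part, first_initial and last_name are chosen purely for illustration.

```sql
-- Inspect individual capture groups for one sample row from the question.
with t1 as (
  select 'my testing on 555777 - Dr. Paul F. Fairlake' description from dual
),
p as (
  -- same pattern as in the query above
  select '([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)' pat from dual
)
select
  regexp_substr(t1.description, p.pat, 1, 1, null, 2) title_part     -- group 2: title plus trailing whitespace
 ,regexp_substr(t1.description, p.pat, 1, 1, null, 4) first_initial  -- group 4: first letter of the first name
 ,regexp_substr(t1.description, p.pat, 1, 1, null, 6) last_name      -- group 6: last name
from t1 cross join p;
```

The replacement string '\1- \2\4. \6' then simply stitches groups 1, 2, 4 and 6 back together.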

p8h8hvxi #2

For this exact example, you could use something like this:

select
  description,
       CASE
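           WHEN REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN 
              -- the trailing (,.*)? also consumes a suffix such as ', Sr.' so it is dropped from the result
              REGEXP_REPLACE(description,  '(Ms\.|Mr\.|Mrs\.|Dr\.) ([A-Z])[a-zA-Z. ]+ ([A-Za-z]+)(,.*)?', '\1 \2. \3')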
           WHEN REGEXP_LIKE(description, ', (Ph\.D\.|Sr\.)')  THEN 
              REGEXP_REPLACE(description, '([A-Z])[a-z]+ ([A-Z]\.)? ([A-Z][a-z]+), (Ph\.D\.|Sr\.)', '\1. \3')
           ELSE 'some other validation needed'
       END AS order_regex
from t1

Edit: a bit more generic, to handle multi-part names:

select
  description,
       CASE
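           WHEN REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN 
              -- the trailing (,.*)? also consumes a suffix such as ', Sr.' so it is dropped from the result
              REGEXP_REPLACE(description,  '(Ms\.|Mr\.|Mrs\.|Dr\.) ([A-Z])[a-zA-Z. ]+ ([A-Za-z]+)(,.*)?', '\1 \2. \3')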
           WHEN REGEXP_LIKE(description, ', (Ph\.D\.|Sr\.)')  THEN 
              REGEXP_REPLACE(description, '([A-Z])[a-zA-Z. ]* ([A-Z][a-z]+), (Ph\.D\.|Sr\.)', '\1. \2')
           ELSE 'some other validation needed'
       END AS order_regex
from t1

Demo here (v2).
In general, though, names are hard to parse, and I'm afraid a simple set of regular expressions won't be enough.
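
One way to act on that caveat is to list the rows that a given pattern does not cover, so they can be reviewed by hand instead of silently passing through unchanged. The following is only a rough sketch: it reuses the pattern from the first answer, and the second sample row is a made-up hyphenated name that does not appear in the question.

```sql
-- List descriptions that the pattern fails to match.
with t1 as (
  select '234567 - Mr. Ryan M. Palmer, Sr.' description from dual union all
  select '345678 - Ana Smith-Jones' from dual  -- hypothetical row, not from the question
)
select description
from t1
where not regexp_like(
        description
       ,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
      );
```

The hyphenated row should be the one reported here, since the pattern has no way to consume the '-' inside the last name. REGEXP_REPLACE returns its input unchanged when nothing matches, so a check like this makes such rows visible.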
