oracle 限制特定文本的特定长度

tjvv9vkg  于 2023-05-28  发布在  Oracle
关注(0)|答案(5)|浏览(370)

假设我有一个长文本字段。
| 色谱柱A| B柱|
| - -----|- -----|
| S101|句子1:从前有一个穷孩子,他整天挨家挨户地卖报纸来交学费。有一天,当他走在路上时,他开始感到虚弱无力。这个可怜的男孩饿坏了,所以他决定来到隔壁时要点吃的。句子2:这个可怜的男孩要吃的,但每次都被拒绝,直到他走到一个女孩的门口。他要了一杯水,但看到他可怜的状态,女孩回来了一杯牛奶。男孩问他欠她多少牛奶钱,但她拒绝付款。几年后,这个已经长大成人的女孩病倒了。她找了一个又一个医生,但没有人能治好她的病。最后,她去找了城里最好的医生。医生花了几个月的时间治疗她,直到她终于痊愈。尽管她很高兴,但她担心她付不起账单。但是,当医院把账单递给她时,上面写着:“全额支付,一杯牛奶。”|
每次显示单词“sentence”时,我们会将单词“sentence”的数据修剪为50个字符。
输出:
| 色谱柱A| B柱|
| - -----|- -----|
| S101|句子1:从前有一个穷孩子,他花了他的钱。可怜的男孩要求食物,但被de句子3:多年以后,那个女孩,现在已经是一个g了医生花了几个月的时间为她治疗.|
我尝试子字符串和字符串的组合,但没有工作

wpcxdonn

wpcxdonn1#

一个选项可能是将 * 长 * 文本拆分为行(每个句子一行),从每行中提取50个字符并将它们聚合回来。
样本数据:

SQL> with test (col) as
  2  (select 'Sentence 1: There once was a poor boy who spent his days going door-to-door selling
  3  newspapers to pay for school. One day, as he was walking his route, he started feeling low and weak.
  4  The poor boy was starving, so he decided to ask for food when he came to the next door.
  5  Sentence 2: The poor boy asked for food but was denied every time, until he reached the door of a girl.
  6  He asked for a glass of water,but seeing his poor state, the girl came back with a glass of milk.
  7  The boy asked how much he owed her for the milk, but she refused payment.
  8  Sentence 3: Years later, the girl, who was now a grown woman, fell sick. She went from doctor
  9  to doctor, but no one was able to cure her.Finally, she went to the best doctor in town.
 10  Sentence 4: The doctor spent months treating her until she was finally cured. Despite her
 11  happiness, she was afraid she couldn’t afford to pay the bill. But, when the hospital
 12  handed her the bill, it read, ‘Paid in full, with a glass of milk.' from dual),

查询从这里开始:

13  temp as
 14    (select level lvl,
 15            regexp_substr(replace(col, 'Sentence', '#'), '[^#]+', 1, level) val
 16     From test
 17     connect by level <= regexp_count(col, 'Sentence')
 18    )
 19  select listagg(substr('Sentence ' || val, 1, 51), ' ') within group (order by lvl) result
 20  from temp;

RESULT
--------------------------------------------------------------------------------
Sentence  1: There once was a poor boy who spent hi Sentence  2: The poor boy as
ked for food but was de Sentence  3: Years later, the girl, who was now a g Sent
ence  4: The doctor spent months treating her u

SQL>
ajsxfq5m

ajsxfq5m2#

我认为使用regexp而不使用lookahead或lookbehind(因为不支持):

select
  REGEXP_REPLACE(
    'Sentence 1: There once was a poor boy who spent his days going door-to-door selling newspapers to pay for school. One day, as he was walking his route, he started feeling low and weak.The poor boy was starving, so he decided to ask for food when he came to the next door. Sentence 2: The poor boy asked for food but was denied every time, until he reached the door of a girl. He asked for a glass of water,but seeing his poor state, the girl came back with a glass of milk. The boy asked how much he owed her for the milk, but she refused payment.Sentence 3: Years later, the girl, who was now a grown woman, fell sick. She went from doctor to doctor, but no one was able to cure her.Finally, she went to the best doctor in town.Sentence 4: The doctor spent months treating her until she was finally cured. Despite her happiness, she was afraid she couldn’t afford to pay the bill. But, when the hospital handed her the bill, it read, ‘Paid in full, with a glass of milk.’',
    '(: )(.{1,38})(.*?)(Sentence\s+\d+|$)',
    '\1\2 \4'
  ) from dual

我们的想法是,我们知道它以Sentence开头,后跟一个数字,然后是:,所以我们可以假设我们首先从搜索:开始,然后捕获下一个Sentense #之前的内容(其中#是一些数字)。
我们捕获第二组中的文本,以防它比预期的要短。然后我们捕获要删除的文本,然后匹配下一个句子分隔符。
你可以在这里测试:https://dbfiddle.uk/kGQMyNx _

du7egjpx

du7egjpx3#

@Littlefoot的答案大部分都有效,但依赖于用'#'替换'Sentence'来使正则表达式'[^#]+'工作,这意味着当输入包含'#'本身时,它会失败。
一种更稳健的方法是使用instr函数来查找'Sentence'在输入中出现的位置:

with sentences as (
    select
        'Sentence 1: There once was a poor boy who spent his days going door-to-door selling newspapers to pay for school. One day, as he was walking his route, he started feeling low and weak.The poor boy was starving, so he decided to ask for food when he came to the next door. Sentence 2: The poor boy asked for food but was denied every time, until he reached the door of a girl. He asked for a glass of water,but seeing his poor state, the girl came back with a glass of milk. The boy asked how much he owed her for the milk, but she refused payment.Sentence 3: Years later, the girl, who was now a grown woman, fell sick. She went from doctor to doctor, but no one was able to cure her.Finally, she went to the best doctor in town.Sentence 4: The doctor spent months treating her until she was finally cured. Despite her happiness, she was afraid she couldn’t afford to pay the bill. But, when the hospital handed her the bill, it read, ‘Paid in full, with a glass of milk.’'
        as s from dual
)
select listagg(sentence, ' ') within group (order by lvl)
from (
    select substr(s, instr(s, 'Sentence', 1, level), 50) sentence, level lvl
    from sentences
    connect by level <= regexp_count(s, 'Sentence')
);

该输出:

Sentence 1: There once was a poor boy who spent hi Sentence 2: The poor boy asked for food but was de Sentence 3: Years later, the girl, who was now a g Sentence 4: The doctor spent months treating her u

演示:https://livesql.oracle.com/apex/livesql/s/o8l4vu3rzs67vs8ag2ldmzs6k

z4bn682m

z4bn682m4#

您可以使用递归查询来查找以Sentence \d+:开头的每个子字符串,并递归地连接这些子字符串的最多50个字符:

WITH data (rn, column_a, column_b, epos, updated_column_b) AS (
  SELECT ROWNUM,
         column_a,
         column_b,
         1,
         EMPTY_CLOB()
  FROM   table_name
UNION ALL
  SELECT rn,
         column_a,
         column_b,
         REGEXP_INSTR(column_b, 'Sentence \d+:', epos, 2),
         updated_column_b
         || ' '
         || SUBSTR(
              column_b,
              epos,
              CASE REGEXP_INSTR(column_b, 'Sentence \d+:', epos, 2)
              WHEN 0
              THEN 50
              ELSE LEAST(
                     50,
                     REGEXP_INSTR(column_b, 'Sentence \d+:', epos, 2) - epos
                   )
              END
            )
  FROM   data
  WHERE  epos > 0
)
SEARCH DEPTH FIRST BY rn SET order_id
SELECT column_a,
       SUBSTR(updated_column_b, 2) AS updated_column_b
FROM   data
WHERE  epos = 0;

其中,对于样本数据:

CREATE TABLE table_name (Column_A, Column_B) AS
SELECT 'S101',
       EMPTY_CLOB() || 'Sentence 1: There once was a poor boy who spent his days going door-to-door selling newspapers to pay for school. One day, as he was walking his route, he started feeling low and weak.The poor boy was starving, so he decided to ask for food when he came to the next door. Sentence 2: The poor boy asked for food but was denied every time, until he reached the door of a girl. He asked for a glass of water,but seeing his poor state, the girl came back with a glass of milk. The boy asked how much he owed her for the milk, but she refused payment.Sentence 3: Years later, the girl, who was now a grown woman, fell sick. She went from doctor to doctor, but no one was able to cure her.Finally, she went to the best doctor in town.Sentence 4: The doctor spent months treating her until she was finally cured. Despite her happiness, she was afraid she couldn’t afford to pay the bill. But, when the hospital handed her the bill, it read, ‘Paid in full, with a glass of milk.’'
FROM   DUAL

其输出:
| 列_A|更新列B|
| - -----|- -----|
| S101|句子1:从前有一个穷孩子,他花了他的钱。可怜的男孩要求食物,但被de句子3:多年以后,那个女孩,现在已经是一个g了医生花了几个月的时间为她治疗.|
fiddle

kgqe7b3p

kgqe7b3p5#

下面是一个正则表达式,它捕获每个“最多50个字符的句子”:

Sentence.{50}

我希望这会有所帮助。

相关问题