Match and remove text before a sentence in regex?

Posted by camsedfj on 2023-06-30 in Other


If I have the following sentence structure:

string = '''welcome to our first meeting on talks to do with famous people, this time we are holding it on 1st January 2023 (see website details) 
 <<John Smith, Youtube>> 
I'm having a great day today 
<<Jane Doe, Google>> 
I'm going to the gym later 
<<Speaker>>
Time for people to speak 
<<Beff Jezos>> 
Buy something from my online shop. You might like it'''

How can I match and remove the text before the first <<, so that the resulting string is:

result_string =  '''<<John Smith, Youtube>> 
I'm having a great day today 
<<Jane Doe, Google>> 
I'm going to the gym later 
<<Speaker>>
Time for people to speak 
<<Beff Jezos>> 
Buy something from my online shop. You might like it'''

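I tried a positive lookahead with this regex:
(.*)(?<=(see website for details)
However, this makes the regex engine raise an error, and it does not capture all the text before <<.
The "(see website for details)" text may change over time, so matching everything before << is more robust.
Any help is greatly appreciated.
For reference, I used the following Python package:
import re
re.sub(string, pattern, '')  -> empty string as the replacement, thereby removing the sentence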

regex

Source: https://stackoverflow.com/questions/76589079/match-and-remove-text-before-sentence-in-regex

3 Answers


8i9zcol2 #1

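This regex should work:
re.search(r"(<<[\s\S]*)", string).group(1)
\s matches a whitespace character and \S a non-whitespace character, so [\s\S] matches any character at all, newlines included. The * repeats it any number of times, so the group captures everything from the first << to the end of the string.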
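A minimal sketch of the same idea, wrapped in a hypothetical helper named `drop_preamble` (the name and the shortened sample text are illustrative assumptions, not part of the answer). It also covers the case where the text contains no `<<`: `re.search` then returns `None`, and calling `.group` on it would raise `AttributeError`.

```python
import re

def drop_preamble(text: str) -> str:
    """Return text starting at the first '<<'; unchanged if there is none."""
    # [\s\S] matches any character, including newlines.
    match = re.search(r"<<[\s\S]*", text)
    return match.group(0) if match else text

# Shortened stand-in for the string in the question.
sample = "intro line (see website details)\n<<John Smith, Youtube>>\nI'm having a great day today"
print(drop_preamble(sample))
```

Returning the input unchanged when no marker is found is one possible design choice; raising an error may suit other callers better.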

Answered 2023-06-30

to94eoyn #2

Your pattern is missing a closing parenthesis:

pattern = r'(.*)(?<=(see website details))'

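The arguments in your re.sub call are in the wrong order. With the pattern first, then the replacement, then the string: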

re.sub(pattern, '', string)

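this works almost correctly: a few leftover characters remain at the start (the closing parenthesis and a '\n'), which can easily be stripped.
The suggestion to use string.index('<') should also work, since it returns the index of the first occurrence.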
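Since the question asks for a match anchored on `<<` rather than on the "(see website details)" text, here is a hedged sketch of two ways to do that. The lookahead pattern and the shortened sample string are assumptions for illustration, not something taken from the answers.

```python
import re

# Shortened stand-in for the string in the question.
sample = "intro line (see website details)\n <<John Smith, Youtube>>\nhello\n<<Jane Doe, Google>>\nbye"

# Regex variant: lazily consume everything from the start of the string
# up to (but not including) the first '<<'. re.DOTALL lets '.' match newlines.
via_sub = re.sub(r"\A.*?(?=<<)", "", sample, flags=re.DOTALL)

# Plain-string variant: str.find returns -1 when '<<' is absent,
# whereas str.index would raise ValueError.
pos = sample.find("<<")
via_find = sample[pos:] if pos != -1 else sample

print(via_sub == via_find)
```

Both variants leave the string untouched when it contains no `<<`. Searching for `'<<'` instead of a single `'<'` avoids cutting at a stray `<` in the introductory text.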

Answered 2023-06-30

k7fdbhmy #3

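No fancy expression is needed; re.split does this simply. Putting the delimiter in a capturing group keeps the delimiter in the result, so you only have to join the returned list while skipping its first element.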

print("".join(re.split('(<<)', string)[1:]))
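The one-liner above splits at every `<<` and then glues the pieces back together. As a possible refinement (an assumption, not part of the answer), `maxsplit=1` stops after the first delimiter, and `str.partition` gives the same result without a regex. The sample string is a shortened stand-in.

```python
import re

sample = "intro\n<<A>>\nfirst\n<<B>>\nsecond"

# maxsplit=1 stops after the first '<<', so the list has at most three items:
# [text before, '<<', everything after].
parts = re.split("(<<)", sample, maxsplit=1)
via_split = "".join(parts[1:])

# str.partition returns (before, separator, after); separator and after
# are empty strings when '<<' is missing.
_, sep, rest = sample.partition("<<")
via_partition = sep + rest

print(via_split)
```

Note that both variants, like the original one-liner, yield an empty string when the input contains no `<<`, so a guard is needed if the text should be kept in that case.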

Answered 2023-06-30
