Pattern matching tags with RegEx and Python (re.findall)

Posted by 9gm1akwq on 2022-11-18 in Python

I need to match and capture the information between pairs of tags. Each line contains two tag pairs. A tag pair looks like this:

<a> </a> <b>hello hello 123</b> stuff to ignore here <i>123412bhje</i> <a>what???</a> stuff to ignore here <b>asd13asf</b> <i>who! Hooooo!</i> stuff to ignore here <i>df7887a</i>

The expected output is:

hello hello 123 123412bhje 
what??? asd13asf 
who! Hooooo! df7887a

I specifically need to use the following form:

M = re.findall("", linein)

Source: https://stackoverflow.com/questions/74249269/pattern-matching-tags-with-regex-and-python-re-findall

1 Answer

chhkpiq41#

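To skip the first <a> </a> tag, the regex has to assume that the first character inside a tag is never whitespace, while whitespace is allowed after that first character.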

The other assumptions are:

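Tag names are lowercase letters, for example <b> </b> <i> </i>.
The information between a tag pair contains only uppercase letters, lowercase letters, digits, whitespace, and the symbols ! and ?. If any other symbol appears inside a tag, that tag will not be matched.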
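These assumptions can be checked in isolation. The following is a minimal sketch on a made-up input string (`sample` and the variable names are illustrative, not from the question). It compares a pattern without the first-character constraint against one with it, and shows what happens when a tag contains a symbol outside the allowed set:

```python
import re

# Hypothetical short input, chosen only to illustrate the assumptions.
sample = '<a> </a> <b>hello</b>'

# Without the first-character constraint, the whitespace-only tag is captured.
loose = re.findall(r'<[a-z]+>([A-Za-z0-9?!\s]*)</[a-z]+>', sample)

# Requiring a non-whitespace first character skips '<a> </a>'.
strict = re.findall(r'<[a-z]+>([A-Za-z0-9?!][A-Za-z0-9?!\s]*)</[a-z]+>', sample)

# A comma is outside the allowed character set, so this tag is not matched.
rejected = re.findall(r'<[a-z]+>([A-Za-z0-9?!][A-Za-z0-9?!\s]*)</[a-z]+>', '<b>a,b</b>')

print(loose)     # [' ', 'hello']
print(strict)    # ['hello']
print(rejected)  # []
```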
Here is a working version based on your example:

import re

linein = '<a> </a> <b>hello hello 123</b> stuff to ignore here <i>123412bhje</i> <a>what???</a> stuff to ignore here <b>asd13asf</b> <i>who! Hooooo!</i> stuff to ignore here <i>df7887a</i>'
# Capture tag contents whose first character is not whitespace.
M = re.findall(r'<[a-z]+>([A-Za-z0-9?!][A-Za-z0-9?!\s]*)</[a-z]+>', linein)

for i in range(0,len(M),2):
    print(M[i],M[i+1])

Output:

hello hello 123 123412bhje
what??? asd13asf
who! Hooooo! df7887a
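If the restriction on allowed symbols is too tight, one possible variation is sketched below. This is not part of the answer above: it assumes tag contents never contain `<`, uses a backreference (`\1`) so that the closing tag must have the same name as the opening tag, and pairs the captures with `zip` instead of indexing. The helper name `extract_pairs` is hypothetical.

```python
import re

# Group 1: tag name. Group 2: contents, starting with a character that is
# neither '<' nor whitespace, followed by anything up to the next '<'.
TAG_PATTERN = r'<([a-z]+)>([^<\s][^<]*)</\1>'

def extract_pairs(line):
    """Return the captured tag contents of one line, grouped two at a time."""
    M = re.findall(TAG_PATTERN, line)      # list of (tag_name, contents) tuples
    texts = [contents for _tag, contents in M]
    # Pair items 0 and 1, 2 and 3, and so on; an unpaired trailing item is dropped.
    return list(zip(texts[0::2], texts[1::2]))

linein = ('<a> </a> <b>hello hello 123</b> stuff to ignore here <i>123412bhje</i> '
          '<a>what???</a> stuff to ignore here <b>asd13asf</b> '
          '<i>who! Hooooo!</i> stuff to ignore here <i>df7887a</i>')

for first, second in extract_pairs(linein):
    print(first, second)
```

Because the pattern has two capture groups, `re.findall` returns tuples, so the tag name has to be discarded before printing. Unlike the indexed loop, `zip` does not raise an `IndexError` when the number of matches is odd.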

2022-11-18
