regex: match all characters, in order, up to a specific sequence or character

hgc7kmma  posted on 2022-11-18  in  Other

I want to match this text:

<SERIES>
<OWNER-CIK>0000003521
<SERIES-ID>S000020958
<SERIES-NAME>Alger Small Cap Focus Fund
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000059340
<CLASS-CONTRACT-NAME>Alger Small Cap Focus Fund Class I
<CLASS-CONTRACT-TICKER-SYMBOL>AOFIX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000095961
<CLASS-CONTRACT-NAME>Alger Small Cap Focus Fund Class Z
<CLASS-CONTRACT-TICKER-SYMBOL>AGOZX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000179520
<CLASS-CONTRACT-NAME>Class Y
<CLASS-CONTRACT-TICKER-SYMBOL>AOFYX
</CLASS-CONTRACT>
</SERIES>
<SERIES>

starting from:

<SERIES>

up to:

</SERIES>

I am trying:

<SERIES>[^/]+

but it stops at the following line:

</CLASS-CONTRACT>

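If I add S to the negated character class, the match ends even earlier, because it then stops at any / or S character. I need it to stop only where both appear together, in that specific order: /S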
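The behaviour described above can be reproduced with a short sketch. The variable name `sample` and the shortened text are illustrative assumptions, not the full data from the question.

```python
import re

# Shortened, illustrative sample (an assumption, not the full text).
sample = (
    "<SERIES>\n<SERIES-ID>S000020958\n"
    "<CLASS-CONTRACT>\n</CLASS-CONTRACT>\n</SERIES>\n"
)

# [^/]+ consumes everything up to the first "/", which belongs to
# </CLASS-CONTRACT>, so the match never reaches </SERIES>.
first = re.search(r"<SERIES>[^/]+", sample).group(0)

# [^/S]+ stops at the first "/" OR "S", so it ends even earlier:
# at the "S" of <SERIES-ID>.
second = re.search(r"<SERIES>[^/S]+", sample).group(0)

print(repr(first))
print(repr(second))
```

A character class always tests one character at a time, so it cannot express "stop at the two-character sequence /S".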

pbpqsu0x  1#

This should work. It uses a lookahead, so the pattern knows where to stop without consuming the next <SERIES> tag.

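import re

# `text` is assumed to hold the snippet from the question.
# re.S lets . match newlines; the lazy .*? stops before the first following <SERIES>.
pattern = re.compile(r'<SERIES>.*?(?=\n<SERIES>)', re.S)
print(pattern.findall(text)[0])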

Output:

<SERIES>
<OWNER-CIK>0000003521
<SERIES-ID>S000020958
<SERIES-NAME>Alger Small Cap Focus Fund
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000059340
<CLASS-CONTRACT-NAME>Alger Small Cap Focus Fund Class I
<CLASS-CONTRACT-TICKER-SYMBOL>AOFIX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000095961
<CLASS-CONTRACT-NAME>Alger Small Cap Focus Fund Class Z
<CLASS-CONTRACT-TICKER-SYMBOL>AGOZX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000179520
<CLASS-CONTRACT-NAME>Class Y
<CLASS-CONTRACT-TICKER-SYMBOL>AOFYX
</CLASS-CONTRACT>
</SERIES>
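A lookahead that only accepts a following <SERIES> tag would miss the last block in the text, since nothing follows it. The sketch below is a variation, not the pattern from this answer: it assumes the end of the text should also end a block, and the sample IDs are made up for illustration.

```python
import re

# Illustrative sample with two blocks; the IDs are invented for this sketch.
sample = (
    "<SERIES>\n<SERIES-ID>S1\n</SERIES>\n"
    "<SERIES>\n<SERIES-ID>S2\n</SERIES>\n"
)

# Lazy .*? with a lookahead: stop before the next "\n<SERIES>",
# or before trailing whitespace at the very end of the text.
pattern = re.compile(r"<SERIES>.*?(?=\n<SERIES>|\s*\Z)", re.S)

blocks = pattern.findall(sample)
for block in blocks:
    print(block)
    print("---")
```

Because the lookahead does not consume the next <SERIES> tag, that tag is still available as the start of the next match.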
dvtswwa3  2#

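Just use .*? between the opening and closing tags. You will need re.S so that . also matches newlines. The ? makes the match as short as possible, which matters when the closing tag occurs more than once.
So the complete pattern string should be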

r"<SERIES>.*?</SERIES>"
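As a minimal sketch of how this pattern behaves, the example below compares the lazy and greedy forms. The variable names and the shortened two-block sample are assumptions made for illustration.

```python
import re

# Illustrative sample with two blocks; the IDs are invented for this sketch.
sample = (
    "<SERIES>\n<SERIES-ID>S1\n<CLASS-CONTRACT>\n</CLASS-CONTRACT>\n</SERIES>\n"
    "<SERIES>\n<SERIES-ID>S2\n</SERIES>\n"
)

# Lazy: each match ends at the nearest </SERIES>.
lazy = re.findall(r"<SERIES>.*?</SERIES>", sample, re.S)

# Greedy: a single match runs from the first <SERIES> to the last </SERIES>.
greedy = re.findall(r"<SERIES>.*</SERIES>", sample, re.S)

print(len(lazy))
print(len(greedy))
```

Without re.S, . does not match newlines, so neither pattern would match a block that spans several lines.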
