regex: Extract a substring between two words from a string

Posted by db2dz4w8 on 2023-05-01 in Other

I have the following string:

string = "asflkjsdhlkjsdhglk<body>Iwant\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"

I want to extract the string between the two <body> tags, keeping the tags themselves. The result I am looking for is:

substring = "<body>Iwant\to+extr@ctth!sstr|ng<body>"

Note that the substring between the two <body> tags can contain letters, digits, punctuation, and special characters.
Is there a simple way to do this?

regex

Source: https://stackoverflow.com/questions/20224591/extract-a-substring-between-two-words-from-a-string

4 Answers

uubf1zoe1#

Here is the regular expression approach:

regmatches(string, regexpr('<body>.+<body>', string))

qni6mghb2#

regex = '<body>.+?<body>'

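You need the non-greedy form (.+?) so that the match stops at the nearest closing <body> tag instead of spanning as many <body> tags as possible.
If you are using a bare regular expression without a helper function, you will need a capture group to extract the desired content, i.e.: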

regex = '(<body>.+?<body>)'
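
To make the greedy versus non-greedy difference visible, here is a minimal sketch in R. The input string `s` with three <body> tags is a hypothetical example, not the asker's data, and `perl = TRUE` is an assumption made so that the lazy quantifier is handled by the PCRE engine.

```r
# Hypothetical input with three <body> tags (not from the question).
s <- "aa<body>first<body>second<body>zz"

# Greedy: .+ extends to the LAST <body> in the string.
greedy <- regmatches(s, regexpr("<body>.+<body>", s, perl = TRUE))

# Non-greedy: .+? stops at the NEAREST following <body>.
lazy <- regmatches(s, regexpr("<body>.+?<body>", s, perl = TRUE))

# Capture group: keep only the text between the first pair of tags.
inner <- sub(".*?<body>(.+?)<body>.*", "\\1", s, perl = TRUE)

print(greedy)  # "<body>first<body>second<body>"
print(lazy)    # "<body>first<body>"
print(inner)   # "first"
```

With exactly two tags, as in the question, both patterns return the same match. The non-greedy form only matters once more tags can appear.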

hjzp0vay3#

strsplit() can help you:

> string = "asflkjsdhlkjsdhglk<body>Iwant\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"
> x = strsplit(string, '<body>', fixed = FALSE, perl = FALSE, useBytes = FALSE)
> x
[[1]]
[1] "asflkjsdhlkjsdhglk"         "Iwant\to+extr@ctth!sstr|ng" "sdgdfsghsghsgh"

> x[[1]][2]
[1] "Iwant\to+extr@ctth!sstr|ng"

Of course, this gives you all three parts of the string, and the parts do not include the tags.
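
If the tags are needed in the result, as in the question, one possible sketch is to paste them back around the middle piece. The variable names `parts`, `middle`, and `substring_with_tags` are illustrative, and `fixed = TRUE` is a choice made here so that '<body>' is treated as a literal string rather than a regular expression.

```r
string <- "asflkjsdhlkjsdhglk<body>Iwant\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"

# Split on the literal tag; [[1]] takes the pieces for the single input string.
parts <- strsplit(string, "<body>", fixed = TRUE)[[1]]

# The second piece is the text between the two tags.
middle <- parts[2]

# Re-attach the tags that strsplit() consumed.
substring_with_tags <- paste0("<body>", middle, "<body>")

print(substring_with_tags)
```

This relies on the string containing exactly two tags with text before the first one. With a different layout the index of the wanted piece changes.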

kupeojn64#

I believe Matthew's and Steve's answers are both acceptable. Here is another solution:

string = "asflkjsdhlkjsdhglk<body>Iwant\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"

regmatches(string, regexpr('<body>.+<body>', string))

output = sub(".*(<body>.+<body>).*", "\\1", string)

print(output)
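
One difference between the two calls is how they behave when the tags are absent. The sketch below uses a hypothetical input `no_tags` and illustrative variable names. regmatches() with regexpr() drops non-matching elements, while sub() returns its input unchanged.

```r
# Hypothetical input that contains no <body> tags.
no_tags <- "text without any tags"

# regmatches() drops elements that did not match, so the result is empty.
via_regmatches <- regmatches(no_tags, regexpr("<body>.+<body>", no_tags))

# sub() leaves the input untouched when the pattern does not match.
via_sub <- sub(".*(<body>.+<body>).*", "\\1", no_tags)

# A grepl() check beforehand tells the two situations apart.
has_match <- grepl("<body>.+<body>", no_tags)

print(length(via_regmatches))  # 0
print(via_sub)                 # "text without any tags"
print(has_match)               # FALSE
```

When using sub(), checking with grepl() first avoids mistaking the unchanged input for a successful extraction.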
