regex: Extract a string in R that itself contains a quoted string

Posted by ryoqjall 11 months ago in Other
library(checkmate)  # provides assertString()

ex02ChildrenInverse <- function(sentence) {
  assertString(sentence)
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*?)"', sentence))[[1]]
  parent <- matches[[2]]
  male <- matches[[3]] == "father"
  child <- matches[[4]]
  child <- gsub('".*"', '', matches[4])
  return(list(parent = parent, male = male, child = child))
}

Here is my code. My problem is that I want to output the child's name even when the name itself contains double quotes. For example:

input: 'Gudrun is the mother of "Rosamunde ("Rosi")".'


My output:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde ("

But I want:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde ("Rosi")"


I tried my code, but it does not work the way I want.
I think I need to change the line child <- gsub(.......)

Answer #1 (20jt8wwn)

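If the child's name is always the last part of the string, you can also match the dot that follows the closing double quote. The lazy (.*?) then has to extend up to the quote that is directly followed by a dot, instead of stopping at the first quote inside the name: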

(.*?) is the (father|mother) of "(.*?)"\.

For example:

ex02ChildrenInverse <- function(sentence) {
  
  matches <- regmatches(
    sentence,
    regexec('(.*?) is the (father|mother) of "(.*?)"\\.', sentence))[[1]]
    
  parent <- matches[[2]]
  male   <- matches[[3]] == "father"
  child  <- matches[[4]]
  
  return(list(parent = parent, male = male, child = child))
}
ex02ChildrenInverse('Gudrun is the mother of "Rosamunde ("Rosi")".')


Output:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde (\"Rosi\")"


See also: R demo, regex demo.
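If the trailing dot is not guaranteed, one possible variation is to make the child group greedy and anchor the pattern at the end of the sentence, so the capture runs up to the last double quote. This is only a sketch under assumptions: the function name `parse_parent_sentence`, the optional-dot handling, and the error on non-matching input are illustrative and not part of the original answer. `perl = TRUE` is used so that mixing lazy and greedy quantifiers follows PCRE semantics.

```r
# Sketch: greedy child capture, anchored at the end of the sentence.
# `parse_parent_sentence` is a hypothetical name, not from the original thread.
parse_parent_sentence <- function(sentence) {
  # (.*?) lazy: parent is everything before the first " is the ... of "
  # (.*)  greedy: child runs up to the LAST double quote
  # \\.?$ allows, but does not require, a final dot
  pattern <- '^(.*?) is the (father|mother) of "(.*)"\\.?$'
  matches <- regmatches(sentence, regexec(pattern, sentence, perl = TRUE))[[1]]
  if (length(matches) == 0) {
    stop("sentence does not match the expected pattern")
  }
  list(
    parent = matches[[2]],
    male   = matches[[3]] == "father",
    child  = matches[[4]]
  )
}

res <- parse_parent_sentence('Gudrun is the mother of "Rosamunde ("Rosi")".')
cat(res$child, "\n")
```

The same call should also accept a sentence without the final dot, such as 'Hans is the father of "Karl"'.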

Answer #2 (ma8fv8wu)

A fresh approach is to use gsub and grepl to pull out the pieces you need, instead of trying to do everything with regmatches:

freshCode <- function(sentence) {
  # The first word of the sentence is taken as the parent's name
  parent <- gsub("(\\w+).*", "\\1", sentence)
  male <- grepl("father", sentence)
  # Everything after the first double quote, minus the closing quote and the optional final dot
  child <- sub('"\\.?$', "", substring(sentence, regexpr('"', sentence) + 1))
  list(parent = parent, male = male, child = child)
}

freshCode('Gudrun is the mother of "Rosamunde ("Rosi")".')

# $parent
# [1] "Gudrun"
# 
# $male
# [1] FALSE
# 
# $child
# [1] "Rosamunde (\"Rosi\")"

# Note: the backslashes above are only added by print(); they are not part of the string:
# > cat(freshCode('Gudrun is the mother of "Rosamunde ("Rosi")".')[[3]])
# Rosamunde ("Rosi")

Or, with a small change to your existing code:

ex02ChildrenInverse <- function(sentence) {
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*?)"', sentence))[[1]]
  parent <- matches[[2]]
  male <- matches[[3]] == "father"
  # Take the child from the raw sentence instead of the (too short) lazy capture
  child <- sub('"\\.?$', "", substring(sentence, regexpr('"', sentence) + 1))
  
  return(list(parent = parent, male = male, child = child))
}


It returns the same output as above.
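The point about the backslashes can be checked directly. The short sketch below uses a literal string rather than either function from this thread, and the variable name `child` is only for illustration. It compares how print() and cat() display the same value and confirms that the string holds no backslash characters.

```r
# A name that contains double quotes, written with single-quote delimiters
child <- 'Rosamunde ("Rosi")'

print(child)       # shows [1] "Rosamunde (\"Rosi\")" (escaped for display)
cat(child, "\n")   # shows Rosamunde ("Rosi")

# The string itself has 18 characters and no backslash
n_chars <- nchar(child)
has_backslash <- grepl("\\", child, fixed = TRUE)
print(n_chars)
print(has_backslash)
```

So the \" sequences in the printed list output are only how R displays a quote inside a quoted string.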
