R: gsub captures only the last character of a group

Posted by nsc4cvqm on 2023-07-31 in Other

I have this character vector:

vec <- c("(0,13.2]", "(13.2,28.3]", "(28.3,39.3]", "(39.3,49.4]", "(49.4,59.4]",
     "(59.4,69.3]", "(69.3,78.9]", "(78.9,87.8]", "(87.8,95.5]", "(95.5,100]")

I want to change the entries to:

expected <- c("0 to 13.2",  "13.2 to 28.3",  "28.3 to 39.3",  "39.3 to 49.4",  "49.4 to 59.4", 
     "59.4 to 69.3",  "69.3 to 78.9",  "78.9 to 87.8",  "87.8 to 95.5",  "95.5 to 100")


What I did:

library(magrittr)  # provides the %>% pipe

vec %>%
  strsplit(., ",") %>%
  lapply(., function(level_i){
    from <- gsub("^\\(([0-9])+(\\.)*([0-9])*$", "\\1\\2\\3", level_i[1])
    to <- gsub("^([0-9])+(\\.)*([0-9])*]$", "\\1\\2\\3", level_i[2])
    paste0(from, " to ", to)
  }) %>%
  unlist()
# This gives:
# "0 to 3.2" "3.2 to 8.3" "8.3 to 9.3" "9.3 to 9.4" "9.4 to 9.4" "9.4 to 9.3" "9.3 to 8.9"
# "8.9 to 7.8" "7.8 to 5.5" "5.5 to 0"


My code captures only the last character of each group, i.e. "(0,13.2]" becomes "0 to 3.2" instead of "0 to 13.2". How can I capture all the characters of a group?

Answer 1 (wkftcu5l)

With gsub, you can capture groups using ():

gsub('\\((.*),(.*)\\]', "\\1 to \\2", vec)
#[1] "0 to 13.2"    "13.2 to 28.3" "28.3 to 39.3" "39.3 to 49.4" "49.4 to 59.4"
#[6] "59.4 to 69.3" "69.3 to 78.9" "78.9 to 87.8" "87.8 to 95.5" "95.5 to 100"
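
As for why the attempt in the question kept only one digit per group: in `([0-9])+` the quantifier sits outside the parentheses, so the group is matched again for each digit and the backreference holds only the last repetition. Moving the quantifier inside, `([0-9]+)`, captures the whole run. Below is a minimal sketch on a shortened copy of `vec`, with `vapply` standing in for the `lapply` + `unlist` pipeline; the names `last_only`, `whole_run`, `parts` and `fixed` are illustrative only.

```r
vec <- c("(0,13.2]", "(13.2,28.3]", "(95.5,100]")

# Quantifier outside the parentheses: the group is re-matched for every digit,
# so \\1 refers to the final repetition only.
last_only <- sub("^\\(([0-9])+$", "\\1", "(13")

# Quantifier inside the parentheses: the whole run of digits is one capture.
whole_run <- sub("^\\(([0-9]+)$", "\\1", "(13")

# The question's split-and-clean approach with the quantifiers moved inside
# a single group per number.
parts <- strsplit(vec, ",", fixed = TRUE)
fixed <- vapply(parts, function(level_i) {
  from <- sub("^\\(([0-9]+\\.?[0-9]*)$", "\\1", level_i[1])
  to   <- sub("^([0-9]+\\.?[0-9]*)\\]$", "\\1", level_i[2])
  paste0(from, " to ", to)
}, character(1))

print(c(last_only = last_only, whole_run = whole_run))
print(fixed)
```

Here `last_only` should reproduce the symptom from the question, while `whole_run` and `fixed` keep every digit.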

To capture the numbers precisely rather than with .*, you can use the following, which handles both integer and decimal formats:

gsub('\\((\\d+[\\.]*\\d*),(\\d+[\\.]*\\d*)\\]', "\\1 to \\2", vec)


With all those backslashes, you can simplify the regular expression by using raw strings (available from R 4.0.0): r"{\((\d+[\.]*\d*),(\d+[\.]*\d*)\]}"
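
A short sketch of the raw-string form in an actual call, assuming R 4.0.0 or later. The pattern here uses `\.?` in place of `[\.]*`, a small tightening that is not part of the answer above, and the replacement is written as a raw string too. The names `pattern` and `res` are illustrative.

```r
vec <- c("(0,13.2]", "(13.2,28.3]", "(95.5,100]")

# Raw string: backslashes are taken literally, so no doubling is needed.
pattern <- r"{\((\d+\.?\d*),(\d+\.?\d*)\]}"

# The replacement can be a raw string as well; it is the same as "\\1 to \\2".
res <- gsub(pattern, r"(\1 to \2)", vec)
print(res)

# The raw string is the same R string as the escaped spelling.
print(identical(pattern, "\\((\\d+\\.?\\d*),(\\d+\\.?\\d*)\\]"))
```

Raw strings only change how the literal is typed; the regex engine receives exactly the same pattern.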

Answer 2 (smdnsysy)

You can try read.table + trimws:

do.call(paste, c(
    read.table(text = trimws(vec, whitespace = "[\\(\\]]"), sep = ","),
    sep = " to "
))

which gives

[1] "0 to 13.2"    "13.2 to 28.3" "28.3 to 39.3" "39.3 to 49.4" "49.4 to 59.4"
 [6] "59.4 to 69.3" "69.3 to 78.9" "78.9 to 87.8" "87.8 to 95.5" "95.5 to 100"
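
To make the steps of this approach visible, here is a sketch that stores each intermediate result, assuming R 3.6.0 or later for the `whitespace` argument of `trimws`. One thing to keep in mind is that `read.table` parses the columns as numbers by default, so trailing zeros would be dropped; passing `colClasses = "character"` keeps the text as written. The input `"(1.50,2.00]"` and the names `stripped`, `df`, `res`, `as_numbers` and `as_text` are made up for illustration.

```r
vec <- c("(0,13.2]", "(13.2,28.3]", "(95.5,100]")

# Step 1: strip the leading "(" and the trailing "]".
stripped <- trimws(vec, whitespace = "[\\(\\]]")

# Step 2: parse the remaining "a,b" text as a two-column table.
df <- read.table(text = stripped, sep = ",")

# Step 3: paste the columns together row by row.
res <- do.call(paste, c(df, sep = " to "))
print(res)

# Numeric parsing reformats the values; character columns keep them verbatim.
odd <- trimws("(1.50,2.00]", whitespace = "[\\(\\]]")
as_numbers <- do.call(paste, c(read.table(text = odd, sep = ","), sep = " to "))
as_text <- do.call(paste, c(
  read.table(text = odd, sep = ",", colClasses = "character"),
  sep = " to "
))
print(c(as_numbers, as_text))
```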


Another trick is sub + trimws + chartr:

sub(",", " to ", trimws(chartr("(]", "  ", vec)))
 [1] "0 to 13.2"    "13.2 to 28.3" "28.3 to 39.3" "39.3 to 49.4" "49.4 to 59.4"
 [6] "59.4 to 69.3" "69.3 to 78.9" "78.9 to 87.8" "87.8 to 95.5" "95.5 to 100"
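
The nested call can be read from the inside out. The sketch below splits it into three steps on a shortened copy of `vec`; the names `spaced`, `trimmed` and `res` are illustrative.

```r
vec <- c("(0,13.2]", "(95.5,100]")

# Step 1: chartr maps "(" and "]" to spaces, character for character.
spaced <- chartr("(]", "  ", vec)

# Step 2: trimws removes the leading and trailing spaces just introduced.
trimmed <- trimws(spaced)

# Step 3: sub replaces the first (and only) comma with " to ".
res <- sub(",", " to ", trimmed)
print(res)
```

No regular expression with capture groups is needed here, since the brackets always sit at the two ends of each string.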

Answer 3 (zbdgwd5y)

We can use gsub twice:

gsub("[^a-zA-Z0-9. ]", "", gsub(",", " to ", vec))
[1] "0 to 13.2"    "13.2 to 28.3" "28.3 to 39.3" "39.3 to 49.4" "49.4 to 59.4"
 [6] "59.4 to 69.3" "69.3 to 78.9" "78.9 to 87.8" "87.8 to 95.5" "95.5 to 100"
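
One caveat with this approach, shown as a sketch: the outer pattern deletes every character that is not a letter, digit, dot or space, so a minus sign in a negative bound would be removed too. The input `"(-5,0]"` is a made-up example, and adding `-` to the bracket expression is one possible adjustment rather than part of the answer above.

```r
vec <- c("(0,13.2]", "(95.5,100]")

# Inner gsub inserts " to "; outer gsub drops everything outside the allowed set.
res <- gsub("[^a-zA-Z0-9. ]", "", gsub(",", " to ", vec))
print(res)

# A negative bound loses its sign, because "-" is not in the allowed set.
neg_lost <- gsub("[^a-zA-Z0-9. ]", "", gsub(",", " to ", "(-5,0]"))

# Allowing "-" (placed last in the bracket so it is literal) keeps the sign.
neg_kept <- gsub("[^a-zA-Z0-9. -]", "", gsub(",", " to ", "(-5,0]"))
print(c(neg_lost, neg_kept))
```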
