regex - How to convert a vector of strings to title case

Posted by mrphzbgm on 2023-05-19 in Other


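I have a vector of lowercase strings. I'd like to change them to title case, meaning the first letter of every word is capitalized. I've managed to do it with a double loop, but I'm hoping there is a more efficient and elegant way, perhaps a one-liner with gsub and a regex.
Here is some sample data, along with the working double loop, followed by other things I tried that didn't work.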

strings = c("first phrase", "another phrase to convert",
            "and here's another one", "last-one")

# For each string in the strings vector, find the starting position of
#  each word (a word boundary followed by a run of lowercase letters)
matches = gregexpr("\\b[a-z]+", strings) 

# For each string in the strings vector, convert the first letter 
#  of each word to upper case
for (i in seq_along(strings)) {

  # Starting positions of the regex matches in element i
  #  of the strings vector.
  match.positions = matches[[i]]

  # gregexpr returns -1 when a string has no match; skip it
  if (match.positions[1] == -1) next

  # Convert the letter in each match position to upper case
  for (j in seq_along(match.positions)) {

    substr(strings[i], match.positions[j], match.positions[j]) = 
      toupper(substr(strings[i], match.positions[j], match.positions[j]))
  }
}
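For comparison, here is a minimal base-R sketch (not part of the original post) that applies the same substitution to every string at once with the regmatches<- replacement function instead of the double loop. It assumes the loop's \\b[a-z] notion of a word start, so it reproduces the loop's result exactly, including the unwanted "Here'S".

```r
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")

# Match data for the first letter of every word (same word-start rule as the loop)
m <- gregexpr("\\b[a-z]", strings, perl = TRUE)

# Pull out the matched letters, upper-case them, and write them back in place
regmatches(strings, m) <- lapply(regmatches(strings, m), toupper)

print(strings)
# -> "First Phrase" "Another Phrase To Convert" "And Here'S Another One" "Last-One"
```

gregexpr only locates the matches; regmatches<- does the per-string, per-match bookkeeping that the two nested loops did by hand.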

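This works, but it seems overly complicated. I only resorted to it after some more direct approaches failed. Here are some of the things I tried, along with the output: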

# Google search suggested \\U might work, but evidently not in R
gsub("(\\b[a-z]+)", "\\U\\1" ,strings)
[1] "Ufirst Uphrase"                "Uanother Uphrase Uto Uconvert"
[3] "Uand Uhere'Us Uanother Uone"   "Ulast-Uone"                   

# I tried this on a lark, but to no avail
gsub("(\\b[a-z]+)", toupper("\\1"), strings)
[1] "first phrase"              "another phrase to convert"
[3] "and here's another one"    "last-one"

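The regex captures the correct positions in each string, as the call to gregexpr shows, but the replacement string evidently isn't working as intended.
In case you can't already tell, I'm fairly new to regex and would appreciate help getting the replacement to work correctly. I'd also like to learn how to write the regex so that it doesn't capture letters that follow an apostrophe, since I don't want to change the case of those letters.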

regex

Source: https://stackoverflow.com/questions/15776732/how-to-convert-a-vector-of-strings-to-title-case

6 answers


pod7payv1#

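The main problem is that you're missing perl=TRUE (and your regex is slightly off, although that may be a result of trying to fix the first problem).
Using [[:lower:]] instead of [a-z] (or [[:alpha:]] instead of [A-Za-z]) is slightly safer, in case your code ends up running in some odd locale (sorry, Estonians) where z is not the last letter of the alphabet...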

re_from <- "\\b([[:alpha:]])([[:alpha:]]+)"
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")
gsub(re_from, "\\U\\1\\L\\2" ,strings, perl=TRUE)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-One"
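One side effect of this pattern is worth checking: the + in ([[:alpha:]]+) is what leaves the lone "s" in "here's" alone, but it also means one-letter words are never capitalized. The sketch below illustrates the trade-off on a made-up sentence; re_loose is a hypothetical variant, not part of the answer.

```r
re_from  <- "\\b([[:alpha:]])([[:alpha:]]+)"  # pattern from the answer
re_loose <- "\\b([[:alpha:]])([[:alpha:]]*)"  # hypothetical variant: one or more letters

phrases <- c("a tale of two cities", "and here's another one")

# "+" needs at least two letters: "a" is skipped, the "s" in "here's" stays lowercase
two_plus <- gsub(re_from, "\\U\\1\\L\\2", phrases, perl = TRUE)
print(two_plus)
# -> "a Tale Of Two Cities" "And Here's Another One"

# "*" also matches single letters: "a" is fixed, but the "s" after the apostrophe
# is now capitalized as well
one_plus <- gsub(re_loose, "\\U\\1\\L\\2", phrases, perl = TRUE)
print(one_plus)
# -> "A Tale Of Two Cities" "And Here'S Another One"
```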

You may prefer \\E (stop upper-casing) rather than \\L (start lower-casing), depending on which rules you want to follow, e.g.:

string2 <- "using AIC for model selection"
gsub(re_from, "\\U\\1\\E\\2" ,string2, perl=TRUE)
## [1] "Using AIC For Model Selection"
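To make the difference concrete, here is a small side-by-side sketch on the same string2 (the variable names with_L and with_E are only for illustration):

```r
re_from <- "\\b([[:alpha:]])([[:alpha:]]+)"
string2 <- "using AIC for model selection"

# \\L switches to lower case for \\2, so the acronym is flattened to "Aic"
with_L <- gsub(re_from, "\\U\\1\\L\\2", string2, perl = TRUE)

# \\E only ends the upper-casing, so \\2 keeps its original case
with_E <- gsub(re_from, "\\U\\1\\E\\2", string2, perl = TRUE)

print(c(with_L, with_E))
# -> "Using Aic For Model Selection" "Using AIC For Model Selection"
```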


hwamh0ep2#

Without using regex, the help page for tolower has two example functions that do this.
The more robust version is

capwords <- function(s, strict = FALSE) {
    cap <- function(s) paste(toupper(substring(s, 1, 1)),
                  {s <- substring(s, 2); if(strict) tolower(s) else s},
                             sep = "", collapse = " " )
    sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}
capwords(c("using AIC for model selection"))
## ->  [1] "Using AIC For Model Selection"
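The strict argument is what controls acronyms, and because the function splits on spaces only, hyphenated parts are left alone. A short sketch of both behaviours, repeating the capwords() definition from the answer so the block runs on its own:

```r
# capwords() as quoted in the answer (from the tolower help page examples)
capwords <- function(s, strict = FALSE) {
  cap <- function(s) paste(toupper(substring(s, 1, 1)),
                           {s <- substring(s, 2); if (strict) tolower(s) else s},
                           sep = "", collapse = " ")
  sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}

# strict = TRUE lower-cases everything after the first letter of each word
strict_out <- capwords("using AIC for model selection", strict = TRUE)
print(strict_out)
# -> "Using Aic For Model Selection"

# Words are split on spaces only, so the part after the hyphen is not capitalized
space_out <- capwords(c("and here's another one", "last-one"))
print(space_out)
# -> "And Here's Another One" "Last-one"
```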

To make the regex approach (almost) work, you need to set perl = TRUE:

gsub("(\\b[a-z]{1})", "\\U\\1" ,strings, perl=TRUE)

[1] "First Phrase"              "Another Phrase To Convert"
[3] "And Here'S Another One"    "Last-One"

but you may need to handle apostrophes better:

sapply(lapply(strsplit(strings, ' '), gsub, pattern = '^([[:alnum:]]{1})',
              replacement = '\\U\\1', perl = TRUE), paste, collapse = ' ')
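If the goal is to skip only letters that directly follow an apostrophe, a negative lookbehind is one option. This is a sketch under the assumption that a word starts at any lowercase letter not immediately preceded by a letter, digit or apostrophe; (?<!...) is PCRE syntax, so perl = TRUE is required. Unlike the split-on-spaces version, it also capitalizes the part after the hyphen, and it handles one-letter words.

```r
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")

# Upper-case a lowercase letter only when it is NOT preceded by a letter,
# digit or apostrophe (the start of the string counts as "not preceded")
pattern <- "(?<![[:alnum:]'])([[:lower:]])"

titled <- gsub(pattern, "\\U\\1", strings, perl = TRUE)
print(titled)
# -> "First Phrase" "Another Phrase To Convert" "And Here's Another One" "Last-One"

# One-letter words are capitalized too
single <- gsub(pattern, "\\U\\1", "a tale of two cities", perl = TRUE)
print(single)
# -> "A Tale Of Two Cities"
```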

A quick search of SO turned up https://stackoverflow.com/a/6365349/1385941


qxsslcnc3#

There are already good answers here. Here's an example using a convenience function from the reports package:

library(reports)

strings <- c("first phrase", "another phrase to convert",
    "and here's another one", "last-one")

CA(strings)

## > CA(strings)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-one"

It doesn't capitalize the "one" in "last-one", though, since that made no sense for my purposes.

Update: the qdapRegex package, which I maintain, has a TC (title case) function that does true title case:

library(qdapRegex)

TC(strings)

## [[1]]
## [1] "First Phrase"
## 
## [[2]]
## [1] "Another Phrase to Convert"
## 
## [[3]]
## [1] "And Here's Another One"
## 
## [[4]]
## [1] "Last-One"


kxxlusnw4#

Just for fun, I'll add one more:

topropper <- function(x) {
  # Makes Proper Capitalization out of a string or collection of strings.
  sapply(x, function(strn) {
    s <- strsplit(strn, "\\s")[[1]]
    paste0(toupper(substring(s, 1, 1)),
           tolower(substring(s, 2)),
           collapse = " ")
  }, USE.NAMES = FALSE)
}

topropper(strings)
[1] "First Phrase"              "Another Phrase To Convert" "And Here's Another One"   
[4] "Last-one"  


5vf7fwbs5#

Here's another one-liner, based on the stringr package:

library(stringr)

str_to_title(strings, locale = "en")

where strings is the vector of strings.
Source


7tofc5zh6#

The best way to convert any case to any other case is the snakecase package in R.
Just use the package:

library(snakecase)
strings = c("first phrase", "another phrase to convert",
        "and here's another one", "last-one")

to_title_case(strings)

## [1] "First Phrase"              "Another Phrase to Convert" 
## [3] "And Here s Another One"    "Last One"

Happy coding!
