Splitting a data.table column based on a regular expression [duplicate]

Posted by x9ybnkn6 on 2023-03-31 in Other

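This question already has answers here:

Split data frame string column into multiple columns (16 answers)
Closed three years ago.
I have a data.table with three columns. I want to split the second column on a regular expression so that I end up with four columns. Every attempt so far has produced strange results, so I would appreciate some feedback. Here is the data: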

    category             label count
1 Navigation  Product || Green     2
2 Navigation   Survey || Green     5
3 Navigation    Product || Red    10
4 Navigation     Survey || Red    10

I want to split the label column at || and create two new columns, Type and Color.

Answer #1 (vmpqdwk3)

With data.table, you can do the following:

dt[, c("type", "color") := tstrsplit(label, " || ", fixed = TRUE)]

     category            label count    type color
1: Navigation Product || Green     2 Product Green
2: Navigation  Survey || Green     5  Survey Green

Sample data:

library(data.table)

dt <- data.table(category = c("Navigation", "Navigation"),
                 label = c("Product || Green", "Survey || Green"),
                 count = c(2, 5))
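
The fixed = TRUE argument matters here because | is the alternation operator in a regular expression, so an unescaped "||" is not read as two literal pipes and is likely the source of the strange results described in the question. The following is a minimal base R sketch, using a made-up vector x rather than the asker's table, of two ways to match the delimiter literally: escape each pipe, or turn off regex interpretation altogether.

```r
# Illustrative input only; not the asker's actual data
x <- c("Product || Green", "Survey || Red")

# Option 1: escape each pipe so the regex engine treats it as a literal character
parts_regex <- strsplit(x, " \\|\\| ")

# Option 2: bypass the regex engine and match the delimiter as a plain string
parts_fixed <- strsplit(x, " || ", fixed = TRUE)

# Both approaches should yield the same list of pieces
identical(parts_regex, parts_fixed)
```

When the delimiter is a constant string, fixed = TRUE is usually the simpler choice because nothing needs escaping.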

Answer #2 (o7jaxewo)

We can use tidyr::separate:

library(data.table)

dt1 <- fread("category     label            count
              Navigation   Product || Green     2
              Navigation   Survey || Green      5
              Navigation   Product || Red      10
              Navigation   Survey || Red       10")

# Let the pattern absorb the spaces around "||" so the new values are not padded
tidyr::separate(dt1, label, sep = "\\s*\\|\\|\\s*", into = c("Type","Color"))

#>      category    Type Color count
#> 1: Navigation Product Green     2
#> 2: Navigation  Survey Green     5
#> 3: Navigation Product   Red    10
#> 4: Navigation  Survey   Red    10
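
A pattern that matches only the two pipes leaves any surrounding spaces attached to the resulting pieces. The sketch below, in base R with a made-up vector x whose spacing deliberately varies, shows two ways to deal with that: trim the pieces afterwards, or let the pattern consume optional whitespace on either side of the delimiter.

```r
# Illustrative input with inconsistent spacing around the delimiter
x <- c("Product || Green", "Survey||Red", "Product  ||  Blue")

# Splitting on the pipes alone keeps whatever spacing surrounded them
padded <- strsplit(x, "\\|\\|")

# Option 1: trim each piece after splitting
trimmed <- lapply(padded, trimws)

# Option 2: let the pattern absorb optional whitespace on both sides
absorbed <- strsplit(x, "\\s*\\|\\|\\s*")

# Both clean-up strategies should agree
identical(trimmed, absorbed)
```

The same "\\s*\\|\\|\\s*" pattern can be passed as the sep argument of tidyr::separate, which interprets sep as a regular expression when it is a character string.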

Answer #3 (x6h2sr28)

cbind(d, setNames(data.frame(do.call(rbind, strsplit(d$label, " || ", fixed = TRUE))),
         c("Type", "Color")))
#    category            label count    Type Color
#1 Navigation Product || Green     2 Product Green
#2 Navigation  Survey || Green     5  Survey Green
#3 Navigation   Product || Red    10 Product   Red
#4 Navigation    Survey || Red    10  Survey   Red

Data

d = structure(list(category = c("Navigation", "Navigation", "Navigation", 
"Navigation"), label = c("Product || Green", "Survey || Green", 
"Product || Red", "Survey || Red"), count = c(2L, 5L, 10L, 10L
)), class = "data.frame", row.names = c(NA, -4L))
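
The base R one-liner in this answer packs several steps together. As a rough breakdown, assuming a small stand-in data frame (the names pieces, m, and res are only for illustration), the same idea can be written step by step:

```r
# Small stand-in data frame; stringsAsFactors = FALSE keeps label as character
# on R versions older than 4.0
d <- data.frame(label = c("Product || Green", "Survey || Red"),
                count = c(2L, 10L),
                stringsAsFactors = FALSE)

# Step 1: split each label into a character vector of pieces
pieces <- strsplit(d$label, " || ", fixed = TRUE)

# Step 2: stack the vectors row-wise into a character matrix
m <- do.call(rbind, pieces)
colnames(m) <- c("Type", "Color")

# Step 3: bind the new columns onto the original data frame
res <- cbind(d, as.data.frame(m, stringsAsFactors = FALSE))
res
```

One caveat with this approach: rbind recycles shorter vectors with a warning, so it is safest when every label contains the delimiter exactly once.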
