R: Creating new columns based on strings

Posted by liwlm1x9 on 2023-04-03 in Other

Starting point: I have a vector of strings. I don't know what they are until runtime.

targets <- c("value_1", "value_2")

Now suppose I have a data frame full of stuff. Here is a minimal example. The first column happens to contain some occurrences of the strings above.

first_column  <- c("-", "-", "-", "value_1", "value_2", "value_1 value_2")
second_column <- c("-", "-", "-", "-", "-", "-")
df <- data.frame(first_column, second_column)

     first_column   second_column
1               -               -
2               -               -
3               -               -
4         value_1               -
5         value_2               -
6 value_1 value_2               -

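What I want to do is add new columns to the data frame, named after the search strings (here value_1 and value_2). Each new column should be set to "-" where first_column does not contain that string, and to the string itself where it does. In a general-purpose language I would use a forEach over the targets vector...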

     first_column   second_column       value_1        value_2
1               -               -             -              -
2               -               -             -              -
3               -               -             -              -
4         value_1               -       value_1              -
5         value_2               -             -        value_2
6 value_1 value_2               -       value_1        value_2

I've tried many approaches, but none of them work…

#1 fjaof16o

You can do this by iterating over the targets vector and using grepl() to check whether each target string occurs in first_column of the data frame df.

targets <- c("value_1", "value_2")
df <- data.frame(first_column = c("-", "-", "-", "value_1", "value_2", "value_1 value_2"),
                 second_column = c("-", "-", "-", "-", "-", "-"))

for (target in targets) {
  # create a new column with the name of the target string
  df[[target]] <- ifelse(grepl(target, df$first_column), target, "-")
}

# the loop appends the new columns in the order of targets (value_1, then value_2),
# which already matches the desired layout, so no hard-coded reordering is needed

print(df)
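
One caveat worth sketching: grepl() treats its pattern as a regular expression by default, and the targets here are only known at runtime, so they could contain metacharacters such as ".". The strings below ("a.b", "axb") are made up for illustration and are not from the question; fixed = TRUE is the standard grepl() argument for literal matching.

```r
# Hypothetical data, used only to contrast regex and literal matching
x <- c("a.b", "axb", "-")
target <- "a.b"

regex_hit   <- grepl(target, x)                # "." matches any single character
literal_hit <- grepl(target, x, fixed = TRUE)  # "." matches only a literal dot

print(regex_hit)
print(literal_hit)
```

Either way the match is a substring match, so a target such as value_1 would also be found inside value_10.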

#2 bnl4lu3b

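An lapply-style variant (written with sapply(..., simplify = FALSE) so the result is a list named after the targets):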

cbind(df,
  sapply(targets, function(tgt) ifelse(sapply(strsplit(df$first_column, " "), `%in%`, x = tgt),
                                       tgt, "-"),
         simplify = FALSE))
#      first_column second_column value_1 value_2
# 1               -             -       -       -
# 2               -             -       -       -
# 3               -             -       -       -
# 4         value_1             - value_1       -
# 5         value_2             -       - value_2
# 6 value_1 value_2             - value_1 value_2

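Walkthrough:

  • strsplit(..) splits each cell into its space-separated words/tokens, so that we can test simple "set membership";
  • the inner sapply checks, for each row of the column, whether the current tgt is among that row's tokens;
  • the outer sapply iterates over targets and returns a named list (one element per target);
  • cbind(df, ..) appends those list elements to the original df as new columns.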
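
The steps above can be sketched one at a time. This is only an unrolled version of the same idea, assuming the question's example data; the intermediate names (tokens, has_value_1, new_cols, result) are introduced here for illustration.

```r
targets <- c("value_1", "value_2")
df <- data.frame(
  first_column  = c("-", "-", "-", "value_1", "value_2", "value_1 value_2"),
  second_column = rep("-", 6),
  stringsAsFactors = FALSE  # strsplit() needs character vectors, not factors
)

# Step 1: split each cell into its space-separated tokens (a list, one element per row)
tokens <- strsplit(df$first_column, " ")

# Step 2: for a single target, test membership in each row's token set
has_value_1 <- sapply(tokens, function(tok) "value_1" %in% tok)

# Step 3: repeat for every target, keeping a list named after the targets
new_cols <- sapply(targets,
                   function(tgt) ifelse(sapply(tokens, function(tok) tgt %in% tok), tgt, "-"),
                   simplify = FALSE)

# Step 4: bind the named list onto the original data frame as new columns
result <- cbind(df, new_cols)
str(result)
```

Because the comparison is done on whole tokens, a target only matches when it appears as a complete space-separated word.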

#3 vfh0ocws

cbind(df, sapply(targets, \(x)ifelse(grepl(x,df$first_column),x,'-')))

     first_column second_column value_1 value_2
1               -             -       -       -
2               -             -       -       -
3               -             -       -       -
4         value_1             - value_1       -
5         value_2             -       - value_2
6 value_1 value_2             - value_1 value_2
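
A small sketch of why the one-liner yields correctly named columns: with a character vector as input, sapply() uses the elements as names and simplifies the equal-length results into a matrix whose column names are the targets, which cbind() then attaches as columns. The variable m is introduced here for illustration, and function(x) is used instead of the \(x) shorthand, which requires R 4.1 or later.

```r
targets <- c("value_1", "value_2")
first_column <- c("-", "-", "-", "value_1", "value_2", "value_1 value_2")

# One character vector per target, simplified by sapply() into a matrix
m <- sapply(targets, function(x) ifelse(grepl(x, first_column), x, "-"))

print(is.matrix(m))  # the per-target vectors were combined into a matrix
print(dim(m))        # rows = length(first_column), columns = length(targets)
print(colnames(m))   # column names are taken from targets
```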

#4 6ie5vjzr

Another loop-based solution:

for(i in seq_along(targets)){

  # function that detects target string by iterating with grepl

  target_in_row <- function(x. , targets. = targets){ sapply(targets.[i], grepl, x = x.)}

  # apply target_in_row() to each row
  # sum the resulting array to simplify the result to a vector
  # assign the vector to new columns names for the target

  df[[targets[i]]] <- colSums(apply(df, 1, target_in_row))

  # Rename the values. IF value == 1 assign target, ELSE "-"

  df[[targets[i]]] <- ifelse(df[[targets[i]]] %in% 1, targets[i], "-")
  
}
