R: How to arrange values from a column into a new column based on a condition and a substring?

Posted by vfh0ocws on 2023-06-19 in Other

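I have a data frame with three columns, shown below. I need to add a new column based on the following condition: if the string after the `$` in var_field equals the string in text, put the corresponding value of var into a new column named new_col. When text is NA, new_col should also stay NA. I would really appreciate your advice.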
| var | text | var_field |
| --- | --- | --- |
| A | happy | A$excited |
| B | sad | B$angry |
| C | angry | C$sad |
| D | excited | D$happy |
| E | NA | E$nervous |
| F | NA | F$blue |
| G | NA | G$lonely |
The expected new column should look like the column "new_col" below.
| var | text | var_field | new_col |
| --- | --- | --- | --- |
| A | happy | A$excited | D |
| B | sad | B$angry | C |
| C | angry | C$sad | B |
| D | excited | D$happy | A |
| E | NA | E$nervous | NA |
| F | NA | F$blue | NA |
| G | NA | G$lonely | NA |

7d7tgy0s1#

For the first (!) match, in base R:

df_ <- read.table(header = TRUE, text = "
var text    var_field
A   happy   A$excited
B   sad B$angry
C   angry   C$sad
D   excited D$happy
E   NA  E$nervous
F   NA  F$blue
G   NA  G$lonely")

suffix <- sapply(strsplit(df_$var_field, "$", fixed = TRUE), `[`, 2)
df_$new_col <- df_$var[match(df_$text, suffix)]
df_
#>   var    text var_field new_col
#> 1   A   happy A$excited       D
#> 2   B     sad   B$angry       C
#> 3   C   angry     C$sad       B
#> 4   D excited   D$happy       A
#> 5   E    <NA> E$nervous    <NA>
#> 6   F    <NA>    F$blue    <NA>
#> 7   G    <NA>  G$lonely    <NA>

Created on 2023-06-08 with reprex v2.0.2
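To make the "first (!) match" caveat concrete, here is a minimal sketch on hypothetical data (the `toy` data frame is not from the question) in which two rows share the same suffix. It assumes R 4.0 or later only in the sense that strings are kept as character; `stringsAsFactors = FALSE` is set explicitly to be safe.

```r
# Hypothetical data: rows B and C both carry the suffix "happy".
toy <- data.frame(
  var       = c("A", "B", "C"),
  text      = c("happy", NA, "sad"),
  var_field = c("A$sad", "B$happy", "C$happy"),
  stringsAsFactors = FALSE
)

# Same two steps as in the answer: take the part after the literal "$",
# then look up each text value among the suffixes.
suffix <- sapply(strsplit(toy$var_field, "$", fixed = TRUE), `[`, 2)
toy$new_col <- toy$var[match(toy$text, suffix)]

# match() keeps only the first hit, so "happy" resolves to B and C is ignored.
# An NA in text finds no partner among the suffixes and stays NA.
toy
```

If every matching row is needed rather than the first one, `match()` alone is not enough and a join or a grouped lookup would be the more natural tool.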

fivyi3re2#

Another approach, using various tidyverse functions.
Set up the test data:

library(tidyverse)

testdata <- tribble(
~var, ~text, ~var_field,
"A",    "happy",    "A$excited",
"B",    "sad",  "B$angry",
"C",    "angry",    "C$sad",
"D",    "excited",  "D$happy",
"E",    NA, "E$nervous",
"F",    NA, "F$blue",
"G",    NA, "G$lonely")

Create a lookup keyed by text that returns var:

lookup <- as_vector(testdata$var)
names(lookup) <- testdata$text

Then create the new column:

testdata %>% mutate(
  field_text = str_extract(var_field, "(?<=\\$)(.*)"), #drop the leading character and "$"
  new_col = case_when(
    is.na(text) ~ NA_character_,
    .default = lookup[field_text]
  ) # created the new_col as per spec
) %>% 
  select(-field_text) # drop the simplified var_field as no longer needed

This gives:

# A tibble: 7 × 4
  var   text    var_field new_col
  <chr> <chr>   <chr>     <chr>  
1 A     happy   A$excited D      
2 B     sad     B$angry   C      
3 C     angry   C$sad     B      
4 D     excited D$happy   A      
5 E     NA      E$nervous NA     
6 F     NA      F$blue    NA     
7 G     NA      G$lonely  NA

Edit: this assumes that the non-NA values in text are unique, which, according to the OP, is not the case.
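The uniqueness assumption matters because indexing a named vector by name returns only the first element with that name. A small sketch with a hypothetical lookup (the values are invented for illustration) shows the effect, along with one possible workaround that groups values by name:

```r
# Hypothetical lookup in which the name "happy" occurs twice.
lookup <- c(happy = "A", happy = "B", sad = "C")

# Character indexing silently returns only the first element named "happy".
first_only <- lookup[["happy"]]

# One possible workaround: group the values by name so every match is kept.
all_matches <- split(unname(lookup), names(lookup))
all_matches[["happy"]]
```

Whether keeping all matches or only the first is the right behavior depends on what the OP wants for duplicated text values, which the thread does not settle.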

w8biq8rn3#

Try this:

library(dplyr)

quux %>%
  mutate(text2 = sub(".*\\$", "", var_field)) %>%
  left_join(quux, by = c(text2 = "text"), suffix = c("", ".y"), multiple = "first") %>%
  mutate(new_col2 = var.y) %>%
  select(-ends_with(".y"), -text2)
#   var    text var_field new_col new_col2
# 1   A   happy A$excited       D        D
# 2   B     sad   B$angry       C        C
# 3   C   angry     C$sad       B        B
# 4   D excited   D$happy       A        A
# 5   E    <NA> E$nervous    <NA>     <NA>
# 6   F    <NA>    F$blue    <NA>     <NA>
# 7   G    <NA>  G$lonely    <NA>     <NA>
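The `sub(".*\\$", "", var_field)` step and the `strsplit()` approach from the first answer agree on the question's data, but they differ if a value ever contains more than one `$`. The sketch below uses a hypothetical third value, `"X$Y$z"`, to show the difference:

```r
x <- c("A$excited", "B$angry", "X$Y$z")  # the last element is a hypothetical edge case

# Regex approach: the greedy ".*" removes everything up to the LAST "$".
by_sub <- sub(".*\\$", "", x)

# Split approach: the second piece is whatever sits between the first and second "$".
by_split <- sapply(strsplit(x, "$", fixed = TRUE), `[`, 2)

by_sub
by_split
```

For values with exactly one `$`, as in the question, either approach works.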

Data

quux <- structure(list(
  var = c("A", "B", "C", "D", "E", "F", "G"),
  text = c("happy", "sad", "angry", "excited", NA, NA, NA),
  var_field = c("A$excited", "B$angry", "C$sad", "D$happy", "E$nervous", "F$blue", "G$lonely"),
  new_col = c("D", "C", "B", "A", NA, NA, NA)
), class = "data.frame", row.names = c(NA, -7L))

kqlmhetl4#

Here is another approach:

library(data.table)

# df is the data frame from the question (columns var, text, var_field)
as.data.table(tstrsplit(df$var_field, "$", fixed = TRUE))[
  df, on = .(V2 = text)][, .(var, text = V2, var_field, new_col = V1)]

Output:

      var    text var_field new_col
   <char>  <char>    <char>  <char>
1:      A   happy A$excited       D
2:      B     sad   B$angry       C
3:      C   angry     C$sad       B
4:      D excited   D$happy       A
5:      E    <NA> E$nervous    <NA>
6:      F    <NA>    F$blue    <NA>
7:      G    <NA>  G$lonely    <NA>
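For readers without data.table, the same join idea can be sketched in base R with `merge()`. This is a swapped-in technique, not the answer's code: the data frame is rebuilt from the question, and the helper names `parts`, `keys`, and `out` are made up for illustration.

```r
# The question's data, rebuilt here so the sketch is self-contained.
df <- data.frame(
  var       = c("A", "B", "C", "D", "E", "F", "G"),
  text      = c("happy", "sad", "angry", "excited", NA, NA, NA),
  var_field = c("A$excited", "B$angry", "C$sad", "D$happy",
                "E$nervous", "F$blue", "G$lonely"),
  stringsAsFactors = FALSE
)

# Split var_field at the literal "$": column 1 is the prefix, column 2 the suffix.
parts <- do.call(rbind, strsplit(df$var_field, "$", fixed = TRUE))
keys  <- data.frame(new_col = parts[, 1], text = parts[, 2],
                    stringsAsFactors = FALSE)

# Left join on text; rows whose text is NA find no partner and keep NA.
out <- merge(df, keys, by = "text", all.x = TRUE)

# merge() reorders rows and columns, so restore the original layout.
out <- out[order(out$var), c("var", "text", "var_field", "new_col")]
rownames(out) <- NULL
out
```

Unlike the `match()` approach, a join returns one row per match, so duplicated suffixes would produce duplicated rows rather than silently keeping the first hit.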
