R: How do I give each person in one data frame the same ID number they have in a different data frame?

qvk1mo1f  posted on 2023-02-14 in Other

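I have two datasets, df and df2, which are oversimplified versions of two very large and messy data frames.
In the original df, I created a unique id for each person by grouping by belt and weight. I want each person to get the same id number in df2 that they have in df. The names must be identical, and the match should be made within belt and weight. Note that some people in df2 are not in df.
The simplified df looks like this: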

     belt     weight rank id        name
1  purple open class    1 55  Tom Cruise
2   black    rooster    2 79 Emma Watson
3    blue    feather    3 63    John Doe
4    blue    feather    4 63    John Doe
5  purple open class    5 55  Tom Cruise
6   brown      heavy    6  3  James Bond
7  purple open class    7 55  Tom Cruise
8  purple      heavy    8 61  Tom Cruise
9   black open class    9 70    Jane Doe
10 purple      heavy   10 61  Tom Cruise

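The second data frame looks like this. A person who is in df2 but not in df should receive NA as their id. Note that ids must be assigned by belt and weight, because some people have different points depending on the weight division they competed in.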

    belt2    weight2 rank2        name points
1  purple open class     1  Tom Cruise    100
2   black    rooster     2 Emma Watson     30
3    blue    feather     3    John Doe     50
4    blue    feather     4    John Doe     50
5  purple open class     5  Tom Cruise    100
6   brown      heavy     6  James Bond    200
7   black    rooster     7    Jon Snow     92
8  purple      heavy     8  Tom Cruise     77
9   black open class     9    Jane Doe     88
10 purple      heavy    10  Tom Cruise     77

This is what I would like df2 to look like:

    belt2    weight2 rank2 id           name points
1  purple open class     1 55     Tom Cruise    100
2   black    rooster     2 79    Emma Watson     30
3    blue    feather     3 63       John Doe     50
4    blue    feather     4 63       John Doe     50
5  purple open class     5 55     Tom Cruise    100
6   brown      heavy     6  3     James Bond    200
7   black    rooster     7 NA       Jon Snow     92
8  purple      heavy     8 61     Tom Cruise     77
9   black open class     9 70       Jane Doe     88
10 purple      heavy    10 61     Tom Cruise     77

Basically, I want the ID numbers in df2 to match those in df. Where there is no match, fill in NA.

# df
belt <- c("purple", "black", "blue", "blue", "purple", "brown", "purple", "purple", "black", "purple")
weight <- c("open class", "rooster", "feather", "feather", "open class", "heavy", "open class", "heavy", "open class", "heavy")
rank <- 1:10
id <- c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61)
names <- c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
(df <- data.frame(belt, weight, rank, id, name = names))

#df2
belt2 <- c("purple", "black", "blue", "blue", "purple", "brown", "black", "purple", "black", "purple")
weight2 <- c("open class", "rooster", "feather", "feather", "open class", "heavy", "rooster", "heavy", "open class", "heavy")
rank2 <- 1:10
names2 <- c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise")
points <- c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
(df2 <- data.frame(belt2, weight2, rank2, name = names2, points))

roejwanj1#

This can be solved with a right join followed by removing duplicates. I will use the base function merge.

df3 <- merge(
  df, df2, 
  by.x = c("belt", "weight", "rank", "name"), 
  by.y = c("belt2", "weight2", "rank2", "name"),
  all.y = TRUE
)
df3 <- df3[!duplicated(df3),]
df3[order(df3$rank),]
#>      belt     weight rank        name id points
#> 9  purple open class    1  Tom Cruise 55    100
#> 2   black    rooster    2 Emma Watson 79     30
#> 4    blue    feather    3    John Doe 63     50
#> 5    blue    feather    4    John Doe 63     50
#> 10 purple open class    5  Tom Cruise 55    100
#> 6   brown      heavy    6  James Bond  3    200
#> 3   black    rooster    7    Jon Snow NA     92
#> 7  purple      heavy    8  Tom Cruise 61     77
#> 1   black open class    9    Jane Doe 70     88
#> 8  purple      heavy   10  Tom Cruise 61     77

Created on 2023-02-08 with reprex v2.0.2
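
The merge above also joins on rank, which works here because rank and rank2 happen to line up row for row. As a hedged alternative, the sketch below builds a de-duplicated lookup keyed only on belt, weight and name, as the question describes, and merges that into df2. The object names lookup and res are illustrative and not part of the original answer.

```r
# Toy data copied from the question
df <- data.frame(
  belt   = c("purple", "black", "blue", "blue", "purple",
             "brown", "purple", "purple", "black", "purple"),
  weight = c("open class", "rooster", "feather", "feather", "open class",
             "heavy", "open class", "heavy", "open class", "heavy"),
  rank   = 1:10,
  id     = c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61),
  name   = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
             "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
)
df2 <- data.frame(
  belt2   = c("purple", "black", "blue", "blue", "purple",
              "brown", "black", "purple", "black", "purple"),
  weight2 = c("open class", "rooster", "feather", "feather", "open class",
              "heavy", "rooster", "heavy", "open class", "heavy"),
  rank2   = 1:10,
  name    = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
              "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise"),
  points  = c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
)

# One row per distinct (belt, weight, name, id) combination (hypothetical helper table)
lookup <- unique(df[, c("belt", "weight", "name", "id")])

# Keep every row of df2 (all.x = TRUE); unmatched people get id = NA
res <- merge(
  df2, lookup,
  by.x = c("belt2", "weight2", "name"),
  by.y = c("belt", "weight", "name"),
  all.x = TRUE
)

# merge() sorts by the key columns, so restore the original df2 order
res <- res[order(res$rank2), ]
res
```

Because the lookup has only one row per key, the result keeps the same number of rows as df2 and no duplicate removal is needed afterwards.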
The dplyr right join is:

suppressPackageStartupMessages({
  library(dplyr)
})

df %>%
  right_join(df2, 
             by = c("belt" = "belt2", "weight" = "weight2", "rank" = "rank2"),
             suffix = c(".x", "")) %>%
  select(-name.x) %>%
  arrange(rank)
#>      belt     weight rank id        name points
#> 1  purple open class    1 55  Tom Cruise    100
#> 2   black    rooster    2 79 Emma Watson     30
#> 3    blue    feather    3 63    John Doe     50
#> 4    blue    feather    4 63    John Doe     50
#> 5  purple open class    5 55  Tom Cruise    100
#> 6   brown      heavy    6  3  James Bond    200
#> 7   black    rooster    7 NA    Jon Snow     92
#> 8  purple      heavy    8 61  Tom Cruise     77
#> 9   black open class    9 70    Jane Doe     88
#> 10 purple      heavy   10 61  Tom Cruise     77

Created on 2023-02-08 with reprex v2.0.2
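
If a join is not needed at all, the same lookup can be sketched with base match() on a composite key. This is only an illustration under one assumption: the separator "|" does not occur in any belt, weight or name value. The names key_df and key_df2 are hypothetical.

```r
# Toy data copied from the question
df <- data.frame(
  belt   = c("purple", "black", "blue", "blue", "purple",
             "brown", "purple", "purple", "black", "purple"),
  weight = c("open class", "rooster", "feather", "feather", "open class",
             "heavy", "open class", "heavy", "open class", "heavy"),
  rank   = 1:10,
  id     = c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61),
  name   = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
             "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
)
df2 <- data.frame(
  belt2   = c("purple", "black", "blue", "blue", "purple",
              "brown", "black", "purple", "black", "purple"),
  weight2 = c("open class", "rooster", "feather", "feather", "open class",
              "heavy", "rooster", "heavy", "open class", "heavy"),
  rank2   = 1:10,
  name    = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
              "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise"),
  points  = c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
)

# Composite keys; assumes "|" never appears inside the key columns
key_df  <- paste(df$belt,   df$weight,   df$name,  sep = "|")
key_df2 <- paste(df2$belt2, df2$weight2, df2$name, sep = "|")

# match() returns the position of the first matching key in df, or NA if none,
# so indexing df$id with it yields NA for people who are not in df
df2$id <- df$id[match(key_df2, key_df)]
df2
```

match() never adds or reorders rows, so df2 keeps its original order and row count. It silently takes the first id found for a key, so it is only appropriate when each (belt, weight, name) combination maps to a single id in df.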


0sgqnhkj2#

You can accomplish this with a left join between the two data frames.

library(dplyr)

# De-duplicate df first so each (belt, weight, name) maps to exactly one row;
# joining against the raw df would repeat df2 rows once per matching df row
df2 <- df2 %>%
  left_join(distinct(df, belt, weight, name, id),
            by = c("belt2" = "belt", "weight2" = "weight", "name" = "name")) %>%
  select(belt2, weight2, rank2, name, points, id)
df2

    belt2    weight2 rank2        name points id
1  purple open class     1  Tom Cruise    100 55
2   black    rooster     2 Emma Watson     30 79
3    blue    feather     3    John Doe     50 63
4    blue    feather     4    John Doe     50 63
5  purple open class     5  Tom Cruise    100 55
6   brown      heavy     6  James Bond    200  3
7   black    rooster     7    Jon Snow     92 NA
8  purple      heavy     8  Tom Cruise     77 61
9   black open class     9    Jane Doe     88 70
10 purple      heavy    10  Tom Cruise     77 61
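
A left join returns one output row for every matching pair of rows, so any key that appears several times in df multiplies the corresponding rows of df2. The sketch below, which assumes dplyr is installed, checks that each (belt, weight, name) key carries a single id and that the join leaves the row count of df2 unchanged. The names conflicts, lookup and out are illustrative only.

```r
library(dplyr)

# Toy data copied from the question
df <- data.frame(
  belt   = c("purple", "black", "blue", "blue", "purple",
             "brown", "purple", "purple", "black", "purple"),
  weight = c("open class", "rooster", "feather", "feather", "open class",
             "heavy", "open class", "heavy", "open class", "heavy"),
  rank   = 1:10,
  id     = c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61),
  name   = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
             "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
)
df2 <- data.frame(
  belt2   = c("purple", "black", "blue", "blue", "purple",
              "brown", "black", "purple", "black", "purple"),
  weight2 = c("open class", "rooster", "feather", "feather", "open class",
              "heavy", "rooster", "heavy", "open class", "heavy"),
  rank2   = 1:10,
  name    = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
              "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise"),
  points  = c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
)

# One row per distinct key/id combination
lookup <- distinct(df, belt, weight, name, id)

# Keys that map to more than one id would make the lookup ambiguous
conflicts <- lookup %>%
  count(belt, weight, name) %>%
  filter(n > 1)
stopifnot(nrow(conflicts) == 0)

out <- df2 %>%
  left_join(lookup,
            by = c("belt2" = "belt", "weight2" = "weight", "name" = "name"))

# The join should neither add nor drop rows of df2
stopifnot(nrow(out) == nrow(df2))
out
```

On very large, messy data the conflicts check is the more useful part: a non-empty result points at people whose id was assigned inconsistently in df and who need cleaning before any join.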
