R: Filter unique combinations of two columns

zkure5ic  posted on 2022-12-20 in Other

Sample data:

structure(list(name_1 = c("Kevin", "Tom", "Laura", "Julie"), 
    name_2 = c("Tom", "Kevin", "Julie", "Laura"), value = c(10, 
    10, 20, 20)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-4L))

# A tibble: 4 × 3
  name_1 name_2 value
  <chr>  <chr>  <dbl>
1 Kevin  Tom       10
2 Tom    Kevin     10
3 Laura  Julie     20
4 Julie  Laura     20

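How can I filter for the unique combinations of the columns name_1 and name_2, treating (Kevin, Tom) and (Tom, Kevin) as the same pair, without simply subsetting every other row? A tidyverse/dplyr approach is preferred.
Expected output: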

# A tibble: 2 × 3
  name_1 name_2 value
  <chr>  <chr>  <dbl>
1 Kevin  Tom       10
2 Laura  Julie     20

58wvjzkj  #1

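We can sort the names row-wise (using pmin/pmax) and then call distinct on the sorted values. In this example I drop the sorted columns afterwards, but if you prefer you can give them better names and discard the originals instead.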

library(dplyr)
quux %>%
  mutate(a=pmin(name_1, name_2), b=pmax(name_1, name_2)) %>%
  distinct(a, b, .keep_all = TRUE) %>%
  select(-a, -b)
# # A tibble: 2 x 3
#   name_1 name_2 value
#   <chr>  <chr>  <dbl>
# 1 Kevin  Tom       10
# 2 Laura  Julie     20

Data

quux <- structure(list(name_1 = c("Kevin", "Tom", "Laura", "Julie"), name_2 = c("Tom", "Kevin", "Julie", "Laura"), value = c(10, 10, 20, 20)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
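To see why this works, it can help to look at the intermediate key columns before they are dropped. The following is a minimal sketch, not part of the original answer: it rebuilds the sample data with a plain tibble() call (an assumption made for brevity; the structure() call above should be equivalent) and keeps a and b visible in a hypothetical intermediate object named keyed.

```r
library(dplyr)

# Same sample data as above, rebuilt with tibble() for readability
quux <- tibble(
  name_1 = c("Kevin", "Tom", "Laura", "Julie"),
  name_2 = c("Tom", "Kevin", "Julie", "Laura"),
  value  = c(10, 10, 20, 20)
)

# pmin()/pmax() compare element-wise, so each row gets its two names
# in sorted order regardless of which column they started in
keyed <- quux %>%
  mutate(a = pmin(name_1, name_2), b = pmax(name_1, name_2))

# Rows 1-2 now share one (a, b) key and rows 3-4 share another
print(keyed)

# distinct() keeps the first row it encounters for each (a, b) key
result <- keyed %>%
  distinct(a, b, .keep_all = TRUE) %>%
  select(-a, -b)

print(result)
```

Because distinct() retains the first occurrence of each key, the surviving rows are the ones that appear earliest in the input.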

ocebsuys  #2

Are you looking for something like this?

df[which(duplicated(apply(df[-3], 1, function(i) toString(sort(i))))),]

# A tibble: 2 × 3
  name_1 name_2 value
  <chr>  <chr>  <dbl>
1 Tom    Kevin     10
2 Julie  Laura     20
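Note that this one-liner returns the second row of each pair (Tom/Kevin and Julie/Laura), whereas the expected output in the question keeps the first. A minimal base R sketch of the same idea that keeps the first occurrence instead is shown here; it assumes df holds the sample data, and the names key and first_rows are introduced only for illustration.

```r
# Sample data as a plain data.frame (assumed to match the question's tibble)
df <- data.frame(
  name_1 = c("Kevin", "Tom", "Laura", "Julie"),
  name_2 = c("Tom", "Kevin", "Julie", "Laura"),
  value  = c(10, 10, 20, 20)
)

# Build an order-independent key per row from the two name columns
key <- apply(df[c("name_1", "name_2")], 1, function(i) toString(sort(i)))

# duplicated() flags repeats of an earlier key; negating it keeps
# the first occurrence of each pair
first_rows <- df[!duplicated(key), ]

print(first_rows)
```

Selecting the name columns by name rather than with df[-3] is a small design choice so the key does not depend on the position of the value column.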
