R: recode a variable for all observations that do not occur more than once

uxh89sit  posted on 2022-12-20  in Other

I have a simple data frame that looks like this:

Observation X1 X2 Group
1           2   4   1
2           6   3   2
3           8   4   2
4           1   3   3
5           2   8   4
6           7   5   5
7           2   4   5

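How can I recode the Group variable so that every observation whose group value does not recur is recoded as "Unaffiliated"?
The expected output is: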

Observation X1 X2 Group
1           2   4   Unaffiliated
2           6   3   2
3           8   4   2
4           1   3   Unaffiliated
5           2   8   Unaffiliated
6           7   5   5
7           2   4   5

aiqt4smr  #1

We can use duplicated to build a logical vector that is TRUE for Group values occurring only once, and assign "Unaffiliated" to Group in those rows:

df1$Group[with(df1, !(duplicated(Group)|duplicated(Group, 
     fromLast = TRUE)))] <- "Unaffiliated"
Output:
> df1
  Observation X1 X2        Group
1           1  2  4 Unaffiliated
2           2  6  3            2
3           3  8  4            2
4           4  1  3 Unaffiliated
5           5  2  8 Unaffiliated
6           6  7  5            5
7           7  2  4            5

Data

df1 <- structure(list(Observation = 1:7, X1 = c(2L, 6L, 8L, 1L, 2L, 
7L, 2L), X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L), Group = c(1L, 2L, 
2L, 3L, 4L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
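
The logical index in this answer can be unpacked step by step. The following is a minimal sketch on the Group column alone; the names dup_forward, dup_backward, singleton and recoded are illustrative and do not appear in the answer, and the explicit as.character() call is an added step that makes the integer-to-character conversion visible rather than implicit.

```r
# Group values from the question's data frame
Group <- c(1L, 2L, 2L, 3L, 4L, 5L, 5L)

# TRUE for every repeat of a value already seen earlier in the vector
dup_forward <- duplicated(Group)

# TRUE for every value that appears again later in the vector
dup_backward <- duplicated(Group, fromLast = TRUE)

# A value flagged by neither scan occurs exactly once
singleton <- !(dup_forward | dup_backward)

# Convert to character explicitly, then label the singletons
recoded <- as.character(Group)
recoded[singleton] <- "Unaffiliated"
print(recoded)
```

Scanning in both directions is needed because duplicated alone never flags the first occurrence of a repeated value.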

aydmsdu9  #2

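unaffil takes a vector of Group numbers and returns "Unaffiliated" if the vector has exactly one element; otherwise it returns its input unchanged. We can then use ave to apply it by Group. This does not overwrite the input data frame. No packages are used, but if you use dplyr, transform can be replaced with mutate.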

unaffil <- function(x) if (length(x) == 1) "Unaffiliated" else x
transform(dat, Group = ave(Group, Group, FUN = unaffil))

giving

  Observation X1 X2        Group
1           1  2  4 Unaffiliated
2           2  6  3            2
3           3  8  4            2
4           4  1  3 Unaffiliated
5           5  2  8 Unaffiliated
6           6  7  5            5
7           7  2  4            5

Note

The input data used above:

dat <- structure(list(Observation = 1:7, X1 = c(2L, 6L, 8L, 1L, 2L, 
7L, 2L), X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L), Group = c(1L, 2L, 
2L, 3L, 4L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
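
As a variation on the same idea, ave can also compute the size of each group first, and the recoding can then be done with a vectorised ifelse. This is only a sketch, not the answer's own method; the name group_size is an assumption introduced here for illustration.

```r
# Same data as in the answer above
dat <- data.frame(
  Observation = 1:7,
  X1 = c(2L, 6L, 8L, 1L, 2L, 7L, 2L),
  X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L),
  Group = c(1L, 2L, 2L, 3L, 4L, 5L, 5L)
)

# For each row, the number of rows sharing its Group value
group_size <- ave(seq_along(dat$Group), dat$Group, FUN = length)

# Recode groups of size one; keep the others as character labels
result <- transform(
  dat,
  Group = ifelse(group_size == 1L, "Unaffiliated", as.character(Group))
)
print(result)
```

Keeping the group sizes in a separate vector makes it easy to change the threshold later, for example to treat groups with fewer than three rows as unaffiliated.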

b09cbbtk  #3

One approach is to group by Group first, then check the maximum row number within each group, and finally recode with ifelse:

library(dplyr)

# df is the question's data frame (called df1 and dat in the other answers)
df %>% 
  group_by(Group) %>% 
  mutate(Group = ifelse(max(row_number()) == 1, "Unaffiliated", as.character(Group))) %>% 
  ungroup()

  Observation    X1    X2 Group
        <int> <int> <int> <chr>       
1           1     2     4 Unaffiliated
2           2     6     3 2           
3           3     8     4 2           
4           4     1     3 Unaffiliated
5           5     2     8 Unaffiliated
6           6     7     5 5           
7           7     2     4 5
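
A related dplyr sketch avoids grouping altogether by attaching the group size as a temporary column with add_count. This swaps in a different technique from the group_by/row_number approach above; the column name group_size is an assumption introduced here for illustration, and the code assumes dplyr is installed.

```r
library(dplyr)

# Data frame from the question
df1 <- data.frame(
  Observation = 1:7,
  X1 = c(2L, 6L, 8L, 1L, 2L, 7L, 2L),
  X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L),
  Group = c(1L, 2L, 2L, 3L, 4L, 5L, 5L)
)

result <- df1 %>%
  # Add a temporary column holding the number of rows per Group value
  add_count(Group, name = "group_size") %>%
  # Recode singleton groups; both branches are character, as if_else requires
  mutate(Group = if_else(group_size == 1L, "Unaffiliated", as.character(Group))) %>%
  # Drop the helper column again
  select(-group_size)

print(result)
```

Because the result is never grouped, no ungroup() call is needed at the end.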
