R: return the most frequent string value for each group [duplicate]

Posted by cmssoen2 on 2023-11-14 in Other

This question already has answers here:

Select the row with the maximum value in each group (19 answers)
How to select the rows with maximum values in each group with dplyr? [duplicate] (6 answers)
Closed 4 years ago.

a <- c(rep(1:2,3))
b <- c("A","A","B","B","B","B")
df <- data.frame(a,b)

> str(b)
chr [1:6] "A" "A" "B" "B" "B" "B"

  a b
1 1 A
2 2 A
3 1 B
4 2 B
5 1 B
6 2 B

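I want to group by the variable a and return the most frequent value of b within each group.
The result I am after should look like this: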

  a b
1 1 B
2 2 B


In dplyr, it would look something like this (most.frequent() is a placeholder, not an existing dplyr function):

df %>% group_by(a) %>% summarize(b = most.frequent(b))


I mention dplyr only to illustrate the problem.

Answer #1 (lrl1mhuk)

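The key is to first group by both a and b to count frequencies, and then keep only the most frequent row within each group of a, for example: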

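library(dplyr)

df %>%
  count(a, b) %>%        # frequency of each (a, b) pair
  group_by(a) %>%        # current dplyr returns count() output ungrouped
  slice(which.max(n))    # most frequent b per a (first one wins on ties)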

Source: local data frame [2 x 3]
Groups: a

  a b n
1 1 B 2
2 2 B 2

Of course, there are other approaches, so this is just one possible "key".
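
To get closer to the summarize() form sketched in the question, one option is to write the helper yourself. The following is a minimal sketch, not the only way to do it: most_frequent() is a hypothetical name introduced here for illustration, and it assumes that on ties the value that comes first in table()'s sorted order is acceptable.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or base R): the most frequent
# value of a vector, returned as a character string.
# which.max() picks the first maximum, so ties go to the value that
# comes first in table()'s sorted order.
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Same data as in the question
df <- data.frame(a = rep(1:2, 3), b = c("A", "A", "B", "B", "B", "B"))

result <- df %>%
  group_by(a) %>%
  summarise(b = most_frequent(b))
```

Note that the helper always returns a character value, because it reads the result from the names of the table; for a numeric column you would need to convert it back.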

Answer #2 (dy1byipe)

The other answers ignore tied frequencies.
Here is what worked for me:

# A and B are tied
a <- c(rep(1:2,5))
b <- c("A","A","A","A","B","B","B","B","C","C")
df3 <- data.frame(a,b)

library(data.table)
setDT(df3)[ , .N, by=.(a, b)][ , .SD[ N == max(N) ], by = a] # includes ties

library(dplyr)
df3 |>
  group_by(a) |>
  count(b) |>
  slice_max(n, n = 1) # includes ties (with_ties = TRUE is the default); replaces the superseded top_n()

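
If neither package is available, the same tie-aware idea can be sketched in base R. This is only an illustration under stated assumptions: all_modes() is a hypothetical helper name, and the result is kept as a list because a group may have more than one most frequent value.

```r
# Hypothetical helper: every value tied for the highest count in x
all_modes <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

# Same tied data as above: A and B each occur twice within every group
a <- rep(1:2, 5)
b <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")

# split() groups b by a; lapply() keeps all tied values for each group
modes_by_group <- lapply(split(b, a), all_modes)
```

Each element of modes_by_group is a character vector, so groups with ties keep all of their tied values instead of silently dropping some.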

Answer #3 (7gcisfzg)

For each value of a, use by() to build a table() of b and extract the names() of the largest entry in that table():

> with(df,by(b,a,function(xx)names(which.max(table(xx)))))
a: 1
[1] "B"
------------------------
a: 2
[1] "B"

You can wrap this in as.table() to get prettier output, although it still does not exactly match the result you want:

> as.table(with(df,by(b,a,function(xx)names(which.max(table(xx))))))
a
1 2 
B B
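
If a data frame shaped like the desired result is needed, the same table()/which.max() function can probably be passed to aggregate() instead of by(). This is a sketch of that variation rather than part of the original answer; like which.max() above, it keeps only the first value when counts are tied.

```r
# Same data as in the question
df <- data.frame(a = rep(1:2, 3), b = c("A", "A", "B", "B", "B", "B"))

# aggregate() applies the function to b within each value of a
# and returns one row per group as a data frame
result <- aggregate(
  b ~ a,
  data = df,
  FUN = function(xx) names(which.max(table(xx)))
)

print(result)
```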
