Assigning new values to the most frequent value in R

flmtquvp posted on 2022-12-20 in Other

I have a dataset similar to this:

df_out <- data.frame(
  "name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
  "Factor1"=rep(c("A","B","C"),times= c(2,1,5)),
  "col3"=rep(c("T","S"),times= c(2,6)),
  "col4"=rep(c("E","D"),times= c(6,2)))
df_out
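Because the rule depends only on how often each value occurs in each column, it can help to look at those counts first. A minimal base R sketch that rebuilds the df_out above and reports the most frequent value per column (the variable names counts and modes are just for illustration):

```r
# Rebuild the example data from the question
df_out <- data.frame(
  name    = c("1", "2", "3", "4", "5", "6", "7", "8"),
  Factor1 = rep(c("A", "B", "C"), times = c(2, 1, 5)),
  col3    = rep(c("T", "S"), times = c(2, 6)),
  col4    = rep(c("E", "D"), times = c(6, 2))
)

# Count how often each value occurs in every column except `name`
counts <- lapply(df_out[-1], table)
counts$Factor1  # A = 2, B = 1, C = 5

# Most frequent value per column; which.max() returns the first maximum,
# so a tie would go to the value that sorts first in the table
modes <- vapply(counts, function(tab) names(which.max(tab)), character(1))
modes  # Factor1 = "C", col3 = "S", col4 = "E"
```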

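I want to mutate all columns and assign them new values based on their counts: in every column, the most frequent value should become "consensus" and all other values "non-consensus", while NA stays unchanged. The desired result is: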

df_out2 <- data.frame(
  "name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
  "Factor1"=rep(c("non-consensus","consensus"),times= c(3,5)),
  "col3"=rep(c("non-consensus","consensus"),times= c(2,6)),
  "col4"=rep(c("consensus","non-consensus"),times= c(6,2)))
df_out2
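The rule described above can be packaged as a small reusable function. This is a minimal base R sketch: mark_consensus is a hypothetical name, and labelling every value tied for the top count as "consensus" is an assumption, since the question does not say what should happen on ties. An all-NA column would make max() warn, which the sketch does not guard against.

```r
# Hypothetical helper: most frequent non-NA value(s) -> "consensus",
# other non-NA values -> "non-consensus", NA stays NA
mark_consensus <- function(x) {
  counts <- table(x)                           # table() drops NA by default
  top <- names(counts)[counts == max(counts)]  # assumption: all tied values win
  out <- ifelse(x %in% top, "consensus", "non-consensus")
  out[is.na(x)] <- NA                          # keep missing values missing
  out
}

df_out <- data.frame(
  name    = c("1", "2", "3", "4", "5", "6", "7", "8"),
  Factor1 = rep(c("A", "B", "C"), times = c(2, 1, 5)),
  col3    = rep(c("T", "S"), times = c(2, 6)),
  col4    = rep(c("E", "D"), times = c(6, 2))
)

# Apply the helper to every column except `name`
df_out2 <- df_out
df_out2[-1] <- lapply(df_out[-1], mark_consensus)
df_out2

# NA handling on a made-up vector
mark_consensus(c("A", "A", NA, "B"))  # "consensus" "consensus" NA "non-consensus"
```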

Any help is appreciated.


Answer #1, by ddrv8njm

You could do:

library(tidyverse)

df_out %>%
  mutate(across(Factor1:col4,
    # the most frequent value is the first name after sorting the counts in decreasing order
    ~ ifelse(.x == names(rev(sort(table(.x))))[1], "consensus", "non-consensus")))
#>   name       Factor1          col3          col4
#> 1    1 non-consensus non-consensus     consensus
#> 2    2 non-consensus non-consensus     consensus
#> 3    3 non-consensus     consensus     consensus
#> 4    4     consensus     consensus     consensus
#> 5    5     consensus     consensus     consensus
#> 6    6     consensus     consensus     consensus
#> 7    7     consensus     consensus non-consensus
#> 8    8     consensus     consensus non-consensus

Created on 2022-12-11 with reprex v2.0.2
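One behaviour worth checking in this answer is ties. names(rev(sort(table(.x))))[1] keeps a single name, so when two values share the highest count only one of them becomes "consensus", and which one depends on how sort() orders the tie. Comparing against max(), as the base R and pivot answers do, marks every tied value instead. A small base R sketch on a made-up vector (not data from the question):

```r
x <- c("x", "x", "y", "y", "z")  # "x" and "y" are tied at 2

# This answer's rule: keep only the first name after sorting counts
top_one <- names(rev(sort(table(x))))[1]
one_winner <- ifelse(x == top_one, "consensus", "non-consensus")

# Alternative rule: every value whose count equals the maximum
counts <- table(x)
all_winners <- ifelse(x %in% names(counts)[counts == max(counts)],
                      "consensus", "non-consensus")

sum(one_winner == "consensus")   # 2: only one of the tied values wins
sum(all_winners == "consensus")  # 4: both "x" and "y" win
```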


Answer #2, by h9vpoimq

A base R approach:

df_out2 <- cbind(
  name = df_out$name,                # keep the name column unchanged
  df_out[-1] |>                      # don't manipulate the name column
    lapply(function(column) {        # apply this function to each column
      level_counts <- table(column)  # count observations per value
      ifelse(level_counts[column] == max(level_counts),
             "consensus", "non-consensus")
    }) |>
    as.data.frame()                  # convert the list of columns to a data frame
)
df_out2
  name       Factor1          col3          col4
1    1 non-consensus non-consensus     consensus
2    2 non-consensus non-consensus     consensus
3    3 non-consensus     consensus     consensus
4    4     consensus     consensus     consensus
5    5     consensus     consensus     consensus
6    6     consensus     consensus     consensus
7    7     consensus     consensus non-consensus
8    8     consensus     consensus non-consensus
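The key step in this answer is level_counts[column]: indexing a one-dimensional table by the column's own values looks up each element's count by name, which gives one count per row that can be compared with max(). A minimal sketch of just that lookup, using the Factor1 values from the question:

```r
column <- c("A", "A", "B", "C", "C", "C", "C", "C")  # Factor1 from the question
level_counts <- table(column)   # A = 2, B = 1, C = 5

# Look up each element's count by name (repeated names are allowed)
per_row <- level_counts[column]
as.vector(per_row)                        # 2 2 1 5 5 5 5 5

# Rows whose value has the highest count
as.vector(per_row == max(level_counts))   # FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
```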

Answer #3, by zlhcx6iw

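Here is a solution using pivoting. The key points are to put the grouping variable in the right place and to drop n afterwards to get the desired result. Note that add_count() is equivalent to group_by(...) followed by mutate(n = n()).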

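library(dplyr)
library(tidyr)

df_out %>%
  # one row per (name, original column, value)
  pivot_longer(-name,
               names_to = "Factors",
               values_to = "Values") %>%
  # n = how often each value occurs within its original column
  add_count(Factors, Values) %>%
  group_by(Factors) %>%
  # the most frequent value(s) of each column become "consensus"
  mutate(Values = ifelse(n == max(n), "consensus", "non-consensus")) %>%
  ungroup() %>%
  select(-n) %>%
  # back to one column per original column
  pivot_wider(names_from = Factors,
              values_from = Values)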
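For readers without the tidyverse, the same count-then-compare logic can be mirrored in base R. This is a rough translation, not the answerer's code: stack() stands in for pivot_longer(), ave(..., FUN = length) for add_count(), and ave(..., FUN = max) for the grouped mutate(). The result stays in long format here.

```r
df_out <- data.frame(
  name    = c("1", "2", "3", "4", "5", "6", "7", "8"),
  Factor1 = rep(c("A", "B", "C"), times = c(2, 1, 5)),
  col3    = rep(c("T", "S"), times = c(2, 6)),
  col4    = rep(c("E", "D"), times = c(6, 2))
)

# Long format: `values` holds the cell values, `ind` the column they came from
long <- stack(df_out[-1])

# Like add_count(Factors, Values): size of each (column, value) group
long$n <- ave(seq_along(long$values), long$ind, long$values, FUN = length)

# Like group_by(Factors) + mutate(): compare with the largest count in the column
long$label <- ifelse(long$n == ave(long$n, long$ind, FUN = max),
                     "consensus", "non-consensus")

long[long$ind == "Factor1", ]
```

Neither this sketch nor the pivot answer keeps NA as NA: missing values are treated like any other value, so an explicit is.na() step would be needed if the data contain them.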
