R: Using case_when to mutate the smallest values of a data frame

kcugc4gi posted on 2023-11-14 in Other

I have this data frame:

df <- data.frame(
  country = c("CHN","CHN","CHN","LAO","LAO","LAO","GBR", "GBR", "GBR", "DEU", "DEU", "DEU"), 
  category = c(1,2,3,1,2,3,1,2,3,1,2,3), 
  value = c(10, 10, 10, 0.9, 0.9, 0.9, 15, 15, 15, 1, 1, 1), 
  continent = c("Asia","Asia","Asia","Asia","Asia","Asia","Europe","Europe","Europe","Europe","Europe","Europe"),
  stringsAsFactors = FALSE)

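I am trying to mutate the country column. I want to keep the names of countries with high values, but if a country accounts for less than 10% of the sum of values in each category, I want to group it under its continent. The output should look like this: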

                    country category value continent
1                       CHN        1  10.0      Asia
2                       CHN        2  15.0      Asia
3                       CHN        3  13.0      Asia
4     Other Asian countries        1   0.9      Asia
5     Other Asian countries        2   1.0      Asia
6     Other Asian countries        3   0.8      Asia
7                       GBR        1  15.0    Europe
8                       GBR        2  17.0    Europe
9                       GBR        3  18.0    Europe
10 Other European countries        1   1.0    Europe
11 Other European countries        2   2.0    Europe
12 Other European countries        3   3.0    Europe


But my code replaces every country name with "Other Asian countries" or "Other European countries". I'm not sure where the problem is.

df_trial <- df %>%
  group_by(category)%>%
  mutate(
    country = case_when(
      continent == 'Asia' | value<sum(value, na.rm = TRUE)*0.01 ~ 'Other Asian countries',
      continent == 'Europe'| value<sum(value, na.rm = TRUE)*0.01 ~ 'Other European countries'))


Thanks a lot!

56lgkhnf


In my view, the simplest approach is to tweak the "Other" label slightly and then group by continent:

library(dplyr)  # .by and .default require dplyr >= 1.1.0
df |>
  mutate(country = case_when(value < sum(value, na.rm = TRUE) * 0.1 ~ paste0("Other country (", continent, ")"),
                             .default = country),
         .by = continent)

This gives:

                  country category value continent
1                     CHN        1  10.0      Asia
2                     CHN        2  10.0      Asia
3                     CHN        3  10.0      Asia
4    Other country (Asia)        1   0.9      Asia
5    Other country (Asia)        2   0.9      Asia
6    Other country (Asia)        3   0.9      Asia
7                     GBR        1  15.0    Europe
8                     GBR        2  15.0    Europe
9                     GBR        3  15.0    Europe
10 Other country (Europe)        1   1.0    Europe
11 Other country (Europe)        2   1.0    Europe
12 Other country (Europe)        3   1.0    Europe
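
If several small countries end up under the same label, you may also want to collapse them into one row per category. The following is only a sketch under assumptions: `df2` is made-up data, and `VNM` is a hypothetical extra country that does not appear in the question. It uses `if_else()` with `group_by()` so that it also runs on dplyr versions older than 1.1.0.

```r
library(dplyr)

# Hypothetical data: VNM is added purely for illustration
df2 <- data.frame(
  country   = c("CHN", "LAO", "VNM", "GBR", "DEU"),
  category  = c(1, 1, 1, 1, 1),
  value     = c(10, 0.9, 0.5, 15, 1),
  continent = c("Asia", "Asia", "Asia", "Europe", "Europe"),
  stringsAsFactors = FALSE)

df2_collapsed <- df2 %>%
  group_by(continent) %>%
  # relabel rows below 10% of their continent's total
  mutate(country = if_else(value < sum(value, na.rm = TRUE) * 0.1,
                           paste0("Other country (", continent, ")"),
                           country)) %>%
  # merge rows that now share a label, summing their values
  group_by(continent, country, category) %>%
  summarise(value = sum(value), .groups = "drop")
```

Here LAO and VNM both fall below 10% of the Asian total, so they are merged into a single "Other country (Asia)" row whose value is their sum.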


I used case_when() in case you prefer to define more specific labels, as in your original setup.
Note that 10% is 0.1, not 0.01.
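
For comparison, the code in the question relabels every row because `|` makes the first condition true for every Asian row (and the second for every European row), whatever the value. A minimal sketch of how that attempt could be repaired while keeping `group_by(category)`: use `&` instead of `|`, use 0.1 as the threshold, and add a fallback that keeps the original name. `df_fixed` is just an illustrative name, and the thresholds here are computed per category rather than per continent, which is an assumption about what the question intended.

```r
library(dplyr)

df <- data.frame(
  country   = c("CHN","CHN","CHN","LAO","LAO","LAO","GBR","GBR","GBR","DEU","DEU","DEU"),
  category  = c(1,2,3,1,2,3,1,2,3,1,2,3),
  value     = c(10, 10, 10, 0.9, 0.9, 0.9, 15, 15, 15, 1, 1, 1),
  continent = c(rep("Asia", 6), rep("Europe", 6)),
  stringsAsFactors = FALSE)

df_fixed <- df %>%
  group_by(category) %>%
  mutate(
    country = case_when(
      # both conditions must hold, hence & rather than |
      continent == "Asia"   & value < sum(value, na.rm = TRUE) * 0.1 ~ "Other Asian countries",
      continent == "Europe" & value < sum(value, na.rm = TRUE) * 0.1 ~ "Other European countries",
      # fallback: keep the original country name
      TRUE ~ country)) %>%
  ungroup()
```

With the sample data, CHN and GBR keep their names, while LAO and DEU are relabeled.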
