I have this data frame:
df <- data.frame(
  country = c("CHN", "CHN", "CHN", "LAO", "LAO", "LAO", "GBR", "GBR", "GBR", "DEU", "DEU", "DEU"),
  category = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  value = c(10, 10, 10, 0.9, 0.9, 0.9, 15, 15, 15, 1, 1, 1),
  continent = c("Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe"),
  stringsAsFactors = FALSE)
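I am trying to modify the country column. I want to keep the names of countries with high values, but if a country accounts for less than 10% of the summed value within its category, I want to group it under its continent. The output should look like this: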
                    country category value continent
1                       CHN        1  10.0      Asia
2                       CHN        2  15.0      Asia
3                       CHN        3  13.0      Asia
4     Other Asian countries        1   0.9      Asia
5     Other Asian countries        2   1.0      Asia
6     Other Asian countries        3   0.8      Asia
7                       GBR        1  15.0    Europe
8                       GBR        2  17.0    Europe
9                       GBR        3  18.0    Europe
10 Other European countries        1   1.0    Europe
11 Other European countries        2   2.0    Europe
12 Other European countries        3   3.0    Europe
But my code replaces every country name with "Other Asian countries" or "Other European countries". I am not sure where the problem is.
library(dplyr)

df_trial <- df %>%
  group_by(category) %>%
  mutate(
    country = case_when(
      continent == 'Asia' | value < sum(value, na.rm = TRUE) * 0.01 ~ 'Other Asian countries',
      continent == 'Europe' | value < sum(value, na.rm = TRUE) * 0.01 ~ 'Other European countries'))
Many thanks!
1 Answer
In my opinion, the simplest approach is to relabel the small countries as an "Other" category first, and then group by continent:
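A minimal sketch of that idea, assuming dplyr 1.0 or later (for the `.groups` argument of `summarise()`). The names `df_grouped` and `small` are illustrative and not taken from the original post, and this is one possible reading of the approach rather than the exact code of the answer:

```r
library(dplyr)

df <- data.frame(
  country = c("CHN", "CHN", "CHN", "LAO", "LAO", "LAO", "GBR", "GBR", "GBR", "DEU", "DEU", "DEU"),
  category = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  value = c(10, 10, 10, 0.9, 0.9, 0.9, 15, 15, 15, 1, 1, 1),
  continent = rep(c("Asia", "Europe"), each = 6),
  stringsAsFactors = FALSE)

df_grouped <- df %>%
  group_by(category) %>%
  mutate(
    # TRUE when a row holds less than 10% of its category total (assumed rule)
    small = value < sum(value, na.rm = TRUE) * 0.1,
    country = case_when(
      small & continent == "Asia"   ~ "Other Asian countries",
      small & continent == "Europe" ~ "Other European countries",
      TRUE                          ~ country  # keep large countries unchanged
    )
  ) %>%
  # collapse all relabelled rows that now share a name within a category
  group_by(country, category, continent) %>%
  summarise(value = sum(value, na.rm = TRUE), .groups = "drop")
```

The final `summarise()` only matters when several small countries of the same continent fall into one category; with the sample data each "Other" group contains a single country, so the values are carried over as they are.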
Which gives:
I used case_when() in case you would like to define more specific responses along the lines of your original setup. Note that 10% is 0.1, not 0.01.
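To see why the code in the question relabels every row, it may help to evaluate the conditions on the four rows of a single category. The sketch below uses base R only; the vector names are illustrative. It suggests two separate issues: the `|` operator makes the continent test sufficient on its own, and the 0.01 factor tests for 1% rather than 10%.

```r
# Rows of category 1 from the question's data frame
continent <- c("Asia", "Asia", "Europe", "Europe")
value <- c(10, 0.9, 15, 1)

threshold_10pct <- sum(value) * 0.1   # 10% of the category total
threshold_1pct  <- sum(value) * 0.01  # what the question's code actually tests

# With `|`, every Asian row matches regardless of its value
either <- continent == "Asia" | value < threshold_10pct

# With `&`, only small Asian rows match
both <- continent == "Asia" & value < threshold_10pct

# At 1%, no row in this category counts as small
any_below_1pct <- any(value < threshold_1pct)
```

Here `either` is TRUE for both Asian rows, while `both` is TRUE only for the row with value 0.9, which is the behaviour the question asks for.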