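I have a large data frame in R with Names (repeated) in the first column, followed by several columns holding different character values. The data frame looks like this: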
Names <- c("Benjamin Blue", "Benjamin Blue", "Benjamin Blue", "Sarah Red", "Sarah Red", "Mark Black", "Mark Black", "Mark Black", "Leonie White")
c1 <- c("Health", "Health", "Infrastructure", "Traffic", "Security", "Security", "Security", "Social", "" )
c2 <- c("Social", "", "Traffic", "", "Traffic", "Energy", "Health", "Social", "Security")
c3 <- c("", "", "Infrastructure", "Energy", "Energy", "", "Health", "Securtiy", "Social")
df_test <- data.frame(Names, c1, c2, c3)
df_test
How can I count how many times each individual (Benjamin Blue, Sarah Red, Mark Black and Leonie White) named the topics Traffic, Social and Health across the columns c1, c2 and c3?
My result should look like this:
Names_result <- c("Benjamin Blue", "Sarah Red", "Mark Black", "Leonie White")
Traffic <- c(1, 2, 0, 0)
Social <- c(1, 0, 2, 1)
Health <- c(2, 0, 2, 0)
I tried the following code:
library(dplyr)
df_test %>%
  rowwise() %>%
  mutate(Traffic = sum(na.omit(c_across(c1:c3)) == "Traffic"),
         Traffic = ifelse(all(is.na(c_across(c1:c3))), NA, Traffic))
df_test %>%
  rowwise() %>%
  mutate(Social = sum(na.omit(c_across(c1:c3)) == "Social"),
         Social = ifelse(all(is.na(c_across(c1:c3))), NA, Social))
df_test %>%
  rowwise() %>%
  mutate(Health = sum(na.omit(c_across(c1:c3)) == "Health"),
         Health = ifelse(all(is.na(c_across(c1:c3))), NA, Health))
But this does not add up the values for each individual into a single row, and it does not produce one data frame.
4 Answers
mbskvtky #1
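A minimal sketch of the reshape, filter, reshape idea this answer describes. It assumes the dplyr and tidyr packages and the question's df_test; the names topics, column, topic and res_wide are illustrative and not taken from the original answer.

```r
library(dplyr)
library(tidyr)

# Data from the question, reproduced verbatim
df_test <- data.frame(
  Names = c("Benjamin Blue", "Benjamin Blue", "Benjamin Blue", "Sarah Red",
            "Sarah Red", "Mark Black", "Mark Black", "Mark Black", "Leonie White"),
  c1 = c("Health", "Health", "Infrastructure", "Traffic", "Security",
         "Security", "Security", "Social", ""),
  c2 = c("Social", "", "Traffic", "", "Traffic", "Energy", "Health",
         "Social", "Security"),
  c3 = c("", "", "Infrastructure", "Energy", "Energy", "", "Health",
         "Securtiy", "Social"),
  stringsAsFactors = FALSE
)

topics <- c("Traffic", "Social", "Health")  # topics of interest (assumed name)

res_wide <- df_test %>%
  # wide -> long: one row per (Names, column, topic)
  pivot_longer(c1:c3, names_to = "column", values_to = "topic") %>%
  # keep only the topics we want to count
  filter(topic %in% topics) %>%
  # long -> wide: count entries per person and topic, fill absent pairs with 0
  pivot_wider(
    id_cols = Names,
    names_from = topic,
    values_from = column,
    values_fn = length,
    values_fill = 0
  )

res_wide
```

A person who mentions none of the three topics would drop out at the filter step; that does not happen with this data because every name has at least one match.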
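We can first reshape the data from "wide" to "long" format, filter to keep only the relevant topics, and then use pivot_wider to reshape back to "wide" format. Passing values_fn = length counts the occurrences of each topic, and values_fill = 0 makes pivot_wider fill the missing combinations with 0.
r6l8ljro #2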
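One way a base R approach could look, as a sketch under stated assumptions rather than this answer's original code: stack the topic columns, turn them into a factor restricted to the wanted topics, and cross-tabulate with table(). The names cols, topics, long and res_base are illustrative.

```r
# Data from the question, reproduced verbatim
df_test <- data.frame(
  Names = c("Benjamin Blue", "Benjamin Blue", "Benjamin Blue", "Sarah Red",
            "Sarah Red", "Mark Black", "Mark Black", "Mark Black", "Leonie White"),
  c1 = c("Health", "Health", "Infrastructure", "Traffic", "Security",
         "Security", "Security", "Social", ""),
  c2 = c("Social", "", "Traffic", "", "Traffic", "Energy", "Health",
         "Social", "Security"),
  c3 = c("", "", "Infrastructure", "Energy", "Energy", "", "Health",
         "Securtiy", "Social"),
  stringsAsFactors = FALSE
)

cols   <- c("c1", "c2", "c3")
topics <- c("Traffic", "Social", "Health")

# Stack the topic columns column by column; repeat Names once per column
long <- data.frame(
  Names = rep(df_test$Names, times = length(cols)),
  topic = unlist(df_test[cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Values outside `topics` become NA and are ignored by table()
long$topic <- factor(long$topic, levels = topics)
# Keep the people in order of first appearance
long$Names <- factor(long$Names, levels = unique(df_test$Names))

# Cross-tabulate and convert the contingency table to a data frame
res_base <- as.data.frame.matrix(table(long$Names, long$topic))
res_base <- cbind(Names = rownames(res_base), res_base)
rownames(res_base) <- NULL

res_base
```

Because Names is a factor with all levels kept, a person with no matching topic would still appear here, with zeros in every column.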
A base R solution.
nbysray5 #3
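A sketch of how pivot_longer, count and pivot_wider might be combined for this task, assuming dplyr and tidyr. The filter step and the names topics, topic and res_count are additions for illustration and may differ from the answer's original code.

```r
library(dplyr)
library(tidyr)

# Data from the question, reproduced verbatim
df_test <- data.frame(
  Names = c("Benjamin Blue", "Benjamin Blue", "Benjamin Blue", "Sarah Red",
            "Sarah Red", "Mark Black", "Mark Black", "Mark Black", "Leonie White"),
  c1 = c("Health", "Health", "Infrastructure", "Traffic", "Security",
         "Security", "Security", "Social", ""),
  c2 = c("Social", "", "Traffic", "", "Traffic", "Energy", "Health",
         "Social", "Security"),
  c3 = c("", "", "Infrastructure", "Energy", "Energy", "", "Health",
         "Securtiy", "Social"),
  stringsAsFactors = FALSE
)

topics <- c("Traffic", "Social", "Health")

res_count <- df_test %>%
  # wide -> long: one row per cell of c1..c3
  pivot_longer(c1:c3, values_to = "topic") %>%
  # keep only the topics of interest (assumed step)
  filter(topic %in% topics) %>%
  # one row per (Names, topic) with the number of occurrences in column n
  count(Names, topic) %>%
  # long -> wide: one column per topic, 0 where a pair never occurred
  pivot_wider(names_from = topic, values_from = n, values_fill = 0)

res_count
```

Here the counting is done explicitly by count(), so pivot_wider only needs values_fill and no values_fn.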
Here is one approach using pivot_longer, count and pivot_wider. Another option is dplyover::over. Disclaimer: I am the maintainer, and the package is not on CRAN.
Data from the OP.
Created on 2023-04-26 with reprex v2.0.2
u4dcyp6a #4
A data.table solution.
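A minimal sketch of what a data.table approach could look like, assuming the data.table package and its melt() and dcast() functions. This is an illustration under those assumptions, not the answerer's original code; the names dt, long, topics and res_dt are hypothetical.

```r
library(data.table)

# Data from the question, reproduced verbatim
df_test <- data.frame(
  Names = c("Benjamin Blue", "Benjamin Blue", "Benjamin Blue", "Sarah Red",
            "Sarah Red", "Mark Black", "Mark Black", "Mark Black", "Leonie White"),
  c1 = c("Health", "Health", "Infrastructure", "Traffic", "Security",
         "Security", "Security", "Social", ""),
  c2 = c("Social", "", "Traffic", "", "Traffic", "Energy", "Health",
         "Social", "Security"),
  c3 = c("", "", "Infrastructure", "Energy", "Energy", "", "Health",
         "Securtiy", "Social"),
  stringsAsFactors = FALSE
)

topics <- c("Traffic", "Social", "Health")

dt <- as.data.table(df_test)

# wide -> long: one row per (Names, variable, topic)
long <- melt(dt, id.vars = "Names", measure.vars = c("c1", "c2", "c3"),
             value.name = "topic")

# Keep the wanted topics, then count rows per person and topic;
# with fun.aggregate = length, absent combinations come out as 0
res_dt <- dcast(long[topic %in% topics], Names ~ topic,
                fun.aggregate = length, value.var = "topic")

res_dt
```

dcast() sorts the rows by Names and the topic columns alphabetically, so the column order differs from the layout requested in the question; data.table's setcolorder() can rearrange it if needed.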