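Apologies if my question doesn't make sense; I'm not sure how to phrase it.
I have two data frames. The first has a county identifier and the population of each county. The pseudo data I've provided is for a single point in time, but the real data has a different population value for each county in every month over several years.
The second data frame lists the neighboring counties of each county. Some counties have several neighbors and some have none. The goal is to find the mean population of each county's neighboring counties.
I'm having trouble working out how to wrangle the data correctly. I tried using group_by on time and county with summarise, but I got the same mean neighbor population for each county regardless of year/month.
The columns "neighbor_county_agg" and "mean_neighbor_county" are my desired output. I don't know whether I need a for loop here, since some counties have multiple neighbors, so the "mean population" cell may have to be built up as each new neighbor is matched. In addition, population changes by month and year.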
#pseudo data
dput(head(df1))
#note neighbor_county_agg and mean_neighbor_county are my desired output
structure(list(county_ID = c("A", "B", "C", "D", "E", "F"), population = c(100,
350, 200, 100, 50, 80), neighbor_county_agg = c("D, B", "A, F",
"NA", "A, F", "NA", "D, B, G"), mean_neighbor_county = c("100 + 350 / 2",
"100+80 / 2", "NA", "100 + 80 / 2", "NA", "100 + 350 + 50 /3"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
dput(head(df2))
structure(list(county_ID = c("A", "A", "B", "B", "C", "D"), neighbor_county = c("D",
"B", "A", "F", "NA", "A")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
#my attempt
df1 <- df1 %>% mutate(neighbor_population = df1$population[match(df1$neighbor_county_agg, df2$neighbor_county)]) %>%
#note: the pseudo data in my example does not have date
group_by(date, county_ID) %>%
summarise(mean_population=mean(neighbor_population),
.groups = 'drop')
1 answer
Another approach:
Sample data:
Summary:
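A minimal sketch of one way to do this with dplyr; it is not necessarily the code this answer originally showed. It assumes the population table has a `date` column (the question's pseudo data has none, so the second month's values here are made up), and the names `pop`, `nbr`, `neighbor_means` and `result` are placeholders. The idea is to look up each neighbor's population for every date, average per county and date, then join that back onto the population table.

```r
library(dplyr)

# Population per county and month. The `date` column and the second
# month's values are assumptions added for illustration.
pop <- tibble(
  county_ID  = rep(c("A", "B", "C", "D", "E", "F"), times = 2),
  date       = rep(as.Date(c("2020-01-01", "2020-02-01")), each = 6),
  population = c(100, 350, 200, 100, 50, 80,
                 110, 360, 210, 120, 60, 90)
)

# Adjacency list: one row per (county, neighbor) pair, as in df2.
nbr <- tibble(
  county_ID       = c("A", "A", "B", "B", "C", "D"),
  neighbor_county = c("D", "B", "A", "F", NA, "A")
)

# Drop counties with no neighbor, then attach each neighbor's population
# for every date. Base merge() is used because this match is deliberately
# many-to-many (several neighbors x several dates).
neighbor_means <- nbr %>%
  filter(!is.na(neighbor_county)) %>%
  merge(pop, by.x = "neighbor_county", by.y = "county_ID") %>%
  group_by(county_ID, date) %>%
  summarise(
    neighbor_county_agg  = paste(neighbor_county, collapse = ", "),
    mean_neighbor_county = mean(population),
    .groups = "drop"
  )

# Counties without neighbors keep NA in the two new columns.
result <- pop %>%
  left_join(neighbor_means, by = c("county_ID", "date"))

print(result)
```

Because the grouping is on both `county_ID` and `date`, the mean is recomputed for each month instead of being the same value for every period. No for loop is needed: a county with several neighbors simply contributes several rows to its group before `mean()` is taken.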