R: Find the average population of each county's neighboring counties

gmol1639 posted on 2023-02-14 in Other

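I apologize if my question doesn't make sense; I'm not sure how to phrase it.
I have two data frames. One has a county identifier and the population of each county. The pseudo data I've provided is for a fixed point in time, but the actual data has a different population value for each county in every month over several years.
The second data frame lists the neighboring counties of each county. Some counties have multiple neighbors, while others have none. The goal is to find the average population of each county's neighboring counties.
I'm having trouble conceptually with how to manipulate the data correctly. I tried using group_by on time and county with summarise, but I get the same average neighbor population for every county, regardless of year/month.
The columns "neighbor_county_agg" and "mean_neighbor_county" are my desired output. I don't know whether I need a for loop here, since some counties have multiple neighbors, so the "mean population" cell may have to be built up as each new neighbor is matched. In addition, population varies by month and year.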

#pseudo data

dput(head(df1))

#note neighbor_county_agg and mean_neighbor_county are my desired output

structure(list(county_ID = c("A", "B", "C", "D", "E", "F"), population = c(100, 
350, 200, 100, 50, 80), neighbor_county_agg = c("D, B", "A, F", 
"NA", "A, F", "NA", "D, B, G"), mean_neighbor_county = c("100 + 350 / 2", 
"100+80 / 2", "NA", "100 + 80 / 2", "NA", "100 + 350 + 50 /3"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

dput(head(df2))

structure(list(county_ID = c("A", "A", "B", "B", "C", "D"), neighbor_county = c("D", 
"B", "A", "F", "NA", "A")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

#my attempt
df1 <- df1 %>% mutate(neighbor_population = df1$population[df1$neighbor_county_agg, df2$neighbor_county)]) 

#note: the pseudo data in my example does not have date
group_by(date, county_ID) %>% 
  summarise(mean_population=mean(neighbor_population),
            .groups = 'drop')

2guxujil1#

An alternative approach:

library(dplyr)
library(tidyr) # unnest, only for data creation

Sample data:

df1 <- data.frame(county=LETTERS[1:7], popn = c(100,350,200,100,50,80,50))
df2 <- data.frame(county=LETTERS[1:7], neighbor = c("D,B","A,F",NA,"A,F",NA,"D,B,G","F")) %>%
  mutate(neighbor = strsplit(neighbor, ",")) %>%
  unnest(neighbor)
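
The df2 posted in the question is already in this long shape (one row per county/neighbor pair), but it uses different column names and the string "NA" rather than a real missing value. A minimal sketch of bringing it into the shape used here, assuming the question's column names; the object names `df2_question` and `df2_long` are made up for illustration:

```r
library(dplyr)

# df2 as posted in the question (first six rows only)
df2_question <- data.frame(
  county_ID       = c("A", "A", "B", "B", "C", "D"),
  neighbor_county = c("D", "B", "A", "F", "NA", "A")
)

df2_long <- df2_question %>%
  # match the column names used in this answer
  rename(county = county_ID, neighbor = neighbor_county) %>%
  # turn the literal string "NA" into a real missing value
  mutate(neighbor = na_if(neighbor, "NA"))
```

With a real NA in `neighbor`, a county without neighbors simply finds no match in the later join instead of looking for a county literally named "NA".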

Summarize:

df1 %>%
  left_join(df2, by = "county") %>%
  left_join(df1, by = c(neighbor = "county")) %>%
  group_by(county) %>%
  summarize(
    popn = first(popn.x),
    popn_neighbors_avg = mean(popn.y)
  )
# # A tibble: 7 × 3
#   county  popn popn_neighbors_avg
#   <chr>  <dbl>              <dbl>
# 1 A        100               225 
# 2 B        350                90 
# 3 C        200                NA 
# 4 D        100                90 
# 5 E         50                NA 
# 6 F         80               167.
# 7 G         50                80
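
The question mentions that the real data has one population value per county per month. The same two-join idea can probably be extended by also matching on the date, so that each neighbor's population is looked up for the same month. The following is only a sketch: the `date` column, the object names `pop`, `nbrs` and `neighbor_avg`, and all values are assumptions made up for illustration, and the `relationship` argument requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical monthly populations; the `date` column and all values are made up
pop <- data.frame(
  county = rep(c("A", "B", "D", "F"), times = 2),
  date   = rep(as.Date(c("2020-01-01", "2020-02-01")), each = 4),
  popn   = c(100, 350, 100, 80, 110, 360, 120, 90)
)

# One row per (county, neighbor) pair, same shape as df2 above
nbrs <- data.frame(
  county   = c("A", "A", "B", "B", "D", "D", "F", "F"),
  neighbor = c("D", "B", "A", "F", "A", "F", "D", "B")
)

neighbor_avg <- pop %>%
  # repeat each county-month row once per neighbor
  # (`relationship` needs dplyr >= 1.1.0; it silences the many-to-many warning)
  left_join(nbrs, by = "county", relationship = "many-to-many") %>%
  # look up the neighbor's population in the same month
  left_join(pop, by = c(neighbor = "county", date = "date")) %>%
  group_by(county, date) %>%
  summarize(
    popn = first(popn.x),
    popn_neighbors_avg = mean(popn.y),
    .groups = "drop"
  )
```

Grouping by both `county` and `date` is what keeps the averages from collapsing to one value per county. If some neighbors have no population row for a given month, `mean(popn.y, na.rm = TRUE)` would ignore them, at the cost of returning NaN instead of NA for counties with no neighbors at all.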
