Average annual growth rate with a moving window length in R

Posted by qgelzfjb on 2023-01-22 in Other

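Given data structured as shown below, I want to compute the average annual growth rate of markup over the next five years for each individual (individuals are identified by the gvkey column), where the first year of the window is lead(markup), and add that average to the data frame as a new column. However, some individuals have fewer than five years of observations, and every individual has fewer than five future observations in its last four years. In those cases the average annual growth rate should be computed over the number of future observations actually available (at most five).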
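One way to write the quantity described above, as a sketch only. The symbols m, g, and k are notation introduced here and do not appear in the post; the growth rate is assumed to be the simple year-over-year change used in the code attempt further down, and fiscal years are assumed to be consecutive within each gvkey.

```latex
g_{i,t} = \frac{m_{i,t+1} - m_{i,t}}{m_{i,t}}, \qquad
\bar{g}_{i,t} = \frac{1}{k_{i,t}} \sum_{s=0}^{k_{i,t}-1} g_{i,t+s}, \qquad
k_{i,t} = \min\bigl(5,\ \text{number of observations of } i \text{ after year } t\bigr)
```

Here m_{i,t} is the markup of gvkey i in fiscal year t. For the last observed year of an individual, k is 0 and the average is undefined.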

dput(example)
structure(list(gvkey = c(1001L, 1001L, 1001L, 1003L, 1003L, 1003L, 
1003L, 1003L, 1003L, 1003L, 1004L, 1004L, 1004L, 1004L, 1004L, 
1004L, 1004L, 1004L, 1004L, 1004L, 1004L, 1004L, 1004L, 1004L, 
1004L, 1004L, 1004L, 1004L, 1004L, 1004L), fyear = c(1983L, 1984L, 
1985L, 1983L, 1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1980L, 
1981L, 1982L, 1983L, 1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 
1999L), markup = c(3.02456418383518, 2.91714600416106, 2.97620103473762, 
0.628645648836935, 0.538264738598443, 0.74536402337831, 0.89905329776662, 
0.571759161863088, 0.510497237569061, 0.621391904401246, 0.320146680750145, 
0.277978758953348, 0.31442332968701, 0.319433516915814, 0.324865816687745, 
0.335264348013352, 0.328048313395744, 0.326632245360565, 0.340874293859881, 
0.320374201245953, 0.27456562124358, 0.276693369097675, 0.245072145096866, 
0.241026046834387, 0.242841330851661, 0.249635000371186, 0.257903948772679, 
0.262641379065405, 0.261534064206543, 0.22953354130982)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -30L), groups = structure(list(
    gvkey = c(1001L, 1003L, 1004L), .rows = structure(list(1:3, 
        4:10, 11:30), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))

This is what I have so far:

example %>%
  filter(fyear %in% 1980:2019) %>%
  group_by(gvkey) %>%
  mutate(markupchange = ((lead(markup) - markup) / markup +
    (lead(markup, n = 2) - lead(markup)) / lead(markup) +
    (lead(markup, n = 3) - lead(markup, n = 2)) / lead(markup, n = 2) +
    (lead(markup, n = 4) - lead(markup, n = 3)) / lead(markup, n = 3) +
    (lead(markup, n = 5) - lead(markup, n = 4)) / lead(markup, n = 4)) / 5)

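What I can't figure out is how to shorten the averaging window for cases with fewer than five future observations.
As output, I would like the same data frame returned with one additional column holding the average annual growth rate of markup. The value of the added column should be -0.00628231878798876 in row 1 and 0.020547945 in row 2. Thank you very much for any hints.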

wgmfuz8q #1

Here is a non-equi join approach using fuzzyjoin (until dplyr 1.1.0, which adds join_by, is released).

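library(dplyr)

# Pair each row with the rows of the same gvkey from fyear through fyear + 5.
ungroup(example) %>%
  mutate(rn = row_number(), fy5 = fyear + 5) %>%
  fuzzyjoin::fuzzy_left_join(
    example, by = c(gvkey="gvkey", fyear="fyear", fy5="fyear"), 
    match_fun = list(`==`, `<=`, `>=`)) %>%
  group_by(gvkey = gvkey.x, fyear = fyear.x, markup = markup.x, rn) %>%
  summarize(
    # Year-over-year growth inside the window; the trailing NA is dropped by na.rm,
    # so shorter windows are averaged over the growth rates that exist.
    avg5 = mean(c(diff(markup.y), NA) / markup.y, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  select(-rn)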
# # A tibble: 30 × 4
#    gvkey fyear markup      avg5
#    <int> <int>  <dbl>     <dbl>
#  1  1001  1983  3.02   -0.00764
#  2  1001  1984  2.92    0.0202 
#  3  1001  1985  2.98  NaN      
#  4  1003  1983  0.629  -0.00480
#  5  1003  1984  0.538   0.0674 
#  6  1003  1985  0.745  -0.0119 
#  7  1003  1986  0.899  -0.0847 
#  8  1003  1987  0.572   0.0550 
#  9  1003  1988  0.510   0.217  
# 10  1003  1989  0.621 NaN      
# # … with 20 more rows
# # ℹ Use `print(n = ...)` to see more rows
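
Since the answer mentions join_by, here is a minimal sketch of how the same non-equi join might look with dplyr's own left_join, assuming dplyr >= 1.1.0 is installed. The column names fyear_f and markup_f are hypothetical names introduced for this sketch, the data is only the first ten rows of the example, and rows are assumed to be sorted by fyear within each gvkey.

```r
library(dplyr)  # join_by() assumes dplyr >= 1.1.0

# First ten rows of the example data (gvkey 1001 and 1003), copied from the dput above.
ex <- tibble(
  gvkey  = c(rep(1001L, 3), rep(1003L, 7)),
  fyear  = c(1983:1985, 1983:1989),
  markup = c(3.02456418383518, 2.91714600416106, 2.97620103473762,
             0.628645648836935, 0.538264738598443, 0.74536402337831,
             0.89905329776662, 0.571759161863088, 0.510497237569061,
             0.621391904401246)
)

# Copy of the data with renamed columns, so the joined columns need no suffixes.
# fyear_f and markup_f are hypothetical names chosen for this sketch.
future <- ex %>% rename(fyear_f = fyear, markup_f = markup)

result <- ex %>%
  mutate(fy5 = fyear + 5L) %>%
  # Keep rows of the same gvkey whose year lies in [fyear, fyear + 5].
  left_join(future, by = join_by(gvkey, fyear <= fyear_f, fy5 >= fyear_f)) %>%
  group_by(gvkey, fyear, markup) %>%
  summarize(
    # Same growth-rate average as in the fuzzyjoin version above.
    avg5 = mean(c(diff(markup_f), NA) / markup_f, na.rm = TRUE),
    .groups = "drop"
  )

result
```

For these ten rows the avg5 column should agree with the fuzzyjoin output shown above, including NaN for the last year of each gvkey, where no future observation exists.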
