每年加权平均数- R

kfgdxczn  于 2023-06-27  发布在  其他
关注(0)|答案(2)|浏览(99)

我希望计算每个year(2006年至2023年)的interest_rate的平均值。我有一系列的利率决定日期的新利率和日期。
对于在一年中有多个interest_rates的年份,我想通过该特定速率的年份比例来加权平均值。
示例数据:

library(data.table)
df = data.table(date = c("03/08/2006", "09/11/2006", "11/01/2007", "10/05/2007", "05/07/2007", "06/12/2007", "07/02/2008", "10/04/2008", "08/10/2008", "06/11/2008", "04/12/2008", "08/01/2009", "05/02/2009", "05/03/2009", "04/08/2016", "02/11/2017", "02/08/2018", "11/03/2020", "19/03/2020", "16/12/2021", "03/02/2022", "17/03/2022", "05/05/2022", "16/06/2022", "04/08/2022", "22/09/2022", "03/11/2022", "15/12/2022", "02/02/2023", "23/03/2023", "11/05/2023"),
                interest_rate = c(4.7500, 5.0000, 5.2500, 5.5000, 5.7500, 5.5000, 5.2500, 5.0000, 4.5000, 3.0000, 2.0000, 1.5000, 1.0000, 0.5000, 0.2500, 0.5000, 0.7500, 0.2500, 0.1000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.7500, 2.2500, 3.0000, 3.5000, 4.0000, 4.2500, 4.500))

结果数据如下所示:
| 年|加权利率|
| - -----|- -----|
| 二零零六年|4.7|
| 二零零七年|……|
| 二零零八年|等等|

6l7fqoea

6l7fqoea1#

不雅的解决方案:

# add to the dataframe a list of dates for the start of each year
df= rbind(df, data.table(date = as.Date(ISOdate(seq(2005,2023),1,1)), "interest_rate" = NA), fill=T)

# set order
setorder(df, -date)

# set value for the new dates to the current interest rate
df[is.na(interest_rate), interest_rate := df[df[is.na(interest_rate)], on = .(date), interest_rate, roll=Inf]]

# identify the previous shift date was
df[, change_date := shift(date, 1L)]

# get next rate change or end of year
df[, next_change_date_or_year := min(as.Date(ISOdate(year,12,31)), change_date, na.rm=T), date]

# get relative duration within year
df[, relative_duration := as.numeric((next_change_date_or_year-date)/365)]

df[, year := year(date)]

# weighted average per year
yearly_IR = df[!is.na(year), .(weight_interest_rate = weighted.mean(`interest rate`, relative_duration)), year]
gzszwxb4

gzszwxb42#

一种方法是为整个系列中的每一天创建一个利率数据框架,按年份分组,然后对一年中每个日期的利率求平均值。
重要的第一步是将字符串形式的日期转换为实际日期。

library(tidyverse)

df %>%
  mutate(date = as.Date(date, '%d/%m/%Y'),
         days = c(as.numeric(diff(date)), 0)) %>%
  rowwise() %>%
  reframe(date = seq(date, by = 'day', length.out = days),
          interest_rate = interest_rate,
          year = lubridate::year(date)) %>%
  group_by(year) %>%
  summarize(interest_rate = mean(interest_rate))
#> # A tibble: 18 x 2
#>     year interest_rate
#>    <dbl>         <dbl>
#>  1  2006         4.84 
#>  2  2007         5.51 
#>  3  2008         4.67 
#>  4  2009         0.644
#>  5  2010         0.5  
#>  6  2011         0.5  
#>  7  2012         0.5  
#>  8  2013         0.5  
#>  9  2014         0.5  
#> 10  2015         0.5  
#> 11  2016         0.398
#> 12  2017         0.291
#> 13  2018         0.604
#> 14  2019         0.75 
#> 15  2020         0.228
#> 16  2021         0.107
#> 17  2022         1.47 
#> 18  2023         3.97

相关问题