R: add one-year lags of multiple columns

1zmg4dgp posted on 2023-02-10 in Other

I need to add 1-year lagged versions of several columns in a data frame. Here is my data:

data<-data.frame(Year=c("2011","2011","2011","2012","2012","2012","2013","2013","2013"), 
                 Country=c("America","China","India","America","China","India","America","China","India"),
                 Value1=c(234,443,754,334,117,112,987,903,476),
                 Value2=c(2,4,5,6,7,8,1,2,2))

I want to add two columns holding Value1 and Value2 at t-1, i.e. each country's values from the previous year.

How can I do this? And is this the correct way to lag a variable by one year?
Thanks in advance!
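On the second question: the answers below all lag by one row within each country, which equals a one-year lag only when each country's rows are sorted by Year with no missing years. One way to make "one year earlier" explicit is to join each row to the same country's row from Year - 1. The sketch below uses only base R; add_year_lags is a hypothetical helper, and it assumes Year holds whole-number years (stored as character in the question). A missing year then yields NA instead of silently picking up an older year.

```r
data <- data.frame(Year = c("2011", "2011", "2011", "2012", "2012", "2012", "2013", "2013", "2013"),
                   Country = c("America", "China", "India", "America", "China", "India",
                               "America", "China", "India"),
                   Value1 = c(234, 443, 754, 334, 117, 112, 987, 903, 476),
                   Value2 = c(2, 4, 5, 6, 7, 8, 1, 2, 2))

# Hypothetical helper: year-keyed lag via a self-join on (Country, Year)
add_year_lags <- function(df, cols) {
  # Year arrives as character in the question, so make it an integer first
  df$Year <- as.integer(as.character(df$Year))
  # Copy the value columns and move each row forward by one year,
  # so that the 2011 values line up with the 2012 rows
  lagged <- df[, c("Country", "Year", cols)]
  lagged$Year <- lagged$Year + 1L
  names(lagged)[match(cols, names(lagged))] <- paste0(cols, "_lag")
  # Left join: rows whose previous year is absent get NA
  out <- merge(df, lagged, by = c("Country", "Year"), all.x = TRUE)
  # Restore a Year-then-Country row order
  out <- out[order(out$Year, out$Country), ]
  rownames(out) <- NULL
  out
}

result <- add_year_lags(data, c("Value1", "Value2"))
result

# With India's 2012 row removed, India 2013 gets NA rather than the 2011 value
gap <- add_year_lags(data[!(data$Country == "India" & data$Year == "2012"), ],
                     c("Value1", "Value2"))
```

The join costs a merge, but unlike a row-based lag it does not depend on how the rows are ordered.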

Answer #1 (bybem2ql)

Using data.table:

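library(data.table)

setDT(data)                                           # convert to a data.table in place
cols <- grep("^Value", colnames(data), value = TRUE)  # "Value1", "Value2"
# shift() lags by one row within each Country, so rows must be in Year order per country
data[, paste0(cols, "_lag") := lapply(.SD, shift), .SDcols = cols, by = Country]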
#    Year Country Value1 Value2 Value1_lag Value2_lag
# 1: 2011 America    234      2         NA         NA
# 2: 2011   China    443      4         NA         NA
# 3: 2011   India    754      5         NA         NA
# 4: 2012 America    334      6        234          2
# 5: 2012   China    117      7        443          4
# 6: 2012   India    112      8        754          5
# 7: 2013 America    987      1        334          6
# 8: 2013   China    903      2        117          7
# 9: 2013   India    476      2        112          8
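Both shift() here and dplyr's lag() work by row position within each group. A base-R sketch of the same grouped lag makes that explicit (the helper lag1 is hypothetical): it sorts by Country and Year first, then shifts each country's values down one row with ave().

```r
data <- data.frame(Year = c("2011", "2011", "2011", "2012", "2012", "2012", "2013", "2013", "2013"),
                   Country = c("America", "China", "India", "America", "China", "India",
                               "America", "China", "India"),
                   Value1 = c(234, 443, 754, 334, 117, 112, 987, 903, 476),
                   Value2 = c(2, 4, 5, 6, 7, 8, 1, 2, 2))

# Hypothetical helper: positional lag, i.e. drop the last element and pad the front with NA
lag1 <- function(v) c(NA, head(v, -1))

# Sort so that rows run in increasing Year order within each country;
# a positional lag is only a one-year lag under that ordering
data <- data[order(data$Country, data$Year), ]

for (col in c("Value1", "Value2")) {
  # ave() applies lag1 separately within each Country group and puts the
  # results back in the same row positions
  data[[paste0(col, "_lag")]] <- ave(data[[col]], data$Country, FUN = lag1)
}
data
```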
Answer #2 (zour9fqk)

In dplyr, use lag() by group:

library(dplyr) # the .by argument requires dplyr >= 1.1.0
data %>% 
  mutate(across(contains("Value"), lag, .names = "{col}_lagged"), .by = Country)

  Year Country Value1 Value2 Value1_lagged Value2_lagged
1 2011 America    234      2            NA            NA
2 2011   China    443      4            NA            NA
3 2011   India    754      5            NA            NA
4 2012 America    334      6           234             2
5 2012   China    117      7           443             4
6 2012   India    112      8           754             5
7 2013 America    987      1           334             6
8 2013   China    903      2           117             7
9 2013   India    476      2           112             8

For dplyr versions below 1.1.0:

data %>% 
  group_by(Country) %>%
  mutate(across(c(Value1, Value2), lag, .names = "{col}_lagged")) %>%
  ungroup()
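Both versions lag by row position within each country, so the result is a one-year lag only when each country's rows are sorted by Year with no missing years. A small base-R check can confirm that before trusting the output; years_are_consecutive is a hypothetical helper, not part of dplyr.

```r
data <- data.frame(Year = c("2011", "2011", "2011", "2012", "2012", "2012", "2013", "2013", "2013"),
                   Country = c("America", "China", "India", "America", "China", "India",
                               "America", "China", "India"),
                   Value1 = c(234, 443, 754, 334, 117, 112, 987, 903, 476),
                   Value2 = c(2, 4, 5, 6, 7, 8, 1, 2, 2))

# Hypothetical helper: TRUE when each country's rows are in increasing Year
# order with no gaps, which is when a row-based lag equals a one-year lag
years_are_consecutive <- function(df) {
  # split() keeps the current row order within each country
  yrs <- split(as.integer(as.character(df$Year)), df$Country)
  all(vapply(yrs, function(y) all(diff(y) == 1L), logical(1)))
}

years_are_consecutive(data)                  # TRUE for the question's data
years_are_consecutive(data[nrow(data):1, ])  # FALSE: rows are in reverse order
years_are_consecutive(data[-6, ])            # FALSE: India has no 2012 row
```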
Answer #3 (s5a0g9ez)

Another way to do it with dplyr:

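library(dplyr)

# Compute the lags within each country; lag() shifts by row position,
# so rows must already be in Year order within each country
data_lagged <- data %>%
  group_by(Country) %>%
  mutate(Value1_Lagged = lag(Value1),
         Value2_Lagged = lag(Value2)) %>%
  ungroup()

# Attach the lagged columns to the original data frame (row order is unchanged)
data_final <- cbind(data, data_lagged[, c("Value1_Lagged", "Value2_Lagged")])
data_final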

Output:

  Year Country Value1 Value2 Value1_Lagged Value2_Lagged
1 2011 America    234      2            NA            NA
2 2011   China    443      4            NA            NA
3 2011   India    754      5            NA            NA
4 2012 America    334      6           234             2
5 2012   China    117      7           443             4
6 2012   India    112      8           754             5
7 2013 America    987      1           334             6
8 2013   China    903      2           117             7
9 2013   India    476      2           112             8
