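I am trying to compute the average of the last three monthly values (3M), append that average to the bottom of the data frame, then compute the next 3M average from the updated data (in effect, two months of actual data plus the newly appended average), and repeat this 18 times.
I am struggling to find an efficient way to do this; my current approach is time-consuming. I first tried a nested loop, but then found a way to do it with a single loop and lapply().
However, I would like to know whether there is a better way that avoids the loop.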
library(dplyr)
library(forecast)
library(readxl)
library(data.table)
library(clock)
library(lubridate)
library(tsibble)
df <- read_excel("C:/X/X/X- X/X/Book7.xlsx",sheet = "Loop")
freq = 18
colnames(df)[1]="Dates"
Dates <- df$Dates
Working <- df[,-1]
#--------------------------------------- Creation of Functions ---------------------------------------#
Moving_Average_3M <- function(Working)
{
  last_3_row <- tail(Working,3)
  # Convert the `last_3_row` object to a two-dimensional object as tail() function returns a vector
  last_3_row_df <- data.frame(last_3_row)
  # Calculate the mean of the last three rows
  mean_last_3 <- data.frame(colMeans(last_3_row_df,na.rm = TRUE))
  return(mean_last_3)
}
Rename_Col_and_bind <- function(Working,Output)
{
  colnames(Output) <- colnames(Working)
  Working <- rbind(Working,Output)
  return(Working)
}
#--------------------------------------- End of Creation of Functions ---------------------------------------#
#------------------------------------------ Loops for Execution ---------------------------------------------#
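for(i in 1:freq)
{
  Output <- data.frame(lapply(Working,Moving_Average_3M))
  Working <- Rename_Col_and_bind(Working,Output)
}
# View() is the base R viewer; lowercase view() needs tibble attached.
# Output holds only the most recently appended row; use View(Working) for the full table.
View(Output)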
The data frame I am working with looks like this:
structure(list(Dates = c("2019-01-01", "2019-02-01", "2019-03-01",
"2019-04-01", "2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01",
"2019-09-01", "2019-10-01", "2019-11-01", "2019-12-01", "2020-01-01",
"2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01",
"2020-07-01", "2020-08-01", "2020-09-01", "2020-10-01", "2020-11-01",
"2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01",
"2021-05-01", "2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01",
"2021-10-01", "2021-11-01", "2021-12-01", "2022-01-01", "2022-02-01",
"2022-03-01", "2022-04-01", "2022-05-01", "2022-06-01", "2022-07-01",
"2022-08-01", "2022-09-01", "2022-10-01"), `XYZ|851` = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 206, 1814, 2324, 772, 1116, 1636, 1906,
957, 829, 911, 786, 938, 1313, 2384, 1554, 1777, 1635, 1534,
1015, 827, 982, 685, 767, 511, 239, 5400, 1301, 426, 261, 201,
33, 27, 28, 46, 11, 55, 47), `XYZ|574` = c(0, 0, 0, 0, 0, 0,
0, 0, 74, 179, 464, 880, 324, 184, 90, 170, 140, 96, 78, 83,
83, 121, 245, 9000, 332, 123, 117, 138, 20, 42, 70, 70, 42, 103,
490, 7500, 488, 245, 142, 95, 63, 343, 57, 113, 100, 105)), row.names = c(NA,
-46L), class = c("tbl_df", "tbl", "data.frame"))
As described above, a minimal output after two iterations looks like this. Here is the loop used to get the two iterations:
for(i in 1:2)
{
  Output <- data.frame(lapply(Working,Moving_Average_3M))
  Working <- Rename_Col_and_bind(Working,Output)
}
Working
The dput of the resulting data frame is shown below:
structure(list(`XYZ|851` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 206,
1814, 2324, 772, 1116, 1636, 1906, 957, 829, 911, 786, 938, 1313,
2384, 1554, 1777, 1635, 1534, 1015, 827, 982, 685, 767, 511,
239, 5400, 1301, 426, 261, 201, 33, 27, 28, 46, 11, 55, 47, 37.6666666666667,
46.5555555555556), `XYZ|574` = c(0, 0, 0, 0, 0, 0, 0, 0, 74,
179, 464, 880, 324, 184, 90, 170, 140, 96, 78, 83, 83, 121, 245,
9000, 332, 123, 117, 138, 20, 42, 70, 70, 42, 103, 490, 7500,
488, 245, 142, 95, 63, 343, 57, 113, 100, 105, 106, 103.666666666667
)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31",
"32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42",
"43", "44", "45", "46", "last_3_row", "last_3_row1"), class = c("tbl_df",
"tbl", "data.frame"))
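To explain this further, I have added an Excel image for clarity:
The blue area of the image is the output, identical to what you see in the Working data frame, and the formulas used are highlighted in yellow.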
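As a quick sanity check of the two appended rows, the arithmetic can be reproduced by hand. This is only an illustrative sketch: the variable names are made up for the example, and the input values are the last three observations of each column copied from the dput above.

```r
# Last three observed values of each column (copied from the input data)
x851 <- c(11, 55, 47)      # XYZ|851
x574 <- c(113, 100, 105)   # XYZ|574

# Iteration 1: mean of the last three actual values
first_851 <- mean(x851)
first_574 <- mean(x574)

# Iteration 2: two actual values plus the average appended in iteration 1
second_851 <- mean(c(x851[2:3], first_851))
second_574 <- mean(c(x574[2:3], first_574))

print(c(first_851, second_851))  # approximately 37.66667 and 46.55556
print(c(first_574, second_574))  # 106 and approximately 103.6667
```

These agree with the last two rows of the dput output (37.6666666666667, 46.5555555555556 and 106, 103.666666666667).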
1 Answer
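Some thoughts up front:
- This is not easy to vectorize (that is, to do without a loop or a loop-like construct such as Reduce or purrr::reduce), because each new value depends on the rows, and the calculations, that come before it; it is closer to a recursive approach, although we will not use recursion explicitly.
- Adding rows iteratively (with rbind) works in concept, but it is very inefficient and scales poorly; instead, I will pre-allocate the space once (filled with NA) and then fill the rows with new values, rather than calling rbind on every iteration.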
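The pre-allocation idea described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the code from the original answer: the helper name extend_with_rolling_mean and its arguments n and k are hypothetical, and it assumes every column is numeric and that the input has at least k rows.

```r
# Hypothetical helper (an assumption, not from the original post): extend every
# column of `x` by `n` rows, each new value being the mean of the `k` values
# directly above it, including previously filled-in averages.
extend_with_rolling_mean <- function(x, n = 18, k = 3) {
  x <- as.data.frame(x)
  n0 <- nrow(x)
  stopifnot(n0 >= k)
  # Pre-allocate once: original values on top, NA placeholders below.
  m <- rbind(as.matrix(x), matrix(NA_real_, nrow = n, ncol = ncol(x)))
  for (i in seq_len(n)) {
    r <- n0 + i
    # Fill row r from the k rows above it; no rbind inside the loop.
    m[r, ] <- colMeans(m[(r - k):(r - 1), , drop = FALSE], na.rm = TRUE)
  }
  out <- as.data.frame(m)
  names(out) <- names(x)   # keep non-syntactic names such as `XYZ|851`
  rownames(out) <- NULL
  out
}

# Small demo using the last four rows of the question's data
demo <- data.frame(`XYZ|851` = c(46, 11, 55, 47),
                   `XYZ|574` = c(57, 113, 100, 105),
                   check.names = FALSE)
res <- extend_with_rolling_mean(demo, n = 2)
print(res)
```

A loop is still present, since each new row depends on the one before it, but the storage is allocated a single time instead of being copied on every iteration. With the question's data, the call would be something like extend_with_rolling_mean(Working, n = freq).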
You can then fill in Dates as needed. (I used as.data.frame(Working2) only to show all the decimals, since the print method for tibble usually hides some precision.)
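One possible way to fill in Dates is sketched below. It assumes the dates are monthly, first-of-month values stored as "YYYY-MM-DD" strings, as in the dput of the question; the names dates, n_new, new_dates, and all_dates are made up for the example. If read_excel returns a date-time column instead, converting it with as.Date() first should give the same starting point.

```r
# Tail of the Dates column from the question (character strings)
dates <- c("2022-08-01", "2022-09-01", "2022-10-01")
n_new <- 18  # number of appended rows (freq in the question)

# Generate the next n_new month starts after the last observed date.
last_date <- as.Date(dates[length(dates)])
new_dates <- seq(last_date, by = "month", length.out = n_new + 1)[-1]

# Combined date column: observed dates followed by the generated ones
all_dates <- c(as.Date(dates), new_dates)
print(range(new_dates))
```

The resulting vector can then be bound back onto the extended data frame as its first column.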