R: computing an indicator variable conditional on a datetime range spanning two consecutive days

jyztefdp  posted on 2023-09-27  in Other

Toy data:

df <- 
  structure(list(datetime = structure(c(1692439200, 1692442800, 
                                        1692446400, 1692450000, 1692453600, 1692457200, 1692460800, 1692464400, 
                                        1692468000, 1692471600, 1692475200, 1692478800, 1692482400, 1692486000, 
                                        1692489600, 1692493200, 1692496800, 1692500400, 1692504000, 1692507600, 
                                        1692511200, 1692514800, 1692518400, 1692522000, 1692525600, 1692529200, 
                                        1692532800, 1692536400, 1692540000, 1692543600, 1692547200, 1692550800, 
                                        1692554400, 1692558000, 1692561600, 1692565200, 1692568800, 1692572400, 
                                        1692576000, 1692579600, 1692583200, 1692586800, 1692590400, 1692594000, 
                                        1692597600, 1692601200), tzone = "", class = c("POSIXct", "POSIXt"
                                        )), var = c(30, 31.1, 31.2, NA, NA, NA, 26.9, 26.5, 25.6, 
                                                            24.7, 24.5, 24.4, 23.6, 23.1, 22.7, 22, 21.3, 21, 20.9, 20.9, 
                                                            22.8, 25.3, 26.2, 27.1, 28.1, 28.5, 28.8, 28.6, 28.1, 26.4, 24.7, 
                                                            23.2, 23, 22.4, 21.5, 20.8, 19.9, 19.7, 20, 19.9, 19.8, 20.2, 
                                                            21.7, 21.9, 23.6, 25.2)), row.names = c(NA, -46L), class = "data.frame")

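The variable var is measured hourly.
For each day, I want to compute an indicator variable that equals 1 if var is greater than some value, say 20, between 6 pm of the previous day (inclusive) and 6 am of the current day (inclusive), and 0 otherwise.
Obviously, I can use functions from {dplyr} and {lubridate} to build new variables for grouping and filtering on time ranges, such as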

library(dplyr); library(lubridate) 
df |>
  mutate(date = lubridate::date(datetime), 
         hour = lubridate::hour(datetime))

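But this does not lead in the right direction, because the relevant time range spans two dates. I think zoo::rollapply() or a similar function should fit my needs, but I could not find a comparable example. Converting df to a time-series object is not what I want. I am open to any suggestion, such as a {data.table} approach, and am not limited to {tidyverse} solutions.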

Answer #1 (zmeyuzjn)

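library(dplyr); library(lubridate)
df |>
  # Shift each timestamp back 18 hours so every "day" runs from 18:00 to 17:59
  mutate(day_start_6pm = as_date(datetime - hours(18))) |>
  # Per shifted day: is var ever / always at or above 20? (NA in var can yield NA)
  mutate(ever_over_20 = any(var >= 20),
         always_over_20 = all(var >= 20),
         .by = day_start_6pm)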
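
To see why subtracting 18 hours works as a grouping key, here is a minimal sketch on four timestamps around the 18:00 boundary. The timestamps and the names `x` and `shifted_day` are made up for illustration and are not part of the toy data; the time zone is pinned to UTC so the result does not depend on the local setting.

```r
library(lubridate)

# Four hypothetical timestamps around the 18:00 boundary, pinned to UTC
x <- ymd_hms(c("2023-08-19 17:00:00", "2023-08-19 18:00:00",
               "2023-08-20 06:00:00", "2023-08-20 17:00:00"), tz = "UTC")

# Subtracting 18 hours moves 18:00 to midnight, so the calendar date of the
# shifted timestamp labels the 24-hour block that starts at 18:00
shifted_day <- as_date(x - hours(18))

data.frame(datetime = x, day_start_6pm = shifted_day)
```

The 17:00 reading on 2023-08-19 is labelled 2023-08-18, while the other three are labelled 2023-08-19. Each label therefore covers a full 24 hours, so readings between 06:00 and 18:00 end up in the group as well. Applied to the toy data, the answer's pipeline prints the following.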


             datetime  var day_start_6pm ever_over_20 always_over_20
1  2023-08-19 03:00:00 30.0    2023-08-18         TRUE             NA
2  2023-08-19 04:00:00 31.1    2023-08-18         TRUE             NA
3  2023-08-19 05:00:00 31.2    2023-08-18         TRUE             NA
4  2023-08-19 06:00:00   NA    2023-08-18         TRUE             NA
5  2023-08-19 07:00:00   NA    2023-08-18         TRUE             NA
6  2023-08-19 08:00:00   NA    2023-08-18         TRUE             NA
7  2023-08-19 09:00:00 26.9    2023-08-18         TRUE             NA
8  2023-08-19 10:00:00 26.5    2023-08-18         TRUE             NA
9  2023-08-19 11:00:00 25.6    2023-08-18         TRUE             NA
10 2023-08-19 12:00:00 24.7    2023-08-18         TRUE             NA
11 2023-08-19 13:00:00 24.5    2023-08-18         TRUE             NA
12 2023-08-19 14:00:00 24.4    2023-08-18         TRUE             NA
13 2023-08-19 15:00:00 23.6    2023-08-18         TRUE             NA
14 2023-08-19 16:00:00 23.1    2023-08-18         TRUE             NA
15 2023-08-19 17:00:00 22.7    2023-08-18         TRUE             NA
16 2023-08-19 18:00:00 22.0    2023-08-19         TRUE          FALSE
17 2023-08-19 19:00:00 21.3    2023-08-19         TRUE          FALSE
18 2023-08-19 20:00:00 21.0    2023-08-19         TRUE          FALSE
19 2023-08-19 21:00:00 20.9    2023-08-19         TRUE          FALSE
20 2023-08-19 22:00:00 20.9    2023-08-19         TRUE          FALSE
21 2023-08-19 23:00:00 22.8    2023-08-19         TRUE          FALSE
22 2023-08-20 00:00:00 25.3    2023-08-19         TRUE          FALSE
23 2023-08-20 01:00:00 26.2    2023-08-19         TRUE          FALSE
24 2023-08-20 02:00:00 27.1    2023-08-19         TRUE          FALSE
25 2023-08-20 03:00:00 28.1    2023-08-19         TRUE          FALSE
26 2023-08-20 04:00:00 28.5    2023-08-19         TRUE          FALSE
27 2023-08-20 05:00:00 28.8    2023-08-19         TRUE          FALSE
28 2023-08-20 06:00:00 28.6    2023-08-19         TRUE          FALSE
29 2023-08-20 07:00:00 28.1    2023-08-19         TRUE          FALSE
30 2023-08-20 08:00:00 26.4    2023-08-19         TRUE          FALSE
31 2023-08-20 09:00:00 24.7    2023-08-19         TRUE          FALSE
32 2023-08-20 10:00:00 23.2    2023-08-19         TRUE          FALSE
33 2023-08-20 11:00:00 23.0    2023-08-19         TRUE          FALSE
34 2023-08-20 12:00:00 22.4    2023-08-19         TRUE          FALSE
35 2023-08-20 13:00:00 21.5    2023-08-19         TRUE          FALSE
36 2023-08-20 14:00:00 20.8    2023-08-19         TRUE          FALSE
37 2023-08-20 15:00:00 19.9    2023-08-19         TRUE          FALSE
38 2023-08-20 16:00:00 19.7    2023-08-19         TRUE          FALSE
39 2023-08-20 17:00:00 20.0    2023-08-19         TRUE          FALSE
40 2023-08-20 18:00:00 19.9    2023-08-20         TRUE          FALSE
41 2023-08-20 19:00:00 19.8    2023-08-20         TRUE          FALSE
42 2023-08-20 20:00:00 20.2    2023-08-20         TRUE          FALSE
43 2023-08-20 21:00:00 21.7    2023-08-20         TRUE          FALSE
44 2023-08-20 22:00:00 21.9    2023-08-20         TRUE          FALSE
45 2023-08-20 23:00:00 23.6    2023-08-20         TRUE          FALSE
46 2023-08-21 00:00:00 25.2    2023-08-20         TRUE          FALSE
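
The grouping above covers a full 24 hours per group and keeps one row per observation. If only the readings from 18:00 to 06:00 (both inclusive) should count, and one row per day is wanted, a variant along the following lines may be closer to what the question describes. This is a sketch under stated assumptions, not the answerer's method: the toy values are rebuilt with the clock times from the printed output pinned to UTC, the comparison is strict (`>`), NA readings are ignored, and the names `toy`, `threshold`, `secs`, `day`, `n_obs`, `any_over`, `all_over` and `night_index` are illustrative.

```r
library(dplyr)      # needs dplyr >= 1.1.0 for .by and .default
library(lubridate)

# Toy values rebuilt with the clock times of the printed output, pinned to UTC
# (assumption) so the result does not depend on the local time zone
toy <- data.frame(
  datetime = seq(as.POSIXct("2023-08-19 03:00:00", tz = "UTC"),
                 by = "hour", length.out = 46),
  var = c(30, 31.1, 31.2, NA, NA, NA, 26.9, 26.5, 25.6, 24.7, 24.5, 24.4,
          23.6, 23.1, 22.7, 22, 21.3, 21, 20.9, 20.9, 22.8, 25.3, 26.2,
          27.1, 28.1, 28.5, 28.8, 28.6, 28.1, 26.4, 24.7, 23.2, 23, 22.4,
          21.5, 20.8, 19.9, 19.7, 20, 19.9, 19.8, 20.2, 21.7, 21.9, 23.6,
          25.2)
)

threshold <- 20

night_index <- toy |>
  mutate(
    # Seconds since midnight, so both window ends can be tested inclusively
    secs = hour(datetime) * 3600 + minute(datetime) * 60 + second(datetime),
    day = case_when(
      secs >= 18 * 3600 ~ as_date(datetime) + 1,  # evening counts toward the next day
      secs <= 6 * 3600  ~ as_date(datetime),      # early morning counts toward the same day
      .default = as.Date(NA)                      # after 06:00 and before 18:00: outside
    )
  ) |>
  filter(!is.na(day)) |>
  summarise(
    n_obs    = sum(!is.na(var)),                               # non-missing readings used
    any_over = as.integer(any(var > threshold, na.rm = TRUE)), # 1 if at least one reading exceeds
    all_over = as.integer(all(var > threshold, na.rm = TRUE)), # 1 if every non-missing reading exceeds
    .by = day
  )

night_index
```

On these values, `night_index` has one row each for 2023-08-19, 2023-08-20 and 2023-08-21, with `any_over` equal to 1 for all three and `all_over` equal to 1, 1 and 0. The first and last windows are only partly covered by the data, which is why `n_obs` is worth keeping next to the indicators. With `na.rm = TRUE`, a window whose readings are all missing would give `all_over = 1` and `any_over = 0`, so such windows may need separate handling.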
