Finding the daily maximum and minimum of a time series in R?

q3aa0525  posted on 2023-04-27  in Other

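I have one year of hourly data and want to find the maximum and minimum value for each day. How can I do this while keeping the time information associated with each maximum/minimum? My goal is to produce a plot that smooths through the max/min data points, so I need to retain the timestamp column for those rows.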

timestamp         VALUES
2016-01-01 0:00   #VALUE!
2016-01-01 1:00   2
2016-01-01 2:00   0.5
2016-01-01 3:00   -1
2016-01-01 4:00   -2
2016-01-01 5:00   4
2016-01-01 6:00   2
2016-01-01 7:00   0
2016-01-01 8:00   5
2016-01-01 9:00   61.5
2016-01-01 10:00  19
2016-01-01 11:00  3.5
2016-01-01 12:00  -1.5
2016-01-01 13:00  9
2016-01-01 14:00  0.5
2016-01-01 15:00  0
2016-01-01 16:00  -8
2016-01-01 17:00  7.5
2016-01-01 18:00  -9
2016-01-01 19:00  -80.5
2016-01-01 20:00  -9
2016-01-01 21:00  -0.5
2016-01-01 22:00  -0.5
2016-01-01 23:00  -2

Thanks in advance!

5vf7fwbs  #1

Convert timestamp to POSIXct, extract the date from it, and for each date keep the rows holding the minimum and maximum value.

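library(dplyr)
library(lubridate)

result <- df %>%
  # The sample timestamps are year-month-day, so parse them with ymd_hm()
  mutate(timestamp = ymd_hm(timestamp), 
         date = as.Date(timestamp)) %>%
  # arrange() sorts NA last, so drop missing values first; otherwise
  # the NA row would be picked as the "maximum" by n()
  filter(!is.na(VALUES)) %>%
  arrange(date, VALUES) %>%
  group_by(date) %>%
  # After sorting, the first row per day is the minimum and the last the maximum
  slice(1, n()) %>%
  ungroup()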
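
If you would rather not depend on sort order, `slice_min()` and `slice_max()` express the same idea directly. This is only a minimal sketch, assuming dplyr 1.0.0 or later; the small `df_demo` data frame is made up for illustration and is not the asker's data.

```r
library(dplyr)

# Hypothetical sample data: two days, three readings each, one missing value
df_demo <- data.frame(
  timestamp = as.POSIXct(c("2016-01-01 00:00", "2016-01-01 09:00",
                           "2016-01-01 19:00", "2016-01-02 00:00",
                           "2016-01-02 06:00", "2016-01-02 12:00"),
                         tz = "UTC"),
  VALUES = c(NA, 61.5, -80.5, 3, -4, 10)
)

by_day <- df_demo %>%
  filter(!is.na(VALUES)) %>%            # ignore missing readings
  mutate(date = as.Date(timestamp)) %>%
  group_by(date)

# One minimum row and one maximum row per day, timestamps included;
# with_ties = FALSE keeps a single row even if the extreme value repeats
extremes <- bind_rows(
  min = slice_min(by_day, VALUES, n = 1, with_ties = FALSE),
  max = slice_max(by_day, VALUES, n = 1, with_ties = FALSE),
  .id = "type"
) %>%
  ungroup() %>%
  arrange(timestamp)

extremes
```

The `type` column records whether a row is a daily minimum or maximum, which is convenient when drawing the two series separately.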

bvpmtnay  #2

You can use strftime inside ave to group by day.

r <- transform(dat, min=ave(values, strftime(timestamp, '%F'), FUN=min),
               max=ave(values, strftime(timestamp, '%F'), FUN=max))
r
#           timestamp values min max
# 1   2016-01-01 0:00    -27 -66  13
# 2   2016-01-01 4:00    -32 -66  13
# 3   2016-01-01 8:00     13 -66  13
# 4  2016-01-01 12:00    -52 -66  13
# 5  2016-01-01 16:00    -66 -66  13
# 6  2016-01-01 20:00     12 -66  13
# 7   2016-01-02 0:00    -19 -53  19
# 8   2016-01-02 4:00     -8 -53  19
# 9   2016-01-02 8:00      8 -53  19
# 10 2016-01-02 12:00     18 -53  19
# 11 2016-01-02 16:00    -53 -53  19
# 12 2016-01-02 20:00     19 -53  19
# 13  2016-01-03 0:00     12 -74  42
# 14  2016-01-03 4:00     27 -74  42
# 15  2016-01-03 8:00    -74 -74  42
# 16 2016-01-03 12:00    -31 -74  42
# 17 2016-01-03 16:00     42 -74  42
# 18 2016-01-03 20:00    -62 -74  42

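However, if your data contain missing values, you will need an anonymous function that passes na.rm=TRUE (the \(x) shorthand requires R 4.1 or later).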

dat[7, 2] <- NA

transform(
  dat, 
  min=ave(values, strftime(timestamp, '%F'), FUN=\(x) min(x, na.rm=TRUE)), 
  max=ave(values, strftime(timestamp, '%F'), FUN=\(x) max(x, na.rm=TRUE)))   
#           timestamp values min max
# 1   2016-01-01 0:00    -27 -66  13
# 2   2016-01-01 4:00    -32 -66  13
# 3   2016-01-01 8:00     13 -66  13
# 4  2016-01-01 12:00    -52 -66  13
# 5  2016-01-01 16:00    -66 -66  13
# 6  2016-01-01 20:00     12 -66  13
# 7   2016-01-02 0:00     NA -53  19
# 8   2016-01-02 4:00     -8 -53  19
# 9   2016-01-02 8:00      8 -53  19
# 10 2016-01-02 12:00     18 -53  19
# 11 2016-01-02 16:00    -53 -53  19
# 12 2016-01-02 20:00     19 -53  19
# 13  2016-01-03 0:00     12 -74  42
# 14  2016-01-03 4:00     27 -74  42
# 15  2016-01-03 8:00    -74 -74  42
# 16 2016-01-03 12:00    -31 -74  42
# 17 2016-01-03 16:00     42 -74  42
# 18 2016-01-03 20:00    -62 -74  42

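It is a bit nicer with real POSIXct timestamps. (The NA entries for 2016-01-02 in this output come from the NA inserted above, because this call uses plain min and max without na.rm=TRUE.)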

r <- transform(dat, timestamp=as.POSIXct(timestamp),
               min=ave(values, strftime(timestamp, '%F'), FUN=min),
               max=ave(values, strftime(timestamp, '%F'), FUN=max))
r
#              timestamp values min max
# 1  2016-01-01 00:00:00    -27 -66  13
# 2  2016-01-01 04:00:00    -32 -66  13
# 3  2016-01-01 08:00:00     13 -66  13
# 4  2016-01-01 12:00:00    -52 -66  13
# 5  2016-01-01 16:00:00    -66 -66  13
# 6  2016-01-01 20:00:00     12 -66  13
# 7  2016-01-02 00:00:00     NA  NA  NA
# 8  2016-01-02 04:00:00     -8  NA  NA
# 9  2016-01-02 08:00:00      8  NA  NA
# 10 2016-01-02 12:00:00     18  NA  NA
# 11 2016-01-02 16:00:00    -53  NA  NA
# 12 2016-01-02 20:00:00     19  NA  NA
# 13 2016-01-03 00:00:00     12 -74  42
# 14 2016-01-03 04:00:00     27 -74  42
# 15 2016-01-03 08:00:00    -74 -74  42
# 16 2016-01-03 12:00:00    -31 -74  42
# 17 2016-01-03 16:00:00     42 -74  42
# 18 2016-01-03 20:00:00    -62 -74  42
Data:
dat <- structure(list(timestamp = c("2016-01-01 0:00", "2016-01-01 2:00", 
"2016-01-01 4:00", "2016-01-01 6:00", "2016-01-01 8:00", "2016-01-01 10:00", 
"2016-01-01 12:00", "2016-01-01 14:00", "2016-01-01 16:00", "2016-01-01 18:00", 
"2016-01-01 20:00", "2016-01-01 22:00", "2016-01-02 0:00", "2016-01-02 2:00", 
"2016-01-02 4:00", "2016-01-02 6:00", "2016-01-02 8:00", "2016-01-02 10:00", 
"2016-01-02 12:00", "2016-01-02 14:00", "2016-01-02 16:00", "2016-01-02 18:00", 
"2016-01-02 20:00", "2016-01-02 22:00", "2016-01-03 0:00", "2016-01-03 2:00", 
"2016-01-03 4:00", "2016-01-03 6:00", "2016-01-03 8:00", "2016-01-03 10:00", 
"2016-01-03 12:00", "2016-01-03 14:00", "2016-01-03 16:00", "2016-01-03 18:00", 
"2016-01-03 20:00", "2016-01-03 22:00"), values = c(7, 12, NA, 
-70, -4, -22, -13, -76, 13, 45, 48, 55, -64, -30, -20, -8, -10, 
40, -32, 3, -67, -66, -74, -75, 57, 16, -31, -17, 9, 7, -66, 
13, 41, 58, 26, 58)), class = "data.frame", row.names = c(NA, 
-36L))
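
The `ave()` calls above attach the daily extremes to every row but do not show when they occurred. A base-R sketch that instead returns the rows, timestamps included, holding each day's minimum and maximum follows. It uses a small made-up `dat2` rather than the data above, and the helper name `pick_extremes` is hypothetical.

```r
# Hypothetical sample data in the same shape as `dat` (character timestamps)
dat2 <- data.frame(
  timestamp = c("2016-01-01 0:00", "2016-01-01 8:00", "2016-01-01 16:00",
                "2016-01-02 0:00", "2016-01-02 8:00", "2016-01-02 16:00"),
  values = c(7, NA, -70, -64, 40, -8)
)

# Return the minimum row followed by the maximum row of one day's data;
# which.min()/which.max() skip NA and return the first match on ties
pick_extremes <- function(d) {
  d[c(which.min(d$values), which.max(d$values)), ]
}

# Day label for each row, used as the grouping key
day <- format(as.POSIXct(dat2$timestamp, tz = "UTC"), "%Y-%m-%d")

extremes <- do.call(rbind, lapply(split(dat2, day), pick_extremes))
rownames(extremes) <- NULL
extremes
```

Each day contributes two rows, minimum first, so the result can be passed straight to a plotting or smoothing step.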

50few1ms  #3

A tidyverse solution, assuming your data frame is named df:

library(dplyr)
library(lubridate)

result <- df %>%
  mutate(timestamp = as.POSIXct(timestamp), 
         Date = lubridate::date(timestamp)) %>%
  arrange(Date, VALUES) %>%
  group_by(Date) %>%
  # One row per day with the daily maximum and minimum
  # (the timestamps of those extremes are not kept)
  summarise(across(VALUES, max, .names = "max_{.col}"),
            across(VALUES, min, .names = "min_{.col}"))
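
Because `summarise()` collapses each day to a single row, the timestamps of the extremes are lost. One possible way to keep them is to index `timestamp` with `which.max()` and `which.min()` inside the same `summarise()` call. The sketch below uses made-up data, and the names `df_demo`, `time_max` and `time_min` are illustrative only.

```r
library(dplyr)

# Hypothetical sample data: two days, three readings each
df_demo <- data.frame(
  timestamp = as.POSIXct(c("2016-01-01 01:00", "2016-01-01 09:00",
                           "2016-01-01 19:00", "2016-01-02 08:00",
                           "2016-01-02 10:00", "2016-01-02 18:00"),
                         tz = "UTC"),
  VALUES = c(2, 61.5, -80.5, 5, 19, -9)
)

daily <- df_demo %>%
  mutate(Date = as.Date(timestamp)) %>%
  group_by(Date) %>%
  summarise(
    max_VALUES = max(VALUES, na.rm = TRUE),
    time_max   = timestamp[which.max(VALUES)],  # when the daily maximum occurred
    min_VALUES = min(VALUES, na.rm = TRUE),
    time_min   = timestamp[which.min(VALUES)]   # when the daily minimum occurred
  )

daily
```

On ties, `which.max()` and `which.min()` return the first occurrence, so each day still yields exactly one row.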
