How do I compute the mean of a time column by group in R?

d7v8vwbk, posted on 2023-10-13 in Other

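I have a data frame with a column named ride_length that is already in hh:mm:ss format. I want to compute the mean of that column separately for two groups, member and casual (stored in the member_casual column).
I first tried a pipe using the lubridate library: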

    df %>%
      group_by(member_casual) %>%
      seconds_to_period(mean(period_to_seconds(hms(ride_length))))

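Even though my arguments match other examples I found online, I still get this message:
Error in seconds_to_period(., mean(period_to_seconds(hms(ride_length)))) : unused argument (mean(period_to_seconds(hms(ride_length))))
I also tried a longer route: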

    df$nride_length <- difftime(strptime(df$ride_length, "%H:%M:%S"),
                                strptime("00:00:00", "%H:%M:%S"),
                                units = "mins")
    df.means <- aggregate(df$nride_length, by = list(df$member_casual), mean)
    df.means$ride_length <- format(.POSIXct(df.means$x, tz = "GMT"), "%H:%M:%S")
    df.means

But the result is still wrong:

      Group.1       x ride_length
    1  casual NA mins
    2  member NA mins

I also tried summarise():

    df %>%
      group_by(member_casual) %>%
      summarise(length_mean = seconds_to_period(mean(period_to_seconds(hms(ride_length)))))

But this prints:

    # A tibble: 2 × 2
      member_casual length_mean
      <chr>         <Period>
    1 casual        NA
    2 member        NA
    Warning message:
    There were 2 warnings in `summarise()`.
    The first warning was:
    In argument: `length_mean =
      seconds_to_period(mean(period_to_seconds(hms(ride_length))))`.
    In group 1: `member_casual = "casual"`.
    Caused by warning in `.parse_hms()`:
    ! Some strings failed to parse, or all strings are NAs
    Run dplyr::last_dplyr_warnings() to see the 1 remaining warning.

Please help.

ndh0cuux #1

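You can use aggregate() on its own. Specify the grouping with a formula, the same way you would for an ANOVA. I changed the data frame slightly so that it has three "member" rows and three "casual" rows.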

    dtf <- structure(list(
      rideable_type = c("electric_bike", "classic_bike", "classic_bike",
                        "electric_bike", "classic_bike", "classic_bike"),
      day_of_week = c(1, 1, 1, 6, 7, 2),
      ride_length = structure(c(990, 810, 576, 296, 686, 294),
                              class = c("hms", "difftime"), units = "secs"),
      member_casual = c("member", "member", "member",
                        "casual", "casual", "casual"),
      nride_length = structure(c(16.5, 13.5, 9.6, 4.93, 11.43, 4.9),
                               class = "difftime", units = "mins")),
      row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

    aggregate(ride_length ~ member_casual, data = dtf, mean)
    #   member_casual    ride_length
    # 1        casual 425.33333 secs
    # 2        member 792.00000 secs
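
The means above come back in seconds, while the question asks for hh:mm:ss. One possible way to format them, as a minimal base R sketch: `to_hms` is a hypothetical helper name (not part of any package), and the sketch assumes that rounding to whole seconds is acceptable.

```r
# Mean ride lengths in seconds, copied from the aggregate() output above.
mean_secs <- c(casual = 425.33333, member = 792)

# Hypothetical helper: format a number of seconds as hh:mm:ss.
# Assumption: fractional seconds are rounded to the nearest whole second.
to_hms <- function(secs) {
  secs <- round(secs)
  sprintf("%02d:%02d:%02d",
          secs %/% 3600,           # whole hours
          (secs %% 3600) %/% 60,   # remaining whole minutes
          secs %% 60)              # remaining seconds
}

to_hms(mean_secs)
```

If `ride_length` is a difftime in seconds, as in the data above, `as.numeric()` on the aggregated column should give the plain seconds to pass to the helper.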

zy1mlcev #2

Suppose your data looks like this:

    library(lubridate)  # provides hms()

    x <- c("09:10:01", "10:10:02", "09:40:03", "07:10:16", "09:20:02", "08:52:10")
    df <- data.frame(member_casual = c(rep("A", 3), rep("B", 3)),
                     ride_length = hms(x), stringsAsFactors = FALSE)
    df
    #   member_casual ride_length
    # 1             A   9H 10M 1S
    # 2             A  10H 10M 2S
    # 3             A   9H 40M 3S
    # 4             B  7H 10M 16S
    # 5             B   9H 20M 2S
    # 6             B  8H 52M 10S

I ran the summarise() code you tried above, and it works for me.

    library(dplyr)  # provides %>%, group_by(), summarise()

    df %>%
      group_by(member_casual) %>%
      summarise(mean = seconds_to_period(mean(period_to_seconds(ride_length))))
    # # A tibble: 2 × 2
    #   member_casual mean
    #   <chr>         <Period>
    # 1 A             9H 40M 2S
    # 2 B             8H 27M 29.3333333333321S
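So please confirm that your data is in the right format, especially the column named 'ride_length': run hms(df$ride_length) on its own and check that it parses without warnings.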
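
The check suggested above can be taken one step further by listing which values fail to parse. This is only a sketch: the sample vector is made up for illustration and is not the asker's data.

```r
library(lubridate)

# Hypothetical sample: two valid strings, one malformed string, one missing value.
ride_length <- c("00:16:30", "00:13:30", "abc", NA)

# quiet = TRUE suppresses the "failed to parse" warning; failures come back as NA.
parsed <- hms(ride_length, quiet = TRUE)

# Positions of the values that did not parse, and the raw values themselves.
bad <- which(is.na(period_to_seconds(parsed)))
bad
ride_length[bad]

# Mean over the values that did parse, converted back to a Period.
seconds_to_period(mean(period_to_seconds(parsed), na.rm = TRUE))
```

If every position shows up in `bad`, the column is probably not a character vector of hh:mm:ss strings. It may already be a difftime, as in the data from answer #1, in which case `mean()` can be applied to it directly without `hms()`.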
