Reconstructing a work schedule in R from a dataset of tasks completed at given timestamps

k97glaaz  posted on 2023-01-06  in Other

I have a very large dataset in R that records a large number of employees, each completing specific tasks at specific times.

emp <- c('a','b','c','c','d','d','e','e')
timestamp <- c('1-1-2020 10:00','1-1-2020 16:00','1-2-2020 06:30','1-2-2020 09:00','1-2-2020 20:00','1-3-2020 04:00','1-3-2020 05:00','1-3-2020 10:00')

From this information, I want to extract which shift on which day each employee worked. For the current example, I would get the following:

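In addition, next to the shift definition I would like a column giving the number of seconds elapsed between the start of the first shift (January 1, 2020, 22:00 - 06:00) and the current shift.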

Does anyone know how I can build such a work schedule in R? A solution using dplyr would be preferred.

Answer 1 (t98cgbkg)

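library(lubridate)

# Parse the month-day-year strings from the question into date-times (UTC)
timestamp <- mdy_hm(timestamp)

# Shift starts every 8 hours, beginning at 22:00 on the day before the first timestamp
starts <- seq(as_date(min(timestamp)) - hours(2), to = max(timestamp), by = "8 hours")

# One 8-hour interval per shift (the \(x) lambda syntax needs R >= 4.1)
shifts <- lapply(starts, \(start) interval(start, start + hours(8)))

# TRUE if any of an employee's timestamps falls within the shift
worked_in_shift <- \(shift, timestamp) any(timestamp %within% shift)

# Rows are shifts, columns are employees
data.frame(
  shift = I(shifts),
  outer(
    shifts,
    split(timestamp, emp),
    Vectorize(worked_in_shift)
  )
)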
         shift     a     b     c     d     e
1 2019-12-.... FALSE FALSE FALSE FALSE FALSE
2 2020-01-....  TRUE FALSE FALSE FALSE FALSE
3 2020-01-.... FALSE  TRUE FALSE FALSE FALSE
4 2020-01-.... FALSE FALSE FALSE FALSE FALSE
5 2020-01-.... FALSE FALSE  TRUE FALSE FALSE
6 2020-01-.... FALSE FALSE FALSE  TRUE FALSE
7 2020-01-.... FALSE FALSE FALSE  TRUE  TRUE
8 2020-01-.... FALSE FALSE FALSE FALSE  TRUE
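
The table above marks who worked in which shift but leaves out the elapsed-seconds column the question asks for. As a rough sketch of how both could be done with dplyr, each timestamp can be assigned to its shift arithmetically. The 8-hour shift length and the origin at 22:00 on the day before the earliest timestamp are assumptions taken from the answers in this thread, and the names `shift_secs`, `origin` and `schedule` are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Sample data from the question
df <- data.frame(
  emp = c('a','b','c','c','d','d','e','e'),
  timestamp = mdy_hm(c('1-1-2020 10:00','1-1-2020 16:00','1-2-2020 06:30','1-2-2020 09:00',
                       '1-2-2020 20:00','1-3-2020 04:00','1-3-2020 05:00','1-3-2020 10:00'))
)

shift_secs <- 8 * 3600  # assumption: every shift lasts 8 hours

# Assumption: the first shift starts at 22:00 on the day before the earliest timestamp
origin <- floor_date(min(df$timestamp), "day") - hours(2)

schedule <- df %>%
  mutate(
    # seconds from the origin to the start of the shift containing the timestamp
    secs = floor(as.numeric(difftime(timestamp, origin, units = "secs")) / shift_secs) * shift_secs,
    shift_start = origin + secs,
    shift_end = shift_start + shift_secs
  ) %>%
  # one row per employee and shift, even if several tasks fall in the same shift
  distinct(emp, shift_start, shift_end, secs) %>%
  arrange(shift_start, emp)

schedule
```

This keeps the result in long form (one row per employee and shift). A timestamp that falls exactly on a shift boundary is counted in the shift that starts at that moment, which differs slightly from `%within%`, where both interval ends are inclusive.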

Answer 2 (voase2hg)

Here is an answer based on data.table. data.table has a "rolling join" feature, which makes this a fairly straightforward case.

library(data.table)

# data
df <- data.frame(
  emp = c('a','b','c','c','d','d','e','e'),
  timestamp = c('1-1-2020 10:00','1-1-2020 16:00','1-2-2020 06:30','1-2-2020 09:00',
              '1-2-2020 20:00','1-3-2020 04:00','1-3-2020 05:00','1-3-2020 10:00'))

# setting data.table 
# casting timestamp as time variable
setDT(df)
df$timestamp <- as.POSIXct(strptime( df$timestamp, "%m-%d-%Y %H:%M"))

# create table of "shifts"
periods <- data.table(
  SHIFT_from = seq.POSIXt( from = as.POSIXct("2019-12-31 22:00"), 
                           to = as.POSIXct("2020-01-03 14:00"), by = "8 hours"),
  SHIFT_to = seq.POSIXt( from = as.POSIXct("2020-01-01 06:00"), 
                         to = as.POSIXct("2020-01-03 22:00"), by = "8 hours")) 

# join (rolling) and calculate seconds. 
df <- df[periods, .(
    emp, 
    from = SHIFT_from, 
    to = SHIFT_to, 
    timestamp = x.timestamp, 
    secs = as.integer(difftime(i.SHIFT_from, i.SHIFT_from[1], units = "s"))), 
  on = c("timestamp" = "SHIFT_to"), roll=TRUE]

# output
dcast(df, from+to+secs ~ emp , fun = length)[,-"NA"]

#>                   from                  to   secs a b c d e
#> 1: 2019-12-31 22:00:00 2020-01-01 06:00:00      0 0 0 0 0 0
#> 2: 2020-01-01 06:00:00 2020-01-01 14:00:00  28800 1 0 0 0 0
#> 3: 2020-01-01 14:00:00 2020-01-01 22:00:00  57600 0 1 0 0 0
#> 4: 2020-01-01 22:00:00 2020-01-02 06:00:00  86400 0 1 0 0 0
#> 5: 2020-01-02 06:00:00 2020-01-02 14:00:00 115200 0 0 1 0 0
#> 6: 2020-01-02 14:00:00 2020-01-02 22:00:00 144000 0 0 0 1 0
#> 7: 2020-01-02 22:00:00 2020-01-03 06:00:00 172800 0 0 0 0 1
#> 8: 2020-01-03 06:00:00 2020-01-03 14:00:00 201600 0 0 0 0 1
#> 9: 2020-01-03 14:00:00 2020-01-03 22:00:00 230400 0 0 0 0 1
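
Note that in the output above, employee b is marked in the 22:00 to 06:00 shift although b's only timestamp is 1-1-2020 16:00, and d's 04:00 timestamp on 1-3-2020 is not marked. This appears to be because `roll=TRUE` returns only the most recent timestamp at or before each shift end and carries it forward into later shifts. One possible alternative, sketched here under the same 8-hour shift assumption, is a non-equi join that matches every timestamp to the shift containing it. The names `matched` and `wide` are illustrative, and the time zone is fixed to UTC only to keep the sketch reproducible.

```r
library(data.table)

# Sample data from the question, parsed as UTC
df <- data.table(
  emp = c('a','b','c','c','d','d','e','e'),
  timestamp = as.POSIXct(
    c('1-1-2020 10:00','1-1-2020 16:00','1-2-2020 06:30','1-2-2020 09:00',
      '1-2-2020 20:00','1-3-2020 04:00','1-3-2020 05:00','1-3-2020 10:00'),
    format = "%m-%d-%Y %H:%M", tz = "UTC")
)

# Table of 8-hour shifts with the seconds elapsed since the first shift start
periods <- data.table(
  SHIFT_from = seq(as.POSIXct("2019-12-31 22:00", tz = "UTC"),
                   as.POSIXct("2020-01-03 14:00", tz = "UTC"), by = "8 hours")
)
periods[, SHIFT_to := SHIFT_from + 8 * 3600]
periods[, secs := as.integer(difftime(SHIFT_from, SHIFT_from[1], units = "secs"))]

# Non-equi join: each timestamp matches the shift with SHIFT_from <= timestamp < SHIFT_to
matched <- periods[df, on = .(SHIFT_from <= timestamp, SHIFT_to > timestamp),
                   .(emp = i.emp, from = x.SHIFT_from, to = x.SHIFT_to, secs = x.secs)]

# Wide layout: one row per shift, one 0/1 column per employee
wide <- dcast(unique(matched), from + to + secs ~ emp,
              value.var = "secs", fun.aggregate = length)
wide
```

In this sketch shifts in which nobody worked do not appear in `wide`; joining `wide` back onto `periods` would be one way to restore them.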
