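I have a set of start and stop time records for workers. I am trying to subtract a worker's first start time in a shift from the last stop time of that shift, but I am not sure how to use the first() and last() functions correctly in Python. Here is the DataFrame work: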
worker veh shift_id starttime stoptime
0 11133y QQUK1 111333 2018-12-21 15:17:29 2018-12-21 15:18:57
1 44706h FF243 447064 2019-01-01 00:10:16 2019-01-01 00:16:32
2 44706h FF243 447064 2019-01-01 00:27:11 2019-01-01 00:31:38
3 44706h FF243 447064 2019-01-01 00:46:20 2019-01-01 01:04:54
4 44761y LL525 447617 2019-01-01 00:19:06 2019-01-01 00:39:43
5 44842q OO454 448429 2019-01-01 00:12:35 2019-01-01 00:19:09
6 44842q OO454 448429 2019-01-01 00:47:55 2019-01-01 01:00:01
7 44842q OO454 448429 2019-01-01 01:12:47 2019-01-01 02:01:50
8 46090u OP324 460908 2019-01-01 00:16:23 2019-01-01 00:39:46
9 46090u OP324 460908 2019-01-01 00:58:02 2019-01-01 01:19:02
I would like to get the following output:
worker veh shift_id starttime stoptime hrs_per_gig
0 11133y QQUK1 111333 2018-12-21 15:17:29 2018-12-21 15:18:57 .0010
1 44706h FF243 447064 2019-01-01 00:10:16 2019-01-01 00:16:32 .0379
2 44706h FF243 447064 2019-01-01 00:27:11 2019-01-01 00:31:38 .0379
3 44706h FF243 447064 2019-01-01 00:46:20 2019-01-01 01:04:54 .0379
4 44761y LL525 447617 2019-01-01 00:19:06 2019-01-01 00:39:43 .0143
5 44842q OO454 448429 2019-01-01 00:12:35 2019-01-01 00:19:09 .0758
6 44842q OO454 448429 2019-01-01 00:47:55 2019-01-01 01:00:01 .0758
7 44842q OO454 448429 2019-01-01 01:12:47 2019-01-01 02:01:50 .0758
8 46090u OP324 460908 2019-01-01 00:16:23 2019-01-01 00:39:46 .0435
9 46090u OP324 460908 2019-01-01 00:58:02 2019-01-01 01:19:02 .0435
In R this is simple with the data.table package. This is how I did it:
# my grouping variables
group_by = c('worker', 'veh', 'shift_id')
# produce a new column that calculates difference in first and last work times in hours
work[
,hrs_per_gig:=as.numeric(difftime(last(stoptime),first(starttime), units = "hours"))
,group_by]
I don't know how to achieve the same result in Python. I tried the following:
# my grouping variables
group_by = ['worker', 'veh', 'shift_id']
# produce a new column that calculates difference in first and last work times in hours
work['hrs_per_gig'] = df.groupby(group_by).last('stoptime') - \
    df.groupby(group_by)['starttime'].first()
But I got the error ValueError: cannot join with no overlapping index names. Any suggestions would be greatly appreciated. Thank you very much.
1 Answer
You can get a timedelta object by doing the following. Example:
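The answer's own example is not reproduced here. What follows is a minimal sketch of one way to get a per-shift timedelta in pandas, not necessarily the approach the answerer had in mind. It assumes starttime and stoptime are (or can be parsed as) datetime columns and that the rows within each shift are already in chronological order. The variable names grouped and shift_span are made up for illustration, and only the first five rows from the question are used.

```python
import pandas as pd

# First five rows copied from the question's `work` DataFrame.
work = pd.DataFrame(
    {
        "worker": ["11133y", "44706h", "44706h", "44706h", "44761y"],
        "veh": ["QQUK1", "FF243", "FF243", "FF243", "LL525"],
        "shift_id": [111333, 447064, 447064, 447064, 447617],
        "starttime": pd.to_datetime(
            [
                "2018-12-21 15:17:29",
                "2019-01-01 00:10:16",
                "2019-01-01 00:27:11",
                "2019-01-01 00:46:20",
                "2019-01-01 00:19:06",
            ]
        ),
        "stoptime": pd.to_datetime(
            [
                "2018-12-21 15:18:57",
                "2019-01-01 00:16:32",
                "2019-01-01 00:31:38",
                "2019-01-01 01:04:54",
                "2019-01-01 00:39:43",
            ]
        ),
    }
)

# my grouping variables
group_by = ["worker", "veh", "shift_id"]
grouped = work.groupby(group_by)

# transform() returns one value per original row, so the result lines up
# with `work` and no index join is needed.
shift_span = (
    grouped["stoptime"].transform("last")
    - grouped["starttime"].transform("first")
)

# shift_span holds timedelta values; convert them to hours as a float.
work["hrs_per_gig"] = shift_span.dt.total_seconds() / 3600

print(work[["worker", "shift_id", "hrs_per_gig"]])
```

Because first and last depend on row order, transform("min") and transform("max") may be the safer choice when the data is not sorted by time. Also note that the figures in the question's desired output look like the same span expressed as a fraction of a day (for worker 44706h, 3278 seconds / 86400 is about 0.0379), so dividing by 86400 instead of 3600 should reproduce those numbers.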