python: What is the best way to apply an operation to the first and last rows of each group and add the result as a column?

niwlg2el  posted on 2021-08-20  in  Java

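I have a set of start-time and stop-time records for workers. I am trying to subtract each worker's first start time from the last stop time of that worker's shift, but I am not sure how to use first() and last() correctly in Python. Here is the DataFrame work: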

worker  veh   shift_id              starttime                stoptime
0  11133y   QQUK1   111333         2018-12-21 15:17:29     2018-12-21 15:18:57
1  44706h   FF243   447064         2019-01-01 00:10:16     2019-01-01 00:16:32
2  44706h   FF243   447064         2019-01-01 00:27:11     2019-01-01 00:31:38
3  44706h   FF243   447064         2019-01-01 00:46:20     2019-01-01 01:04:54
4  44761y   LL525   447617         2019-01-01 00:19:06     2019-01-01 00:39:43
5  44842q   OO454   448429         2019-01-01 00:12:35     2019-01-01 00:19:09
6  44842q   OO454   448429         2019-01-01 00:47:55     2019-01-01 01:00:01
7  44842q   OO454   448429         2019-01-01 01:12:47     2019-01-01 02:01:50
8  46090u   OP324   460908         2019-01-01 00:16:23     2019-01-01 00:39:46
9  46090u   OP324   460908         2019-01-01 00:58:02     2019-01-01 01:19:02

I would like to get output like the following:

worker  veh   shift_id              starttime                stoptime       hrs_per_gig
0  11133y   QQUK1   111333         2018-12-21 15:17:29     2018-12-21 15:18:57       .0010
1  44706h   FF243   447064         2019-01-01 00:10:16     2019-01-01 00:16:32       .0379
2  44706h   FF243   447064         2019-01-01 00:27:11     2019-01-01 00:31:38       .0379
3  44706h   FF243   447064         2019-01-01 00:46:20     2019-01-01 01:04:54       .0379
4  44761y   LL525   447617         2019-01-01 00:19:06     2019-01-01 00:39:43       .0143
5  44842q   OO454   448429         2019-01-01 00:12:35     2019-01-01 00:19:09       .0758
6  44842q   OO454   448429         2019-01-01 00:47:55     2019-01-01 01:00:01       .0758
7  44842q   OO454   448429         2019-01-01 01:12:47     2019-01-01 02:01:50       .0758
8  46090u   OP324   460908         2019-01-01 00:16:23     2019-01-01 00:39:46       .0435
9  46090u   OP324   460908         2019-01-01 00:58:02     2019-01-01 01:19:02       .0435

In R this is simple with the data.table package. This is how I did it:


# my grouping variables
group_by = c('worker', 'veh', 'shift_id')

# produce a new column that calculates difference in first and last work times in hours
work[
     , hrs_per_gig := as.numeric(difftime(last(stoptime), first(starttime), units = "hours"))
     , by = group_by]

I don't know how to achieve the same thing in Python. I tried the following:


# my grouping variables
group_by = ['worker', 'veh', 'shift_id']

# produce a new column that calculates difference in first and last work times in hours
# (this is the attempt that fails with the error quoted below)
work['hrs_per_gig'] = (df.groupby(group_by).last('stoptime') -
                       df.groupby(group_by)['starttime'].first())

But I got the error ValueError: cannot join with no overlapping index names. Any suggestions would be appreciated. Thank you very much.


ny6fqffe1#

You can get the duration of each group, broken down into timedelta components, by doing the following:

grp = df.groupby(group_by)
duration_per_gig = (grp['stoptime'].last() - 
                    grp['starttime'].first()).dt.components

Example:

In [56]: duration_per_gig = (grp['stoptime'].last() - grp['starttime'].first()).dt.components

In [57]: duration_per_gig                                                                                                                                                                                                                                             
Out[57]: 
                       days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
worker veh   shift_id                                                                        
11133y QQUK1 111333       0      0        1       28             0             0            0
44706h FF243 447064       0      0       54       38             0             0            0
44761y LL525 447617       0      0       20       37             0             0            0
44842q OO454 448429       0      1       49       15             0             0            0
46090u OP324 460908       0      1        2       39             0             0            0
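
The result above has one row per group, so it cannot be assigned directly as a column of the original frame. One possible way to get a per-row hrs_per_gig column, similar to the data.table version in the question, is to use transform, which broadcasts each group's value back to every row of that group. This is a minimal sketch, not necessarily the only or best approach. It rebuilds the question's sample data under the name work, and it assumes the rows of each shift are already in chronological order (otherwise sort by starttime first, or use "max" and "min" instead of "last" and "first").

```python
import pandas as pd

# Sample rows copied from the question's `work` DataFrame.
work = pd.DataFrame({
    "worker": ["11133y", "44706h", "44706h", "44706h", "44761y",
               "44842q", "44842q", "44842q", "46090u", "46090u"],
    "veh": ["QQUK1", "FF243", "FF243", "FF243", "LL525",
            "OO454", "OO454", "OO454", "OP324", "OP324"],
    "shift_id": [111333, 447064, 447064, 447064, 447617,
                 448429, 448429, 448429, 460908, 460908],
    "starttime": pd.to_datetime([
        "2018-12-21 15:17:29", "2019-01-01 00:10:16", "2019-01-01 00:27:11",
        "2019-01-01 00:46:20", "2019-01-01 00:19:06", "2019-01-01 00:12:35",
        "2019-01-01 00:47:55", "2019-01-01 01:12:47", "2019-01-01 00:16:23",
        "2019-01-01 00:58:02",
    ]),
    "stoptime": pd.to_datetime([
        "2018-12-21 15:18:57", "2019-01-01 00:16:32", "2019-01-01 00:31:38",
        "2019-01-01 01:04:54", "2019-01-01 00:39:43", "2019-01-01 00:19:09",
        "2019-01-01 01:00:01", "2019-01-01 02:01:50", "2019-01-01 00:39:46",
        "2019-01-01 01:19:02",
    ]),
})

group_by = ["worker", "veh", "shift_id"]
grp = work.groupby(group_by)

# transform() returns one value per original row (indexed like `work`),
# so the subtraction and the assignment align without any join.
last_stop = grp["stoptime"].transform("last")
first_start = grp["starttime"].transform("first")

# Convert the Timedelta to fractional hours.
work["hrs_per_gig"] = (last_stop - first_start).dt.total_seconds() / 3600

print(work)
```

With this data, every row of shift 447064 gets the same value, about 0.9106 hours, which matches the 54 minutes 38 seconds shown in the components table above.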
