pandas: How to find vacated roles in Python

dluptydi  posted on 2023-04-18 in Python

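I asked this question before and got an answer that worked for the test cases I had at the time, but it now produces incorrect results. My data tracks the history of which manager each employee reports to. What I want to see is when a role was vacated and then filled by someone else. This can be identified from the ManagerPosNum column in the dataset: if the number stays the same but the manager's name changes, the role counts as vacated, until ManagerPosNum changes to the new person's own unique number.
Sample data: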

EmpID     Date        ManagerName      ManagerID          ManagerPosNum
  101    May 2022         Adam             201                 1111
  101    June 2022        Adam             201                 1111
  102    February 2021    James            301                 2222
  102    March 2021       James            301                 2222
  102    April 2021       Adam             201                 2222
  102    May 2021         Adam             201                 2222
  103    August 2022      Mary             401                 3333
  103    September 2022   Adam             201                 3333
  103    October 2022     Adam             201                 3333
  103    November 2022    Paul             501                 4444
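
For anyone who wants to reproduce this, the sample above can be rebuilt as a DataFrame roughly as follows. This is only a sketch: the variable name `df` is chosen to match the code further down, the values are copied from the sample table as posted, and `Date` is kept as plain strings rather than parsed.

```python
import pandas as pd

# Values copied from the "Sample data" table; Date is left as text on purpose.
df = pd.DataFrame({
    "EmpID": [101, 101, 102, 102, 102, 102, 103, 103, 103, 103],
    "Date": [
        "May 2022", "June 2022",
        "February 2021", "March 2021", "April 2021", "May 2021",
        "August 2022", "September 2022", "October 2022", "November 2022",
    ],
    "ManagerName": ["Adam", "Adam", "James", "James", "Adam", "Adam",
                    "Mary", "Adam", "Adam", "Paul"],
    "ManagerID": [201, 201, 301, 301, 201, 201, 401, 201, 201, 501],
    "ManagerPosNum": [1111, 1111, 2222, 2222, 2222, 2222,
                      3333, 3333, 3333, 4444],
})

print(df)
```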

Expected output:

EmpID      Date         ManagerName     ManagerID     ManagerPosNum       VacantManager 
 101      May 2022         Adam            201            1111
 101      June 2022        Adam            201            1111 
 102      February 2021    James           301            2222
 102      March 2021       James           301            2222
 102      May 2021         Adam            201            2222               James
 102      June 2021        Adam            201            2222               James 
 103      August 2022      Mary            401            3333
 103      September 2022   Adam            201            3333               Mary 
 103      October 2022     Adam            201            3333               Mary
 103      November 2022    Paul            501            4444

My current code used to work, but it started failing once I ran more test cases.
Code:

# per employee: keep the second-to-last distinct ManagerID, then forward-fill it
df['Vacant Manager'] = (df.groupby('EmpID', group_keys=False)['ManagerID']
                 .apply(lambda s: s.where(pd.factorize(s[::-1])[0][::-1] == 1).ffill()))

ctehm74n  #1

You could try something like this:

# create a new column 'VacantManager'
df['VacantManager'] = ''

# iterate over the rows of the dataframe and fill in the 'VacantManager' column
for i, row in df.iterrows():
    if i == 0 or row['ManagerPosNum'] != df.at[i-1, 'ManagerPosNum']:
        # if it's the first row or the ManagerPosNum is different than the previous row
        df.at[i, 'VacantManager'] = ''
    elif row['ManagerName'] != df.at[i-1, 'ManagerName']:
        # if the ManagerPosNum is the same as the previous row but the ManagerName is different
        df.at[i, 'VacantManager'] = df.at[i-1, 'ManagerName']
    else:
        # if the ManagerPosNum and ManagerName are the same as the previous row
        df.at[i, 'VacantManager'] = df.at[i-1, 'VacantManager']

Output:

   EmpID            Date ManagerName  ManagerID  ManagerPosNum VacantManager
0    101        May 2022        Adam        201           1111              
1    101       June 2022        Adam        201           1111              
2    102   February 2021       James        301           2222              
3    102      March 2021       James        301           2222              
4    102        May 2021        Adam        201           2222         James
5    102       June 2021        Adam        201           2222         James
6    103     August 2022        Mary        401           3333              
7    103  September 2022        Adam        201           3333          Mary
8    103    October 2022        Adam        201           3333          Mary
9    103   November 2022        Paul        501           4444
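
The loop above compares each row only with the row directly before it and uses `i-1` as a label, so it relies on the default 0..n-1 index and never looks at `EmpID`. As a possible alternative, here is a vectorized sketch of the same rule. It assumes the rule is meant per `EmpID` and `ManagerPosNum` pair and that rows are already in chronological order; both are my assumptions, not something the question states. `Date` is left out because the logic does not use it.

```python
import pandas as pd

# Sample data from the question (Date omitted; it is not needed for the rule).
df = pd.DataFrame({
    "EmpID": [101, 101, 102, 102, 102, 102, 103, 103, 103, 103],
    "ManagerName": ["Adam", "Adam", "James", "James", "Adam", "Adam",
                    "Mary", "Adam", "Adam", "Paul"],
    "ManagerID": [201, 201, 301, 301, 201, 201, 401, 201, 201, 501],
    "ManagerPosNum": [1111, 1111, 2222, 2222, 2222, 2222,
                      3333, 3333, 3333, 4444],
})

keys = [df["EmpID"], df["ManagerPosNum"]]

# Manager name on the previous row of the same employee/position pair
# (NaN on the first row of each pair).
prev_name = df.groupby(["EmpID", "ManagerPosNum"])["ManagerName"].shift()

# True where the position number is unchanged but the name differs.
changed = prev_name.notna() & (prev_name != df["ManagerName"])

# Record the outgoing manager at the change point, then carry that name
# forward within the same employee/position pair.
vacant = prev_name.where(changed)
df["VacantManager"] = vacant.groupby(keys).ffill().fillna("")

print(df)
```

Note one difference from the loop: grouping treats all rows with the same `EmpID` and `ManagerPosNum` as one run even if they are not adjacent, so if an employee can return to an earlier position number, the two approaches may disagree.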
