Python panda中与filter子句等效的窗口函数

ds97pgxw  于 2023-02-02  发布在  Python
关注(0)|答案(2)|浏览(92)

我正在尝试找出问题所在,并在虚拟数据集上运行测试示例-在此构建

import pandas as pd

data = [['tom', 30, 'sales', 5], ['nick', 35, 'sales', 8], ['juli', 24, 'marketing', 4], ['franz', 40, 'marketing', 6], ['jon', 50, 'marketing', 6], ['jeremie', 60, 'marketing', 6]]
  
df = pd.DataFrame(data, columns=['Name', 'Age', 'Department', 'Tenure'])

对于每一行,我都希望找到部门中比该行年龄大的所有其他人的平均年龄,例如Tom(30)在销售中,应该返回他的年龄和尼克的平均值,尼克更大,所以32. 5作为平均年龄,但是对于Nick,它应该返回35,因为Tom在他的部门比他年轻。下面的代码实现了这一点-但是我正在寻找一个更快更有效的方法?!

#Dynamically get mean, where age is greater than the line in question - almost definitely a better
#way of doing this though

def sumWindow(group):

    x = group['Age'].mean()
    group['Mean Dept Age'] = x
    return group

Name = []
Age = []
Department = []
Tenure = []
MeanDeptAge = []

for index, row in df.iterrows():
    
    
    n = row['Name']
    a = row['Age']
    df_temp = df[df['Age'] >= a]
    df_present = df_temp.groupby(df['Department']).apply(sumWindow)
    df_present['Relevant Name'] = n
    df_final = df_present[df_present['Name'] == df_present['Relevant Name']]
    Name.append(df_final.iloc[0,0])
    Age.append(df_final.iloc[0,1])
    Department.append(df_final.iloc[0,2])
    Tenure.append(df_final.iloc[0,3])
    MeanDeptAge.append(df_final.iloc[0,4])

del df_final    
    
df_final = pd.DataFrame({'Name': Name, 
                         'Age': Age, 
                         'Department': Department, 
                         'Tenure': Tenure, 
                         'Mean Department Age - Greater Than Emp Age': MeanDeptAge, 
                        })

df_final

谢谢!
我已经尝试了许多不同的解决方案,例如在groupby子句中进行过滤

kb5ga3dv

kb5ga3dv1#

在按降序排序的DataFrame上使用分组expanding.mean或年龄:

df['Mean Department Age - Greater Than Emp Age'] = (df
   .sort_values(by='Age', ascending=False)
   .groupby('Department')['Age']
   .expanding().mean()
   .droplevel(0)
)
  • 注意:这将根据订单处理潜在的重复年龄,如果在您的真实的用例中发生这种情况,您应该定义您希望如何继续。*

输出:

Name  Age Department  Tenure  Mean Department Age - Greater Than Emp Age
0      tom   30      sales       5                                        32.5
1     nick   35      sales       8                                        35.0
2     juli   24  marketing       4                                        43.5
3    franz   40  marketing       6                                        50.0
4      jon   50  marketing       6                                        55.0
5  jeremie   60  marketing       6                                        60.0
wxclj1h5

wxclj1h52#

def function1(dd:pd.DataFrame):
    dd1=dd.sort_values("Age",ascending=False).Age.expanding().mean()
    return dd1.rename("Mean Department Age - Greater Than Emp Age")

df.join(df.groupby('Department').apply(function1).droplevel(0))

出局

Name  Age Department  Tenure  Mean Department Age - Greater Than Emp Age
0      tom   30      sales       5                                        32.5
1     nick   35      sales       8                                        35.0
2     juli   24  marketing       4                                        43.5
3    franz   40  marketing       6                                        50.0
4      jon   50  marketing       6                                        55.0
5  jeremie   60  marketing       6                                        60.0

相关问题