我正在尝试找出问题所在,并在虚拟数据集上运行测试示例-在此构建
import pandas as pd
data = [['tom', 30, 'sales', 5], ['nick', 35, 'sales', 8], ['juli', 24, 'marketing', 4], ['franz', 40, 'marketing', 6], ['jon', 50, 'marketing', 6], ['jeremie', 60, 'marketing', 6]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Department', 'Tenure'])
对于每一行,我都希望找到部门中比该行年龄大的所有其他人的平均年龄,例如Tom(30)在销售中,应该返回他的年龄和尼克的平均值,尼克更大,所以32. 5作为平均年龄,但是对于Nick,它应该返回35,因为Tom在他的部门比他年轻。下面的代码实现了这一点-但是我正在寻找一个更快更有效的方法?!
#Dynamically get mean, where age is greater than the line in question - almost definitely a better
#way of doing this though
def sumWindow(group):
x = group['Age'].mean()
group['Mean Dept Age'] = x
return group
Name = []
Age = []
Department = []
Tenure = []
MeanDeptAge = []
for index, row in df.iterrows():
n = row['Name']
a = row['Age']
df_temp = df[df['Age'] >= a]
df_present = df_temp.groupby(df['Department']).apply(sumWindow)
df_present['Relevant Name'] = n
df_final = df_present[df_present['Name'] == df_present['Relevant Name']]
Name.append(df_final.iloc[0,0])
Age.append(df_final.iloc[0,1])
Department.append(df_final.iloc[0,2])
Tenure.append(df_final.iloc[0,3])
MeanDeptAge.append(df_final.iloc[0,4])
del df_final
df_final = pd.DataFrame({'Name': Name,
'Age': Age,
'Department': Department,
'Tenure': Tenure,
'Mean Department Age - Greater Than Emp Age': MeanDeptAge,
})
df_final
谢谢!
我已经尝试了许多不同的解决方案,例如在groupby子句中进行过滤
2条答案
按热度按时间kb5ga3dv1#
在按降序排序的DataFrame上使用分组
expanding.mean
或年龄:输出:
wxclj1h52#
出局