Checking whether a variable has the same value in different years: fastest Python/pandas solution?

Posted by myzjeezk on 2021-09-13 in Java


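I have a large dataset (20 million rows). It records where each person lived in 2018 and in 2019. I want to write a condition that returns True if the variable "county" has the same value in 2018 and 2019, and False if the two values differ. What is the most efficient way to do this?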

import pandas as pd

df = pd.DataFrame({'id': [10, 10, 20, 20, 30, 30, 40, 40],
                   'year': [2018, 2019, 2018, 2019, 2018, 2019, 2018, 2019],
                   'county': ['1', '1', '4', '2', '3', '3', '1', '3']})

My goal is to create a new column that is True for id 10 (a stayer) and False for id 20 (a mover).

python pandas if-statement

Source: https://stackoverflow.com/questions/68298748/condition-if-a-variable-value-is-the-same-diffrent-years-python-pandas-fastest

1 Answer


osh3o9ms #1

For a more efficient solution, avoid lambda functions. Instead, compare the first and last values of each group, for example:

g = df.groupby(['id'])['county']
df['newcol'] = g.transform('first').eq(g.transform('last'))
print (df)
   id  year county  newcol
0  10  2018      1    True
1  10  2019      1    True
2  20  2018      4   False
3  20  2019      2   False
4  30  2018      3    True
5  30  2019      3    True
6  40  2018      1   False
7  40  2019      3   False
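
The first/last comparison above matches the question's data, where each id has exactly two rows (one per year). As a minimal sketch of a variant, assuming you instead want True only when all rows of an id share one county value (an assumption that goes beyond the original answer), the built-in `nunique` aggregation can be used the same way, still without a lambda:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [10, 10, 20, 20, 30, 30, 40, 40],
    'year': [2018, 2019, 2018, 2019, 2018, 2019, 2018, 2019],
    'county': ['1', '1', '4', '2', '3', '3', '1', '3'],
})

# Count the distinct county values per id and broadcast the count back to every row.
distinct_counties = df.groupby('id')['county'].transform('nunique')

# An id is a "stayer" when it has exactly one distinct county.
df['newcol'] = distinct_counties.eq(1)
print(df)
```

On the sample data this marks ids 10 and 30 as stayers and ids 20 and 40 as movers, the same as the first/last version. Note that `nunique` ignores missing values by default, so an id with a missing county in one year may need separate handling.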

Another solution that avoids groupby should be more efficient still:

s = df.set_index(['id','year'])['county']

df['newcol'] = df['id'].map(s.xs(2018, level=1).eq(s.xs(2019, level=1)))
print (df)
   id  year county  newcol
0  10  2018      1    True
1  10  2019      1    True
2  20  2018      4   False
3  20  2019      2   False
4  30  2018      3    True
5  30  2019      3    True
6  40  2018      1   False
7  40  2019      3   False
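
Which of the two approaches is faster may depend on the pandas version and on the data, so it is worth measuring on your own machine. The following is a rough sketch of such a comparison. The synthetic frame `bench`, its size `n_ids`, and the helper names `with_groupby` and `with_xs` are assumptions made for illustration; they are not part of the original question or answer.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption): n_ids people with one row for 2018 and one for 2019.
n_ids = 100_000
rng = np.random.default_rng(0)
bench = pd.DataFrame({
    'id': np.repeat(np.arange(n_ids), 2),
    'year': np.tile([2018, 2019], n_ids),
    'county': rng.integers(1, 5, size=2 * n_ids).astype(str),
})


def with_groupby(frame):
    # First solution: compare the first and last county of each id.
    g = frame.groupby('id')['county']
    return g.transform('first').eq(g.transform('last'))


def with_xs(frame):
    # Second solution: select each year's values, compare per id, map back to rows.
    s = frame.set_index(['id', 'year'])['county']
    same = s.xs(2018, level=1).eq(s.xs(2019, level=1))
    return frame['id'].map(same)


# Both approaches should flag exactly the same rows.
result_groupby = with_groupby(bench)
result_xs = with_xs(bench)
assert (result_groupby.to_numpy() == result_xs.to_numpy()).all()

# Time each approach a few times and report the average per run.
for fn in (with_groupby, with_xs):
    seconds = timeit.timeit(lambda: fn(bench), number=3)
    print(f'{fn.__name__}: {seconds / 3:.3f} s per run')
```

Increase `n_ids` toward the real size (about 10 million ids for 20 million rows) once the small run works, since relative timings can change with scale.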

Answered 2021-09-13
