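I am trying to rank the differences between values grouped by date and category in a Pandas DataFrame. In the end, what matters is the names of the categories with the lowest and highest progression between two dates, along with the corresponding progression values.
I think my code works, but it looks overly complicated. I would like to find the best Pandas way (fastest, most standard, most straightforward, etc.) to do this. Here is my code: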
import pandas as pd
import numpy as np
# Creation of random data
size = 1_000
df = pd.DataFrame()
df['Borough'] = np.random.choice(['Brooklyn', 'Manhattan', 'Bronx', 'Queens', 'Staten Island'], size)
df['Date'] = pd.to_datetime(np.random.randint(2011, 2021, size), format="%Y")
df['Nbr_permits'] = np.random.randint(0, 300, size)
# Calculation of the sorted differences in the number of permits per boroughs between 2011 and 2020
res = (df[(df['Date'].dt.year == 2020)].groupby('Borough')['Nbr_permits'].sum() - df[(df['Date'].dt.year == 2011)].groupby('Borough')['Nbr_permits'].sum()).sort_values().dropna()
#Lowest progression of nbr_permits between 2011 and 2020:
print(res.idxmin(), res[res.idxmin()])
#Highest progression of nbr_permits between 2011 and 2020:
print(res.idxmax(), res[res.idxmax()])
Can I do better with Pandas?
1 Answer
There is a better way.
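As a minimal sketch of what a more direct version might look like (this is one possible approach, not necessarily the one the answer had in mind): aggregate once into a Borough-by-year table with `pivot_table`, subtract the two year columns, and read the extremes off the sorted result. The fixed seed, the `Year` helper column, and the names `yearly`, `lowest`, and `highest` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Same kind of random data as in the question; the seed is an assumption
# added only so that repeated runs give the same numbers.
rng = np.random.default_rng(0)
size = 1_000
df = pd.DataFrame()
df["Borough"] = rng.choice(
    ["Brooklyn", "Manhattan", "Bronx", "Queens", "Staten Island"], size
)
df["Date"] = pd.to_datetime(rng.integers(2011, 2021, size).astype(str), format="%Y")
df["Nbr_permits"] = rng.integers(0, 300, size)

# One aggregation pass: rows are boroughs, columns are years,
# cells are the summed number of permits.
yearly = df.assign(Year=df["Date"].dt.year).pivot_table(
    index="Borough", columns="Year", values="Nbr_permits", aggfunc="sum"
)

# Progression between 2011 and 2020, sorted ascending.
# dropna() discards boroughs that have no data in one of the two years.
res = (yearly[2020] - yearly[2011]).dropna().sort_values()

# After sorting, the first entry is the lowest progression and the last the highest.
lowest = (res.index[0], res.iloc[0])
highest = (res.index[-1], res.iloc[-1])
print("Lowest progression:", *lowest)
print("Highest progression:", *highest)
```

Because the year table is computed once, comparing any other pair of years only requires changing the two column labels in the subtraction.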
Result