How to avoid using loops in pandas

Posted by nhaq1z21 on 2023-05-15 in Other


from datetime import datetime

import numpy as np
import pandas as pd

today = pd.Timestamp.today()
for x in range(len(df)):
    # trace back: the 2022 date is missing but the 2021 date is present
    if pd.isna(df.loc[x, 'date_2022']) and pd.notna(df.loc[x, 'date_2021']):
        # extract month and day
        d1 = today.strftime('%m-%d')
        d2 = df.loc[x, 'date_2021'].strftime('%m-%d')

        # convert to datetime
        d1 = datetime.strptime(d1, '%m-%d')
        d2 = datetime.strptime(d2, '%m-%d')

        # get difference in days
        diff = d1 - d2
        days = diff.days
        # range 14 days
        if days > 14:
            df.loc[x, 'inspection'] = 'check'
        else:
            df.loc[x, 'inspection'] = np.nan

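My goal is to add an inspection column. The condition is: the 2022 cell is null (pd.NaT), last year's cell is not null, and more than 14 days have passed since last year's month and day. How can I write this without a loop?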

pandas

Source: https://stackoverflow.com/questions/76251340/how-to-avoid-using-loop-in-pandas

2 Answers


4nkexdtk1#

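Use Timestamp.strftime and Series.dt.strftime together with to_datetime to get datetimes that share the same year, use Series.isna to test for missing values, and use Series.dt.days to test the difference in days. Chain the conditions and create the new column with numpy.where: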

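import numpy as np
import pandas as pd

# Parse month-day only, so both dates land in the same default year (1900)
d1 = pd.to_datetime(pd.Timestamp.today().strftime('%m-%d'), format='%m-%d')
d2 = pd.to_datetime(df['date_2021'].dt.strftime('%m-%d'), format='%m-%d')

# True where the 2022 date is missing
m = df['date_2022'].isna()

# None rather than np.nan: mixing 'check' with np.nan in np.where can coerce the missing value to the string 'nan'
df['inspection'] = np.where(((d1 - d2).dt.days > 14) & m, 'check', None)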
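To make the behaviour of this vectorized approach easier to check, here is a minimal self-contained sketch. The sample rows and the fixed `today` value are assumptions made up for illustration; they are not from the question. Rows with a missing `date_2021` produce a NaN day difference, which compares as False, so they are never flagged.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame({
    'date_2021': pd.to_datetime(['2021-04-01', '2021-05-10', '2021-03-01', None]),
    'date_2022': pd.to_datetime([None, None, '2022-03-02', None]),
})

# Fixed "today" so the result is reproducible (assumption; the question uses pd.Timestamp.today())
today = pd.Timestamp('2023-05-15')

# Keep only month and day, so both sides are parsed into the same default year
d1 = pd.to_datetime(today.strftime('%m-%d'), format='%m-%d')
d2 = pd.to_datetime(df['date_2021'].dt.strftime('%m-%d'), format='%m-%d')

# More than 14 days since last year's month-day, and no 2022 date recorded
overdue = (d1 - d2).dt.days > 14
missing_2022 = df['date_2022'].isna()

df['inspection'] = np.where(overdue & missing_2022, 'check', None)
print(df)
```

Only the first row should be flagged here: the second is within 14 days, the third already has a 2022 date, and the fourth has no 2021 date to compare against.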

2023-05-15

ttisahbt2#

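Well, you can't completely avoid loops in pandas. Even in the answer given by @jezrael, the loop is still there; it is just abstracted away by pandas' built-in methods. A more refined approach is to use pandas.DataFrame.apply and move all of your code into a function, something like this.
Using your exact code: first, I noticed that you are using another df, so I think you may need to merge/join the two reports to get the columns into the same DataFrame. In the example below, I kept your approach as the main one.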

import datetime

import numpy as np
import pandas as pd

def perform_inspection(row, today, esgReport2021):
    # look up last year's date for this row by its index label
    date_2021 = esgReport2021.at[row.name, 'date_2021']
    # trace back: the 2022 date is missing but the 2021 date is present
    if pd.isna(row['date_2022']) and pd.notna(date_2021):
        # get the modified date: last year's month and day in the current year
        # (raises ValueError for Feb 29 when the current year is not a leap year)
        old_modified_date = datetime.date(today.year, date_2021.month, date_2021.day)
        # get difference in days
        days = (today - old_modified_date).days
        # range 14 days
        if days > 14:
            row['inspection'] = 'check'
    return row

today = pd.Timestamp.today().date()
df["date_2022"] = pd.NaT    # Assuming this is the 0th day of your report processing.
df["inspection"] = np.nan   # missing marker for a text column, rather than pd.NaT
df = df.apply(perform_inspection, axis=1, args=(today, esgReport2021))

As you can see in the function definition, apply handles "row"-level operations by passing the row itself as the first argument.
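As a rough illustration of how apply passes each row to the function, here is a small self-contained sketch. The helper name `needs_check`, the sample rows, and the fixed `today` are all assumptions made up for this example, and both date columns are assumed to live in one DataFrame. Unlike the code above, the helper returns a single value per row, so the result can be assigned directly to a new column.

```python
import pandas as pd


def needs_check(row, today, max_days=14):
    """Hypothetical helper: return 'check' or None for a single row."""
    # Skip rows that already have a 2022 date or have no 2021 date
    if pd.notna(row['date_2022']) or pd.isna(row['date_2021']):
        return None
    # Move last year's month and day into the current year
    # (this would raise ValueError for Feb 29 in a non-leap year)
    anniversary = row['date_2021'].replace(year=today.year)
    return 'check' if (today - anniversary).days > max_days else None


# Hypothetical sample data, invented for illustration only
df = pd.DataFrame({
    'date_2021': pd.to_datetime(['2021-04-01', '2021-05-10', '2021-03-01', None]),
    'date_2022': pd.to_datetime([None, None, '2022-03-02', None]),
})

# Fixed "today" so the result is reproducible (assumption)
today = pd.Timestamp('2023-05-15')

# axis=1 calls needs_check once per row; extra arguments go through args
df['inspection'] = df.apply(needs_check, axis=1, args=(today,))
print(df)
```

This still runs a Python-level call per row, so on large frames it will usually be slower than the vectorized version, but it keeps the row logic in one readable place.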

2023-05-15
