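I have a simple dataset and want to flag rows that meet all of the following conditions:
- both the "Close" and "End" columns are earlier than now
- the Stage column is one of [One, Two, Three]
- the "Project Number" column is not empty
I came up with the code below, but it doesn't work.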
import pandas as pd
import datetime
import numpy as np
from io import StringIO
csvfile = StringIO("""
ID Stage Close Project Number End"
A899 One 26/08/2019 KL1468 30/08/2019"
A572 Two 31/12/2020 KL1493 17/12/2019"
A778 Three 26/08/2019 16/08/2019"
A704 Four 31/12/2020 KL1036 01/12/2019"
A650 One 31/12/2020 KL1522 23/12/2019"
A830 Two 31/08/2021 KL1535 03/08/2021"
A669 Three 18/08/2021 KL1536 03/08/2021"
A892 Four 31/08/2021 KL1534 03/08/2021"
A789 One 31/05/2021 KL1537 04/08/2021"
A821 Two 31/12/2020 KL1578 03/11/2019"
A992 Three 29/07/2019 KL1609 26/06/2019"
A550 Four 31/12/2020 KL1243 30/11/2019"
A707 One 31/12/2020 KL1523 29/11/2019"
A740
A917 Three 31/07/2021 KL1072 29/07/2021"
A627 Four 30/06/2021 KL1577 15/06/2021"
""")
df = pd.read_csv(csvfile, sep = '\t', engine='python')
def condition_1(s):
    if (df['Project Number'].any() and s['Expiry_1'] < datetime.datetime.now() and s['Close_1'] < datetime.datetime.now() and np.where(df['Stage'].isin(['One','Two','Three']))):
        return "Overdue"
    else:
        return ''
df['Expiry_1'] = pd.to_datetime(df['End'].str[3:5] + '/' + df['End'].str[:2] + '/' + df['End'].str[-4:])
df['Close_1'] = pd.to_datetime(df['Close'].str[3:5] + '/' + df['Close'].str[:2] + '/' + df['Close'].str[-4:])
df["Overdue Project"] = df.apply(condition_1, axis=1)
df.to_excel("c:\\Projects\\output.xlsx", index = False)
What is going wrong, and what is the correct way to do this?
1 Answer
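First, convert the columns to datetimes with to_datetime, using the format parameter. Then test whether both date columns are earlier than now with DataFrame.lt (DataFrame.gt is the counterpart for "greater than"), and use DataFrame.all to require that both columns are True in each row; this gives the first mask, m1. For the second mask use Series.isin, and for the last mask use Series.notna. Finally, chain the three masks with & (bitwise AND) and pass the result to numpy.where.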
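The answer's original code did not survive on this page, so the following is only a minimal sketch of the steps it describes, not the answerer's exact code. The small sample frame is made up for illustration (including the far-future 2099 date), and the mask names m2 and m3 are assumptions; only m1 is named in the text.

```python
import numpy as np
import pandas as pd

# Made-up sample in the same shape as the question's data (illustrative only).
df = pd.DataFrame({
    "ID": ["A899", "A778", "A704", "A830", "A740"],
    "Stage": ["One", "Three", "Four", "Two", None],
    "Close": ["26/08/2019", "26/08/2019", "31/12/2020", "31/08/2099", None],
    "Project Number": ["KL1468", None, "KL1036", "KL1535", None],
    "End": ["30/08/2019", "16/08/2019", "01/12/2019", "03/08/2021", None],
})

# Parse day/month/year strings directly instead of slicing and re-joining them.
df["Expiry_1"] = pd.to_datetime(df["End"], format="%d/%m/%Y")
df["Close_1"] = pd.to_datetime(df["Close"], format="%d/%m/%Y")

now = pd.Timestamp.now()

# m1: both dates are earlier than now (missing dates compare as False).
m1 = df[["Expiry_1", "Close_1"]].lt(now).all(axis=1)
# m2: Stage is one of the allowed values.
m2 = df["Stage"].isin(["One", "Two", "Three"])
# m3: Project Number is present.
m3 = df["Project Number"].notna()

# Combine the row-wise masks and label matching rows.
df["Overdue Project"] = np.where(m1 & m2 & m3, "Overdue", "")
print(df[["ID", "Overdue Project"]])
```

Because every mask is a boolean Series aligned to the rows, the whole column is computed at once and no apply call is needed. In this sample only A899 should be flagged: A778 has no project number, A704 is in stage Four, A830 closes in the future, and A740 is empty.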