Pandas while loop based on a condition: truth value of a Series is ambiguous

luaexgnf, posted on 2022-12-09 in Other

I have two dataframes. They both contain the same columns.
data = pd.DataFrame({'old_name': [2, 1], 'new_name': [2, 3], 'start':['2021-05-20', '2008-08-01'], 'end': ['2021-08-08', '2021-05-20']})
old_name  new_name  start       end
2         2         2021-05-20  2021-08-08
1         3         2008-08-01  2022-05-20
base = pd.DataFrame({'old_name': [3], 'new_name': [3], 'start':['2021-05-19'], 'end': ['2022-12-31']})
old_name  new_name  start       end
3         3         2021-05-19  2022-12-31
I am trying to create a new DataFrame that starts from the new-name DataFrame "base" and goes back in time, finding all the old names and linking them together in descending date order, with the condition that the old name's end date >= the start date of the new name. There can be more than one match between an old and a new name, and I have to follow the trail for all of them until start < '2008-08-01'.
The final result should be:
old_name  new_name  start       end
3         3         2021-05-19  2022-12-31
1         3         2008-08-01  2022-05-20

data['start'] = pd.to_datetime(data['start'])
base['start'] = pd.to_datetime(base['start'])
data['end'] = pd.to_datetime(data['end'])
base['end'] = pd.to_datetime(base['end'])

begin_date = datetime.datetime(2008, 8, 1)
list = pd.DataFrame(columns=base.columns)
for index, row in base.iterrows():
     start_date = row['start']
     base_name = row['new_name']

     while start_date > begin_date:
          temp = data[(data['new_name'] == base_name) & (data['end'] >= start_date)].copy().reset_index(drop=True)
          start_date = data['start']

     list = pd.concat([list, temp], ignore_index=True, sort=False)
     del temp

I get a "Truth value of a series is ambiguous" but I can't seem to find where I can correct my code. The condition evaluates to True so I'm stuck. Can someone please help me get back on track? Please let me know if my question isn't clear. Thank you!!!

km0tfn4u  #1

Your code is probably failing on the following line:

while start_date > begin_date:

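On this line you are comparing a pandas Series (start_date, which becomes a Series once `start_date = data['start']` has run inside the loop) with begin_date, which is not a Series but a scalar value. The comparison therefore returns a boolean Series, and `while` cannot reduce that to a single True or False.
A simple fix is: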

while start_date.values[index] > begin_date:
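
Another way to avoid the error is to keep start_date a scalar for the whole loop, so the `while` test never sees a Series. The following is only a sketch of that idea, not the asker's final code: the helper name `follow_trail`, the `todo` work list and the `seen` set are made up for illustration, the sample frames use the end dates shown in the printed tables, and the stop rule simply mirrors the `start_date > begin_date` test from the question.

```python
import pandas as pd

data = pd.DataFrame({'old_name': [2, 1], 'new_name': [2, 3],
                     'start': ['2021-05-20', '2008-08-01'],
                     'end': ['2021-08-08', '2022-05-20']})
base = pd.DataFrame({'old_name': [3], 'new_name': [3],
                     'start': ['2021-05-19'], 'end': ['2022-12-31']})
for df in (data, base):
    df['start'] = pd.to_datetime(df['start'])
    df['end'] = pd.to_datetime(df['end'])

begin_date = pd.Timestamp(2008, 8, 1)


def follow_trail(base, data, begin_date):
    """Hypothetical helper: collect base plus every earlier linked row."""
    pieces = [base]
    seen = set()  # index labels of data rows that were already collected
    # Each work item is a (name, start) pair of plain scalars.
    todo = [(row['new_name'], row['start']) for _, row in base.iterrows()]
    while todo:
        name, start_date = todo.pop()
        if start_date <= begin_date:  # scalar vs scalar, never ambiguous
            continue
        # Element-wise comparisons are fine here: they build a row mask.
        mask = (data['new_name'] == name) & (data['end'] >= start_date)
        matches = data[mask & ~data.index.isin(list(seen))]
        seen.update(matches.index)
        pieces.append(matches)
        # Follow each matched old name further back in time.
        todo.extend(zip(matches['old_name'], matches['start']))
    return pd.concat(pieces, ignore_index=True)


result = follow_trail(base, data, begin_date)
print(result)
```

With the sample data this returns the base row followed by the row with old_name 1, which is the result the question asks for. The `seen` set is there so that a row whose old and new names are equal cannot be picked up over and over.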

jbose2ul  #2

In the first line under the while statement, you access a DataFrame column, which is a Series, and then compare it with something. For example, data['new_name'] is a Series.
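
To make that concrete, here is a small sketch (the variable names are only for illustration) showing that comparing a Series gives a boolean Series, that using such a Series as a condition raises the error, and a few ways to reduce it to one value. Which reduction is right depends on what the loop is meant to test, so treat these as options rather than the fix.

```python
import pandas as pd

data = pd.DataFrame({'new_name': [2, 3],
                     'start': pd.to_datetime(['2021-05-20', '2008-08-01'])})
begin_date = pd.Timestamp(2008, 8, 1)

# Element-wise comparisons return a boolean Series, one value per row.
is_match = data['new_name'] == 3
after_begin = data['start'] > begin_date

# Using a multi-row boolean Series as a condition raises ValueError.
try:
    if after_begin:
        pass
except ValueError as exc:
    message = str(exc)

# Reduce the Series to a single bool before testing it.
any_after = after_begin.any()   # at least one row is after begin_date
all_after = after_begin.all()   # every row is after begin_date
earliest_after = data['start'].min() > begin_date  # compare one scalar

print(message)
```

A boolean Series like `is_match` is still exactly what is needed for row selection, as in `data[is_match]`; it only has to be reduced when it is used in `if` or `while`.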
