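I have a large grouped DataFrame with many groups, and I am trying to filter the rows within each group. To keep it simple, I'm sharing a reduced DataFrame for a single group, but it already raises an error. df5 is grouped by "Detail", "ID", "Year".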
import pandas as pd

data2 = {"Year":["2012","2012","2012","2012","2012","2012","2012","2012","2012"],
"Country":['USA','USA','USA','USA','USA','USA','USA','CANADA',"CANADA"],
"Country_2": ["", "", "", "", "", "", "", "USA", "USA"],
"ID":["AF12","A15","BU14","DU157","L12","N10","RU156","DU157","RU156"],
"Detail":[1,1,1,1,1,1,1,1,1],
"Second_country_available":[False,False,False,False,False,False,False,True,True],
}
df5 = pd.DataFrame(data2)
df5_true = df5["Second_country_available"] == True
Country_2_gr = df5[df5_true].groupby(["Detail", "ID", "Year"])['Country_2'].agg(
'|'.join)
Country_2_gr
grouped_df5 = (df5.groupby(["Detail", "ID", "Year"], group_keys=False)['Country'])
filtered = grouped_df5.transform(lambda g: g.str.fullmatch(Country_2_gr[g.name]))
filtered
The error is:
return (self._engine.get_loc(key), None)
File "pandas\_libs\index.pyx", line 774, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
KeyError: (1, 'A15', '2012')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "packages\pandas\core\indexes\.py", line 3045, in _get_loc_level
raise KeyError(key) from err
KeyError: (1, 'A15', '2012')
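To see which groups trigger this KeyError, one can compare the keys that groupby produces against the index of Country_2_gr. Below is a minimal diagnostic sketch that reuses the question's sample data. The names keys, all_keys and missing are illustrative additions, not part of the original code:

```python
import pandas as pd

# Same sample data as in the question
data2 = {
    "Year": ["2012"] * 9,
    "Country": ["USA"] * 7 + ["CANADA", "CANADA"],
    "Country_2": [""] * 7 + ["USA", "USA"],
    "ID": ["AF12", "A15", "BU14", "DU157", "L12", "N10", "RU156", "DU157", "RU156"],
    "Detail": [1] * 9,
    "Second_country_available": [False] * 7 + [True, True],
}
df5 = pd.DataFrame(data2)
keys = ["Detail", "ID", "Year"]

# Lookup table built only from rows that have a second country
Country_2_gr = (
    df5[df5["Second_country_available"]]
    .groupby(keys)["Country_2"]
    .agg("|".join)
)

# Every group key that transform() will hand to the lambda as g.name
all_keys = df5.groupby(keys).size().index

# Keys present in the full groupby but absent from the lookup table;
# indexing Country_2_gr with any of these raises KeyError
missing = all_keys.difference(Country_2_gr.index)
print(list(missing))
```

With this sample data, missing should hold five keys (the groups for A15, AF12, BU14, L12 and N10, none of which has Second_country_available set). Because groupby sorts its keys, (1, 'A15', '2012') is the first group transform visits, which matches the key in the traceback.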
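This code works fine in most cases, so I don't want to change it fundamentally. I just want to fix cases like the one I've shown, in which those rows should be dropped.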
1 answer
Country_2_gr is built from the filtered DataFrame, so it does not contain every group key. You can try switching to get, which accepts a default value:
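The answer's own snippet is not preserved in this copy. What follows is a minimal sketch of the suggested change, assuming an empty-string default: Series.get returns the default instead of raising KeyError, and an empty pattern fullmatches only an empty string, so every row of a group with no Country_2 entry comes back False (this relies on Country never being empty). The final df5[filtered] step is an assumption about how the mask is meant to be applied:

```python
import pandas as pd

# Same sample data as in the question
data2 = {
    "Year": ["2012"] * 9,
    "Country": ["USA"] * 7 + ["CANADA", "CANADA"],
    "Country_2": [""] * 7 + ["USA", "USA"],
    "ID": ["AF12", "A15", "BU14", "DU157", "L12", "N10", "RU156", "DU157", "RU156"],
    "Detail": [1] * 9,
    "Second_country_available": [False] * 7 + [True, True],
}
df5 = pd.DataFrame(data2)
keys = ["Detail", "ID", "Year"]

# Unchanged: pattern of allowed countries per group, built from the filtered rows
Country_2_gr = (
    df5[df5["Second_country_available"]]
    .groupby(keys)["Country_2"]
    .agg("|".join)
)

grouped_df5 = df5.groupby(keys, group_keys=False)["Country"]

# Only change: .get(key, default) instead of [key]. Groups missing from
# Country_2_gr get the assumed default "", which fullmatches no non-empty Country.
filtered = grouped_df5.transform(
    lambda g: g.str.fullmatch(Country_2_gr.get(g.name, ""))
)

# Keep only rows whose Country matches one of their group's Country_2 values
result = df5[filtered]
print(result)
```

On the sample data only rows 3 and 6 (the USA rows for DU157 and RU156) should remain. Everything else in the original pipeline stays as it was, which fits the wish to avoid a fundamental rewrite.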