pandas 根据要用于筛选的不同列组合筛选DataFrame

4ngedf3f  于 2022-12-09  发布在  其他
关注(0)|答案(1)|浏览(124)

我的CSV示例:

open_local_data,country,competition,match_id,match_name,market_id,market_name,runner_id,runner_name,status,total_matched,odds,market_matched,percentage,above_odds,result,back,lay
2022-08-24 15:00:00,UY,Uruguayan Segunda Division,31684262,Uruguay Montevideo FC v Progreso,1.202440748,Match Odds,11076801,Uruguay Montevideo FC,OPEN,197.2,2.88,448.52,43.96682422188531,9.24460199966309,WINNER,1.7578,-1
2022-08-24 15:00:00,AT,Austrian Matches,31685733,SV Gerasdorf Stammersdorf v Dinamo Helfort,1.202453470,Match Odds,10299781,SV Gerasdorf Stammersdorf,OPEN,15.99,3.05,27.12,58.96017699115043,26.17329174524879,LOSER,-1,0.45609756097560983
2022-08-24 15:00:00,UY,Uruguayan Segunda Division,31684267,Villa Espanola v Sud America,1.202440560,Match Odds,58805,The Draw,OPEN,458.35,3.5,651.11,70.39517132281796,41.82374275138939,LOSER,-1,0.37400000000000005
2022-08-24 15:00:00,UY,Uruguayan Segunda Division,31684266,Miramar Misiones v Central Espanol,1.202440654,Match Odds,5300627,Miramar Misiones,OPEN,642.05,2.1,1075.66,59.68893516538684,12.069887546339224,LOSER,-1,0.85
2022-08-24 15:00:00,UY,Uruguayan Segunda Division,31684266,Miramar Misiones v Central Espanol,1.202440654,Match Odds,5330852,Central Espanol,OPEN,356.65,4.5,1075.66,33.1563877061525,10.93416548393028,LOSER,-1,0.2671428571428571
2022-08-24 15:00:00,BR,Brazilian U20,31688169,Santa Cruz FC U20 v Retro U20,1.202492350,Match Odds,12819986,Santa Cruz FC U20,OPEN,74.82,2.18,78.72999999999999,95.03365934205512,49.16209970902761,LOSER,-1,0.7923728813559321
2022-08-24 15:00:00,CH,Swiss 1 Liga Promotion,31684279,Etoile Carouge v SC Bruhl St Gallen,1.202455897,Match Odds,544529,Etoile Carouge,OPEN,108.24,1.98,169.58999999999995,63.82451795506812,13.319467450017612,WINNER,0.9163,-1
2022-08-24 15:00:00,CH,Swiss 1 Liga Promotion,31684279,Etoile Carouge v SC Bruhl St Gallen,1.202455897,Match Odds,21005121,SC Bruhl St Gallen,OPEN,57.21,3.75,169.58999999999995,33.734300371484174,7.067633704817506,LOSER,-1,0.34

当我想要筛选与第一行具有相同数据的行,而且我知道要使用哪些数据行来筛选DataFrame时,我可以用下列方式找到最终结果:

import pandas as pd

df = pd.read_csv('history_p&l.csv')

def results(a,b):
    filter = df[
        (df['percentage'] == a) & 
        (df['odds'] == b)
    ]
    back_sum = filter['back'].sum()
    print(back_sum)

results(
    df.iloc[0]['percentage'],
    df.iloc[0]['odds']
)

但是,目前我希望发现DataFrame中具有值的列的所有可能组合的结果,在本例中,这些是选项,这就是我如何创建可能组合的列表:

def combs(a):
    if len(a) == 0:
        return [[]]
    cs = []
    for c in combs(a[1:]):
        cs += [c, c+[a[0]]]
    return cs

options = [
    'country',
    'competition',
    'match_id',
    'match_name',
    'market_id',
    'market_name',
    'runner_id',
    'runner_name',
    'total_matched',
    'odds',
    'market_matched',
    'percentage',
    'above_odds'
]

combinations = combs(options)

如何使函数(def results())具有延展性,以便分析每个组合的结果?
也在想我如何能够根据不同的参数数量来调用函数。
注意:我考虑过使用eval()并从字符串创建代码,但我知道这并没有指明,而且这样做并不明智。

r7s23pms

r7s23pms1#

要获取filter,可以根据combinations中的单个combination动态构建 predicate 沿着如下所示

filter = df[reduce(lambda x, y : x & (df[y[0]] == y[1]), zip(combination, map(lambda x: df.iloc[0][x], combination)), True)]

不确定results()的最佳参数是什么,但它取决于用例。例如,results()可以只使用combination而不是a, b来调用,或者使用两个序列来调用:combination和相应的值。

相关问题