csv - Python multiprocessing over DataFrame rows

wrrgggsh posted on 2023-03-21 in Python
def main():
    df_master = read_bb_csv(file)
    p = Pool(2)
    if len(df_master.index) >= 1:
        for row in df_master.itertuples(index=True, name='Pandas'):
            p.map((partial(check_option, arg1=row), df_master))


def check_option(row):
    get_price(row)

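I'm using pandas to read a CSV file, iterate over its rows, and process the information. Since get_price() needs to make several HTTP calls, I want to use multiprocessing to process all the rows at once (as many as the CPU cores allow) to speed things up.
The problem is that I'm new to multiprocessing and don't know how to use p.map((check_option, arg1=row), df_master) to process all the rows in the DataFrame. Nothing needs to be returned from the function; I just need the processes to handle the rows.
Thanks for your help.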
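A minimal sketch of the call the question is reaching for, under stated assumptions. `Pool.map` takes the function and an iterable as two separate arguments and calls the function once per item, so the per-row loop, the `partial`, and the `arg1` keyword (which `check_option` does not accept) are not needed; the original line instead passes a single tuple to `map`. Rows are sent as plain dicts from `df.to_dict("records")` because the namedtuples produced by `itertuples()` can fail to pickle when shipped to worker processes. The `get_price` body, the `qty`/`unit_price` columns and `process_rows` are hypothetical placeholders, not the asker's code.

```python
from multiprocessing import Pool

import pandas as pd


def get_price(row):
    # Hypothetical stand-in for the question's get_price(); the real one makes HTTP calls.
    return row["qty"] * row["unit_price"]


def check_option(row):
    # Pool.map calls this once per item, passing the item as the only argument.
    return get_price(row)


def process_rows(df, workers=2):
    # Plain dicts pickle cleanly; itertuples() namedtuples may not.
    rows = df.to_dict("records")
    with Pool(workers) as pool:
        # A single call covers every row; map blocks until all results are back, in input order.
        return pool.map(check_option, rows)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this demo when they import the module.
    df = pd.DataFrame({"qty": [1, 2, 3], "unit_price": [10.0, 2.5, 4.0]})
    print(process_rows(df))
```

If nothing needs to come back, `check_option` can just return `None`; `map` still waits for every row to finish before the `with` block shuts the pool down.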

axr492tv


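You can use the Python 3 version below; I use it everywhere and it works like a charm! There's also a Python 3 package, mpire, which I've found really useful; its usage is similar to that of Python's multiprocessing package.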

from multiprocessing import Pool
import pandas as pd

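def get_price(idx, row):
    # logic to fetch price (e.g. the HTTP calls); placeholder so the example runs
    price = None  # replace with the real price lookup
    return idx, price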

def main():
    df = pd.read_csv("path to file")
    NUM_OF_WORKERS = 2 
    with Pool(NUM_OF_WORKERS) as pool:
        results = [pool.apply_async(get_price, [idx, row]) for idx, row in df.iterrows()]
        for result in results:
            idx, price = result.get()
            df.loc[idx, 'Price'] = price
    # do whatever you need with df, e.g. save it back to the same file

if __name__ == "__main__":
    # Guard the entry point so worker processes that re-import this module don't run main() again.
    # This is required on Windows (and whenever the "spawn" start method is used); it's also good practice. More info: https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming
    main()
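Since the question notes that `get_price()` mostly waits on HTTP calls, a thread pool may be enough, and `multiprocessing.dummy.Pool` offers the same interface backed by threads. The sketch below assumes that swap; whether it beats real processes depends on how much CPU work `get_price` does. Rows are not pickled and no `__main__` guard is required. `get_price`'s body, the `qty`/`unit_price` columns and `fill_prices` are hypothetical.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool, same API as multiprocessing.Pool

import pandas as pd


def get_price(idx, row):
    # Hypothetical stand-in; a real version would make the HTTP calls here.
    return idx, row["qty"] * row["unit_price"]


def fill_prices(df, workers=8):
    # Threads share memory, so the rows are passed directly without pickling.
    with ThreadPool(workers) as pool:
        # starmap unpacks each (idx, row) pair from iterrows() into get_price's two arguments.
        results = pool.starmap(get_price, df.iterrows())
    # Write results back by index label, as in the answer above.
    for idx, price in results:
        df.loc[idx, "Price"] = price
    return df


df = pd.DataFrame({"qty": [1, 2, 3], "unit_price": [10.0, 2.5, 4.0]})
print(fill_prices(df))
```

Swapping the import for `from multiprocessing import Pool` gives the process-based version with the same call shape (then the `if __name__ == "__main__":` guard is needed again).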
