pandas apply and applymap functions take a long time to run on a large dataset

Posted by kokeuurv on 2023-04-04 in Other


I apply two functions to a DataFrame:

res = df.apply(lambda x:pd.Series(list(x)))  
res = res.applymap(lambda x: x.strip('"') if isinstance(x, str) else x)

Update: the DataFrame has nearly 700,000 rows, and this takes a long time to run.
How can I reduce the running time?
Sample data:

A        
 ----------
0 [1,4,3,c] 
1 [t,g,h,j]  
2 [d,g,e,w]  
3 [f,i,j,h] 
4 [m,z,s,e] 
5 [q,f,d,s]

Output:

A         B   C   D  E
-------------------------
0 [1,4,3,c]  1   4   3  c
1 [t,g,h,j]  t   g   h  j
2 [d,g,e,w]  d   g   e  w
3 [f,i,j,h]  f   i   j  h
4 [m,z,s,e]  m   z   s  e
5 [q,f,d,s]  q   f   d  s

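The line res = df.apply(lambda x:pd.Series(list(x))) takes the items from each list and fills them into separate columns, one item per column, as shown above. There are about 38 columns.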

pandas

Source: https://stackoverflow.com/questions/51279903/pandas-apply-and-applymap-functions-are-taking-long-time-to-run-on-large-dataset

2 Answers


8ljdwjyq1#

I think that:

res = df.apply(lambda x:pd.Series(list(x)))

should be changed to:

df1 = pd.DataFrame(df['A'].values.tolist())
print (df1)
   0  1  2  3
0  1  4  3  c
1  t  g  h  j
2  d  g  e  w
3  f  i  j  h
4  m  z  s  e
5  q  f  d  s
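To get the layout shown in the question, with the new columns next to the original column A, the constructor call above can be combined with a join. This is only a minimal sketch: the sample rows are copied from the question, the column names B to E are taken from the expected output there, and it assumes every list has the same length.

```python
import pandas as pd

# Sample data modelled on the question: one column "A" holding equal-length lists.
df = pd.DataFrame({"A": [[1, 4, 3, "c"], ["t", "g", "h", "j"], ["d", "g", "e", "w"]]})

# Build all new columns in a single constructor call instead of creating
# one pd.Series per row; reuse the original index so the join lines up.
expanded = pd.DataFrame(df["A"].tolist(), index=df.index)

# Column names B..E are an assumption based on the expected output in the question.
expanded.columns = list("BCDE")

# Place the expanded columns next to the original column.
res = df.join(expanded)
print(res)
```

With about 38 items per list, the names would have to be generated differently, for example by keeping the default integer column labels.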

Second, if the columns do not contain mixed values (numbers together with strings):

cols = res.select_dtypes(object).columns
res[cols] = res[cols].apply(lambda x: x.str.strip('"'))
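If a column does mix numbers and strings, .str.strip returns NaN for the non-string entries, so those values would be lost. One possible workaround, shown here as a sketch rather than as part of the original answer, is to fall back to the original value wherever the stripped result is missing. The helper name strip_quotes_mixed is hypothetical, and it assumes the Series holds at least one string (otherwise the .str accessor raises an error).

```python
import pandas as pd

def strip_quotes_mixed(s: pd.Series) -> pd.Series:
    """Strip double quotes from string entries and leave other values unchanged."""
    # Non-string entries become NaN in the result of the .str accessor.
    stripped = s.str.strip('"')
    # Keep the stripped value where there is one, else fall back to the original.
    return stripped.where(stripped.notna(), s)

# Hypothetical mixed column: quoted strings next to plain numbers.
mixed = pd.Series(['"a"', 1, '"b"', 2.5], dtype=object)
cleaned = strip_quotes_mixed(mixed)
print(cleaned.tolist())
```

Values that were already missing stay missing, because the fallback takes them from the original Series.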

Answered 2023-04-04

oxcyiej72#

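This may be a late reply, but for people like me who stumble upon this thread with the same problem, it may still be worth adding my findings.
I used the swifter library. With it, the apply function on a pandas DataFrame became at least twice as fast, and it also used less RAM: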

import pandas as pd
import swifter

# then add .swifter between df and .apply, like so:
res = df.swifter.apply(lambda x:pd.Series(list(x)))

That's all. It worked very well for me. It also shows a progress bar in the terminal, which is very helpful.
My solution comes from: https://towardsdatascience.com/do-you-use-apply-in-pandas-there-is-a-600x-faster-way-d2497facfa66

Answered 2023-04-04
