pandas: removing the warning when using concat() on empty DataFrames

4nkexdtk posted 12 months ago in Other

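My old code concatenates several DataFrames, some of which may be empty. I now get two warnings about this.
My goal is to keep the old logic, but without any warnings. Mainly, I need to retain all the column names without ending up with empty rows.
I wrote the code below (it fixes 1 of the 2 warnings), but it still gives me FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated.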

import io
import pandas as pd

df_list = ['RevisionTime,Data,2019/Q2,2019/Q3,2019/Q4\r\n',
           'RevisionTime,Data,2019/Q3\r\n2019-08-17,10.5,10.5\r\n',
           'RevisionTime,Data,2019/Q3\r\n2019-09-18 08:10:00,51.0,51.0\r\n',
           'RevisionTime,Data,2019/Q3\r\n2019-10-18 08:10:00,111.5,111.5\r\n',
           'RevisionTime,Data,2019/Q3,2019/Q4\r\n2019-11-15 22:31:00,182.0,111.5,70.5\r\n']

# list with dataframes
df_list = [pd.read_csv(io.StringIO(df)) for df in df_list]

# to avoid 'The behaviour of array concatenation with empty entries is deprecated.'
# and to retain all column names
# https://stackoverflow.com/questions/63970182/concat-list-of-dataframes-containing-empty-dataframes
for i, df in enumerate(df_list):
    col_length = len(df.columns)
    template = pd.DataFrame(data=[[pd.NA] * col_length], columns=df.columns)
    df_list[i] = df if not df.empty else template
#

res_df = pd.concat(df_list) # warning here
res_df = res_df.dropna(how='all') # remove empty rows
print(res_df)

The output DataFrame should look like this:

          RevisionTime   Data 2019/Q2  2019/Q3  2019/Q4
0           2019-08-17   10.5     NaN     10.5      NaN
0  2019-09-18 08:10:00   51.0     NaN     51.0      NaN
0  2019-10-18 08:10:00  111.5     NaN    111.5      NaN
0  2019-11-15 22:31:00  182.0     NaN    111.5     70.5


Basically, please help me fix the code so that the warning goes away.

kulphzqa #1

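You can build an Index containing all possible column names, skip the empty DataFrames when concatenating, and then reindex the final DataFrame: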

# create `column_names` index with all possible column names
column_names = pd.Index([])
for df in df_list:
    column_names = column_names.union(df.columns)

res_df = pd.concat([df for df in df_list if not df.empty])

# reindex the final dataframe (this adds NaN to missing columns)
res_df = res_df.reindex(column_names, axis=1)
print(res_df)

Prints:

   2019/Q2  2019/Q3  2019/Q4   Data         RevisionTime
0      NaN     10.5      NaN   10.5           2019-08-17
0      NaN     51.0      NaN   51.0  2019-09-18 08:10:00
0      NaN    111.5      NaN  111.5  2019-10-18 08:10:00
0      NaN    111.5     70.5  182.0  2019-11-15 22:31:00
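
Note that `Index.union` sorts the column names, so the output above does not match the column order shown in the question. A minimal sketch of one way to keep the first-seen order instead, assuming the same input strings as the question: the names `csv_chunks`, `frames`, `ordered_columns`, and `non_empty` are illustrative and not part of the original code.

```python
import io
import pandas as pd

csv_chunks = ['RevisionTime,Data,2019/Q2,2019/Q3,2019/Q4\r\n',
              'RevisionTime,Data,2019/Q3\r\n2019-08-17,10.5,10.5\r\n',
              'RevisionTime,Data,2019/Q3\r\n2019-09-18 08:10:00,51.0,51.0\r\n',
              'RevisionTime,Data,2019/Q3\r\n2019-10-18 08:10:00,111.5,111.5\r\n',
              'RevisionTime,Data,2019/Q3,2019/Q4\r\n2019-11-15 22:31:00,182.0,111.5,70.5\r\n']

frames = [pd.read_csv(io.StringIO(chunk)) for chunk in csv_chunks]

# Collect column names in first-seen order (dict keys preserve insertion order),
# including the columns of empty frames.
ordered_columns = list(dict.fromkeys(col for df in frames for col in df.columns))

# Concatenate only the non-empty frames; pd.concat raises on an empty list,
# so fall back to an empty DataFrame in that case.
non_empty = [df for df in frames if not df.empty]
res_df = pd.concat(non_empty) if non_empty else pd.DataFrame()

# Restore the full column set; columns missing from the data are filled with NaN.
res_df = res_df.reindex(columns=ordered_columns)
print(res_df)
```

Because the empty frames never reach `pd.concat`, the `dropna(how='all')` step from the question is no longer needed here.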
