How to replace the name of a column in Pandas when the column after it contains only NaN values

pepwfjgg  posted on 2022-12-16 in Other

I have a file whose data is separated by varying numbers of spaces, and the column names contain spaces as well.

Type Dec LookupTable               Field Name                Field Len Start Pos
NUM  0   _                         sample data               5         1
NUM  0   _                         sample data               10        6
CHAR 0   _                         sample data               60        16
NUM  0   _                         sample data               3         76
CHAR 0   _                         sample data               60        79
CHAR 0   _                         sample data               60        139
CHAR 0   _                         sample data               60        199
CHAR 0   _                         sample data               60        259
NUM  0   _                         sample data               3         319
CHAR 0   _                         sample data               60        322
CHAR 0   _                         sample data               60        382
NUM  0   _                         sample data               3         442
CHAR 0   _                         sample data               60        445

This is how I read the file:

df = pd.read_fwf('./temp.txt', colspecs='infer')

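The result is a DataFrame whose columns are split at the spaces in both the header and the values, which leaves some columns entirely NaN.
I want to drop the NaN columns and rename the column before each of them to the name of the empty column. How can I do this efficiently?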

Expected output:


myzjeezk1#

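import pandas as pd
import numpy as np

# Pair each all-NaN column (right) with the column just before it (left)
aaa = [(df.columns[i - 1], df.columns[i]) for i in range(1, len(df.columns)) if df[df.columns[i]].isna().all()]
bbb = dict(aaa)  # maps the column to rename -> its new name
# Drop the all-NaN columns, then rename the columns that preceded them
df = df.drop(columns=np.array(aaa)[:, 1])
df.rename(columns=bbb, inplace=True)

print(df)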

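Here, the list comprehension aaa builds tuples of two names (on the left, the name of the column to rename; on the right, the name of the column to drop). The condition keeps only the columns whose values are all NaN: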

if df[df.columns[i]].isna().all()

The dictionary bbb is created from aaa. drop removes the selected columns np.array(aaa)[:, 1] (I wrap aaa in np.array so that it can be sliced like an array).
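The same idea can be sketched end to end on a small stand-in frame. The column names and values below are made up for illustration and are not the asker's actual `read_fwf` result, and `absorb_empty_columns` is a hypothetical helper name. A plain list comprehension replaces the `np.array(aaa)[:, 1]` slice, which raises an `IndexError` when no column is entirely NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: "Len" and "Pos" are entirely NaN
df = pd.DataFrame({
    "Type": ["NUM", "CHAR"],
    "Field": [5, 60],
    "Len": [np.nan, np.nan],
    "Start": [1, 16],
    "Pos": [np.nan, np.nan],
})


def absorb_empty_columns(frame):
    """Drop all-NaN columns and give each one's name to the column before it."""
    cols = list(frame.columns)
    # (column to rename, all-NaN column to drop) pairs, as in the answer above
    pairs = [
        (cols[i - 1], cols[i])
        for i in range(1, len(cols))
        if frame[cols[i]].isna().all()
    ]
    to_drop = [empty for _, empty in pairs]
    return frame.drop(columns=to_drop).rename(columns=dict(pairs))


result = absorb_empty_columns(df)
print(list(result.columns))  # ['Type', 'Len', 'Pos']
```

Because the helper returns a new frame instead of using `inplace=True`, the original `df` stays available for comparison.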


ecbunoof2#

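I read the data from the clipboard, so my DataFrame looks slightly different from yours. That means you will have to adjust the code, which should not be much of a problem.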

df.drop(columns=['Start', 'Pos']).rename(columns={'Name': 'Superman', 'Type': 'Mytype'})  # Very clear code, I think it doesn't need any explanation

# Second way: dropna(axis=1, how='all') drops a column only if all of its values are NaN.
df.dropna(axis=1, how='all').rename(columns={'Name': 'Superman', 'Type': 'Mytype'})

    Type  Dec LookupTable   Field  Name  Field.1  Len  Start  Pos
0    NUM    0           _  sample  data        5    1    NaN  NaN
1    NUM    0           _  sample  data       10    6    NaN  NaN
2   CHAR    0           _  sample  data       60   16    NaN  NaN
3    NUM    0           _  sample  data        3   76    NaN  NaN
4   CHAR    0           _  sample  data       60   79    NaN  NaN
5   CHAR    0           _  sample  data       60  139    NaN  NaN
6   CHAR    0           _  sample  data       60  199    NaN  NaN
7   CHAR    0           _  sample  data       60  259    NaN  NaN
8    NUM    0           _  sample  data        3  319    NaN  NaN
9   CHAR    0           _  sample  data       60  322    NaN  NaN
10  CHAR    0           _  sample  data       60  382    NaN  NaN
11   NUM    0           _  sample  data        3  442    NaN  NaN
12  CHAR    0           _  sample  data       60  445    NaN  NaN

df.drop(columns=['Start', 'Pos']).rename(columns={'Name': 'Superman', 'Type': 'Mytype'})
#df.dropna(axis=1,how='all').rename(columns={'Name': 'Superman', 'Type': 'Mytype'})
Out[30]: 
   Mytype  Dec LookupTable   Field Superman  Field.1  Len
0     NUM    0           _  sample     data        5    1
1     NUM    0           _  sample     data       10    6
2    CHAR    0           _  sample     data       60   16
3     NUM    0           _  sample     data        3   76
4    CHAR    0           _  sample     data       60   79
5    CHAR    0           _  sample     data       60  139
6    CHAR    0           _  sample     data       60  199
7    CHAR    0           _  sample     data       60  259
8     NUM    0           _  sample     data        3  319
9    CHAR    0           _  sample     data       60  322
10   CHAR    0           _  sample     data       60  382
11    NUM    0           _  sample     data        3  442
12   CHAR    0           _  sample     data       60  445
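To make the `how='all'` behaviour concrete, here is a minimal sketch on a made-up frame (the column names `Partial` and `Start` and their values are assumptions for illustration). It contrasts `how='all'`, which removes only the columns that are entirely NaN, with `how='any'`, which also removes columns that contain a single NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Partial" has one NaN, "Start" is entirely NaN
df = pd.DataFrame({
    "Len": [1, 6, 16],
    "Partial": [1.0, np.nan, 3.0],
    "Start": [np.nan, np.nan, np.nan],
})

# Drops a column only when every value in it is NaN
only_empty_removed = df.dropna(axis=1, how="all")

# Drops a column as soon as it contains at least one NaN
any_nan_removed = df.dropna(axis=1, how="any")

print(list(only_empty_removed.columns))  # ['Len', 'Partial']
print(list(any_nan_removed.columns))     # ['Len']
```

Both `dropna` and `rename` return a new DataFrame by default, so the result has to be assigned to a variable if it should be kept.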
