pandas: nested for loop / if statement in a list comprehension

dw1jzc5e  posted on 2023-03-28  in Other

I have the following DataFrame:

import pandas as pd
import numpy as np

d1 = {'atom_number': ["12,14,24",  "23", "14,25,46", 20.3 , np.nan,  "15,24"]}
df = pd.DataFrame(data=d1)
df
atom_number
0   12,14,24
1   23
2   14,25,46
3   20.3
4   NaN
5   15,24

I want to split each value on commas, but only when the value is a string. With the code below I get an AttributeError:

df['atom_number'] = [[int(x) if type(s) == str else np.nan for x in s.split(',')] for s in df.atom_number] 
df = df.dropna(subset = ["atom_number"])

AttributeError: 'float' object has no attribute 'split'
Expected output:

atom_number
0   [12, 14, 24]
1   [23]
2   [14, 25, 46]
3   [15, 24]

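I know I could filter df down to the string values before applying the list comprehension, but I would like to know how to do this inside the list comprehension itself.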

mlnl4t2r  #1

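Use isinstance to test the type of each value s in the Series. The condition has to wrap the whole inner list comprehension, so that s.split is never called on a non-string value: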

df['atom_number'] = [[int(x) for x in s.split(',')] 
                     if isinstance(s, str) 
                     else np.nan for s in df.atom_number]
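The difference between the question's code and the version above comes down to where the conditional sits. As a minimal sketch (the helper names parse_inner and parse_outer are made up for this illustration): in the question's form the condition is applied to each x, but s.split(',') has already been evaluated as the iterable by then, so a float fails before the condition is ever checked.

```python
import numpy as np

values = ["12,14,24", 20.3, np.nan]

def parse_inner(s):
    # Question's form: the if/else applies to each x, but s.split(',')
    # is evaluated first as the iterable, so a float raises here.
    return [int(x) if type(s) == str else np.nan for x in s.split(',')]

def parse_outer(s):
    # Answer's form: the if/else chooses between the whole inner
    # comprehension and np.nan, so split is only reached for strings.
    return [int(x) for x in s.split(',')] if isinstance(s, str) else np.nan

try:
    [parse_inner(s) for s in values]
    inner_error = None
except AttributeError as exc:
    inner_error = str(exc)

outer_result = [parse_outer(s) for s in values]
print(inner_error)
print(outer_result)
```

The first print shows the same AttributeError message as in the question; the second shows one parsed list followed by two NaN placeholders.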

Your solution, with the type(s) == str check moved to the same place:

df['atom_number'] = [[int(x) for x in s.split(',')] 
                     if type(s) == str 
                     else np.nan for s in df.atom_number]

Or use map to convert the parts to integers:

df['atom_number'] = [list(map(int, s.split(',')))
                     if type(s) == str 
                     else np.nan for s in df.atom_number] 

df = df.dropna(subset = ["atom_number"])
print (df)
    atom_number
0  [12, 14, 24]
1          [23]
2  [14, 25, 46]
5      [15, 24]
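If the filtering really has to happen inside the list comprehension, as the question asks, one possible approach (not part of the answer above, only a sketch) is to put the isinstance check in the comprehension's filter clause. The result is then shorter than df, so it cannot be assigned back to the existing column; here it goes into a new DataFrame, which also gives a fresh 0-based index like the one in the question's expected output. The name result is chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'atom_number': ["12,14,24", "23", "14,25,46",
                                   20.3, np.nan, "15,24"]})

# The trailing "if" filters rows: non-string values are skipped entirely,
# so no NaN placeholders and no dropna() step are needed.
parsed = [[int(x) for x in s.split(',')]
          for s in df.atom_number
          if isinstance(s, str)]

# parsed has 4 items while df has 6 rows, so build a new frame.
result = pd.DataFrame({'atom_number': parsed})
print(result)
```

The trade-off is that the original row labels are lost; the dropna version above keeps them (0, 1, 2, 5).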

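A more involved solution splits on , then drops the non-integer values from the split parts and keeps standalone integers as single-element lists: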

d1 = {'atom_number': ["12,14,24",  "23", "14,25,46.5", 20.3 , np.nan,  "15,24", 20]}
df = pd.DataFrame(data=d1)

df['atom_number'] = [[int(x) for x in s.split(',') if float(x).is_integer()] 
                      if isinstance(s, str) 
                      else [s] 
                      if isinstance(s, int) 
                      else np.nan for s in df.atom_number] 

df = df.dropna(subset = ["atom_number"])
print (df)
    atom_number
0  [12, 14, 24]
1          [23]
2      [14, 25]
5      [15, 24]
6          [20]
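
One caveat with the float(x).is_integer() filter, shown as a hedged sketch: a part such as "14.0" passes the check, but int("14.0") raises ValueError because int does not parse a decimal point in a string. Assuming such inputs can occur, converting via the float value avoids that. The helper name keep_integers is made up for this illustration.

```python
def keep_integers(s):
    """Split on commas and keep only the parts that are whole numbers."""
    kept = []
    for x in s.split(','):
        value = float(x)
        if value.is_integer():
            # Use int(value), not int(x): int("14.0") would raise ValueError.
            kept.append(int(value))
    return kept

print(keep_integers("14,25,46.5"))
print(keep_integers("14.0,25"))
```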

4nkexdtk  #2

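pandas has an .apply method, which may be a better fit than a list comprehension for modifying pandas objects such as DataFrames or Series, and it arguably reads more clearly.
isinstance also seems preferable to comparing the object's type directly.
For example: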

import pandas as pd
import numpy as np

data = {'atom_number': ["12,14,24",
                        "23",
                        "14,25,46",
                        20.3,
                        np.nan,
                        "15,24"]}
df = pd.DataFrame(data=data)

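def split_atom(x):
    # Strings are split on commas and converted to an integer array.
    if isinstance(x, str):
        return np.array([int(i) for i in x.split(',')])
    # A bare integer becomes a one-element array.
    elif isinstance(x, int):
        return np.array([x])
    # Anything else (floats, NaN) falls through and returns None.

df.atom_number = df.atom_number.apply(split_atom)
# dropna() returns a new DataFrame without the None rows; assign it to keep the result.
df.dropna()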
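
As a small illustration of why isinstance is often preferred over an exact type comparison: type(v) == str is False for subclasses of str, while isinstance(v, str) accepts them. The AtomString class below is hypothetical and exists only for this sketch; np.str_ is NumPy's string scalar type, which subclasses str.

```python
import numpy as np

class AtomString(str):
    """Hypothetical str subclass, used only for this illustration."""

values = ["12,14", AtomString("23"), np.str_("15,24"), 20.3]

# Exact type comparison only matches plain str objects.
exact_match = [type(v) == str for v in values]
# isinstance also matches subclasses of str.
instance_match = [isinstance(v, str) for v in values]

print(exact_match)
print(instance_match)
```

For the data in the question both checks behave the same, since the column holds plain Python strings.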
