How to subset a DataFrame in pandas based on several conditions

Posted by ql3eal8s on 2023-09-29 in Other

I need to take a subset of the following DataFrame, named df:

Server     Model    Slot
server1    Cisco    1
server1    Cisco    2
server1    Cisco    3
server1    Cisco    4
server1    Cisco    8
server1    Cisco    Chasis
server1    Cisco    Chasis
server2    IBM      Slot 5
server2    IBM      Slot 8
server2    IBM      Slot 9
server3    Micr     Slot 22
server3    Micr     Slot 18
server3    Micr     Slot 1
server3    Micr     Chasis 1

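The subset should keep the rows whose Slot value is a number less than or equal to 12, or whose Slot value contains the text "Slot".
The final DataFrame needs to look like this: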

Server     Model    Slot
server1    Cisco    1
server1    Cisco    2
server1    Cisco    3
server1    Cisco    4
server1    Cisco    8
server2    IBM      Slot 5
server2    IBM      Slot 8
server2    IBM      Slot 9
server3    Micr     Slot 22
server3    Micr     Slot 18
server3    Micr     Slot 1

I tried this:

df[df['Slot']=<12 || df['Slot].str.contains("Slot")]

pnwntuvh  #1

Assuming the "Slot" column contains only strings:

mask = df["Slot"].str.contains(r"^(?:\d|1[0-2])$|Slot")
print(df[mask])

Prints:

     Server  Model     Slot
0   server1  Cisco        1
1   server1  Cisco        2
2   server1  Cisco        3
3   server1  Cisco        4
4   server1  Cisco        8
7   server2    IBM   Slot 5
8   server2    IBM   Slot 8
9   server2    IBM   Slot 9
10  server3   Micr  Slot 22
11  server3   Micr  Slot 18
12  server3   Micr   Slot 1

If the column holds a mix of numbers and strings, convert everything to strings first:

mask = df["Slot"].astype(str).str.contains(r"^(?:\d|1[0-2])$|Slot")
print(df[mask])
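
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the sample data from the question has been typed in by hand with every Slot value stored as a string and with no surrounding whitespace; the variable name `result` is only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: all Slot values are strings).
df = pd.DataFrame({
    "Server": ["server1"] * 7 + ["server2"] * 3 + ["server3"] * 4,
    "Model": ["Cisco"] * 7 + ["IBM"] * 3 + ["Micr"] * 4,
    "Slot": ["1", "2", "3", "4", "8", "Chasis", "Chasis",
             "Slot 5", "Slot 8", "Slot 9",
             "Slot 22", "Slot 18", "Slot 1", "Chasis 1"],
})

# ^(?:\d|1[0-2])$ matches a whole value that is 0-9 or 10-12;
# the alternative "Slot" matches anywhere in the value.
mask = df["Slot"].str.contains(r"^(?:\d|1[0-2])$|Slot")
result = df[mask]
print(result)
```

Note that the numeric bound is encoded in the regex itself, so changing the threshold from 12 to some other value means rewriting the pattern.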

iih3973s  #2

For your specific problem, building the mask with apply will work:

threshold = 12
mask = df['Slot'].apply(lambda x: int(x) <= threshold if x.isdigit() else 'slot' in x.lower())
df[mask]

Output:

     Server  Model     Slot
0   server1  Cisco        1
1   server1  Cisco        2
2   server1  Cisco        3
3   server1  Cisco        4
4   server1  Cisco        8
7   server2    IBM   Slot 5
8   server2    IBM   Slot 8
9   server2    IBM   Slot 9
10  server3   Micr  Slot 22
11  server3   Micr  Slot 18
12  server3   Micr   Slot 1
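
The lambda above calls `x.isdigit()`, so it expects every value to be a string. If the column may also hold real integers, one possible variation is to normalise each value to text first. The helper name `keep_slot` and the small sample Series below are hypothetical, made up for this sketch. Like the lambda, it matches "slot" case-insensitively, which is looser than the question strictly asks for.

```python
import pandas as pd

def keep_slot(value, threshold=12):
    """Hypothetical helper: True for a number <= threshold or a value mentioning 'slot'."""
    text = str(value).strip()          # tolerate ints and stray whitespace
    if text.isdigit():                 # non-negative whole numbers only
        return int(text) <= threshold
    return "slot" in text.lower()      # case-insensitive text check

# Made-up sample mixing ints and strings.
slots = pd.Series([1, "2", 13, "Chasis", "Slot 22", "slot 3"])
mask = slots.apply(keep_slot)
kept = slots[mask].tolist()
print(kept)
```

Because `apply` runs a Python function per element, this is likely to be slower than the vectorised string methods on large frames, but the threshold stays an ordinary parameter.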

cgvd09ve  #3

I think the general code would look like this:

cond1 = pd.to_numeric(df['Slot'], errors='coerce').le(12)
cond2 = df['Slot'].str.contains('Slot')
df[cond1 | cond2]
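
A small sketch of how the two conditions behave, using a made-up Series that mixes integers and strings. Passing `na=False` to `str.contains` is an addition not in the answer above; it is there because non-string elements would otherwise produce missing values in the second mask.

```python
import pandas as pd

# Made-up sample: strings and a real integer in the same column.
slots = pd.Series(["1", 8, "13", "Chasis", "Slot 22", "Chasis 1"], name="Slot")

# Non-numeric values become NaN, and NaN <= 12 evaluates to False.
cond1 = pd.to_numeric(slots, errors="coerce").le(12)

# Non-string elements yield missing values here; na=False treats them as no match.
cond2 = slots.str.contains("Slot", na=False)

selected = slots[cond1 | cond2]
print(selected)
```

Here the numeric test and the text test stay separate, so either one can be adjusted (for example a different threshold) without touching the other.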
