Regex: remove a string from a column when it meets a condition

Posted by ekqde3dh 12 months ago in Other


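I want to remove a string from the string columns whenever it contains a lowercase letter (a cell may be NaN, and a row may contain several strings).
| Column2 | Column3 | Column3 |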
| --|--| ------------ |
| NaN| NaN| NaN |
| NaN| NaN| NaN |
| NaN| NaN| NaN |
| BCSTACK| BCTENSORFLOW| BCTENSORFLOW |
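| OVERFLOW | NaN | NaN |
The original df looks like the table above.
I tried using str.contains to locate the strings that contain lowercase letters and replace them.
Since the str functions cannot be used on NaN values, I first replaced the NaN values with the string 'nan',
then used a regex to replace every string that contains lowercase letters. Since 'nan' is itself lowercase, it should be replaced as well.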

df['Column1'].fillna('nan',inplace=True)
df['Column2'].fillna('nan',inplace=True)
df['Column3'].fillna('nan',inplace=True)

lowerletterpattern = r'[a-z]*'

mask1 = df['Column1'].str.contains(lowerletterpattern)
df.loc[mask1,'Column1'] = np.nan

mask2 = df['Column2'].str.contains(lowerletterpattern)
df.loc[mask2,'Column2'] = np.nan

mask3 = df['Column3'].str.contains(lowerletterpattern)
df.loc[mask3,'Column3'] = np.nan

But the resulting df contains only NaN values.
Here is the expected result:
| Column2 | Column3 | Column3 |
| --|--| ------------ |
| NaN| NaN| NaN |
| NaN| NaN| NaN |
| NaN| NaN| NaN |
| BCSTACK| BCTENSORFLOW| BCTENSORFLOW |
| OVERFLOW | NaN | NaN |

regex

Source: https://stackoverflow.com/questions/76802047/remove-a-string-from-the-column-when-it-meets-the-condition

3 Answers

qoefvg9y1#

One option is to check for [a-z] with str.contains and then apply mask:

out = df.mask(df.apply(lambda s: s.str.contains('[a-z]').fillna(True)))

Or, using replace:

out = df.replace('.*[a-z].*', float('nan'), regex=True)

Output:

            Column1   Column2       Column3
0               NaN       NaN           NaN
1  BCDE8ENGUGUETNJN       NaN           NaN
2               NaN       NaN           NaN
3               NaN   BCSTACK  BCTENSORFLOW
4               NaN  OVERFLOW           NaN
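
For anyone wondering why the question's code turned everything into NaN: the pattern `[a-z]*` allows zero repetitions, so it matches the empty string in any input, and str.contains returns True for every cell. Here is a minimal sketch comparing the patterns and running both approaches above. The sample DataFrame is hypothetical (the lowercase-containing values are made up for illustration), and `na=True` is used here in place of `.fillna(True)`.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data; the lowercase-containing values are invented.
df = pd.DataFrame({
    "Column1": [np.nan, "BCDE8ENGUGUETNJN", "abcDEF", np.nan, "Mixed"],
    "Column2": ["lower", np.nan, np.nan, "BCSTACK", "OVERFLOW"],
    "Column3": [np.nan, "Tensor", np.nan, "BCTENSORFLOW", np.nan],
})

# '[a-z]*' matches zero characters, so it "matches" an all-uppercase string too.
star_matches_upper = re.search(r"[a-z]*", "BCSTACK") is not None
# '[a-z]' needs at least one lowercase letter, so it does not match here.
plain_matches_upper = re.search(r"[a-z]", "BCSTACK") is not None
print(star_matches_upper, plain_matches_upper)

# Approach 1: boolean frame of cells to blank out; na=True also flags missing cells.
has_lower = df.apply(lambda s: s.str.contains(r"[a-z]", na=True))
out_mask = df.mask(has_lower)

# Approach 2: replace any whole string that contains a lowercase letter.
out_replace = df.replace(r".*[a-z].*", np.nan, regex=True)

print(out_mask)
```

Both approaches should leave only the four all-uppercase strings in place on this sample.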

Answered 12 months ago

bkkx9g8r2#

Another solution, using Series.where:

df = df.apply(
    lambda row: row.where(~row.str.contains(r"[a-z]").astype(bool)),
    axis=1,
)
print(df)

Prints:

            Column1   Column2       Column3
0               NaN       NaN           NaN
1  BCDE8ENGUGUETNJN       NaN           NaN
2               NaN       NaN           NaN
3               NaN   BCSTACK  BCTENSORFLOW
4               NaN  OVERFLOW           NaN
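
A possible variation on the same idea, sketched under a few assumptions: it works column by column instead of row by row, and it passes `na=True` to str.contains so that missing cells count as matches without the `astype(bool)` step. The helper name `drop_lowercase` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Column1": [np.nan, "BCDE8ENGUGUETNJN", "abcDEF"],
    "Column2": ["lower", np.nan, "BCSTACK"],
})


def drop_lowercase(col: pd.Series) -> pd.Series:
    """Keep values without lowercase letters; everything else becomes NaN."""
    # na=True marks missing cells as matches, so they stay NaN after where().
    has_lower = col.str.contains(r"[a-z]", na=True)
    return col.where(~has_lower)


# apply() without axis=1 passes each column to the helper.
cleaned = df.apply(drop_lowercase)
print(cleaned)
```

Note that the `.str` accessor only works on string-like columns, so a column that is entirely NaN with float dtype would need to be skipped or cast first.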

Answered 12 months ago

mm5n2pyu3#

You can use str.isupper to avoid a regex:

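# np.nan replaces np.NaN (removed in NumPy 2.0); on pandas 2.1+ DataFrame.map is the newer name for applymap.
df = df.applymap(lambda x: x if isinstance(x, str) and x.isupper() else np.nan)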
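
One caveat worth checking before using this: str.isupper is not exactly the same rule as "contains no [a-z]". It requires at least one cased character, and it also looks at non-ASCII letters. A small sketch of the difference, using two hypothetical helper functions and made-up sample strings:

```python
import re


def keep_no_lowercase(value):
    """Regex rule: keep strings that contain no ASCII lowercase letter."""
    return isinstance(value, str) and re.search(r"[a-z]", value) is None


def keep_isupper(value):
    """isupper rule: at least one cased character, and none of them lowercase."""
    return isinstance(value, str) and value.isupper()


# Made-up sample strings; the last two are where the rules disagree.
samples = ["BCSTACK", "BCDE8ENGUGUETNJN", "Overflow", "2023", "CAFé"]

for s in samples:
    print(f"{s!r}: regex keeps={keep_no_lowercase(s)}, isupper keeps={keep_isupper(s)}")
```

Digits-only strings such as '2023' are kept by the regex rule but dropped by isupper, and a string with a non-ASCII lowercase letter such as 'CAFé' behaves the same way. If the data only holds ASCII letters mixed with digits, the two rules agree.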

Answered 12 months ago
