regex: How to extract only the letters from a string mixed with numbers in Python

aij0ehis posted on 2023-03-31 in Python


I have this table in my DataFrame. The char column contains either only letters, only digits, or a combination of letters and digits.

char     count
123        24
test       25
te123      26
test123    26

I want to extract only the letters. If a row contains only digits, I want it to be empty (NaN).
The expected result would be:

char     count
NaN       24
test      25
te        26
test      26

How can I do this in Python?
Thanks in advance.


Source: https://stackoverflow.com/questions/75894503/how-to-extract-only-letter-from-a-string-mixed-with-numbers-with-python

3 answers


7uhlpewt1#

You can use extract:

df["char"] = df["char"].str.extract("([a-zA-Z]+)", expand=False)
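A self-contained sketch of the extract approach, assuming the question's sample data is rebuilt as a DataFrame by hand. Rows with no letters come back as NaN automatically, and expand=False returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data assumed from the question's table
df = pd.DataFrame({"char": ["123", "test", "te123", "test123"],
                   "count": [24, 25, 26, 26]})

# Capture the first run of ASCII letters; no match -> NaN.
# expand=False makes extract() return a Series, not a DataFrame.
df["char"] = df["char"].str.extract(r"([a-zA-Z]+)", expand=False)
print(df)
```

Because extract keeps only the first match, a value like "te12s3t" would become "te", which is why a findall variant is needed for interrupted letters.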

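If the letters are broken up by digits, as in "te12s3t", use findall (a digit-only value then becomes an empty string rather than NaN):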

df["char"] = df["char"].str.findall("([a-zA-Z]+)").str.join("")
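A minimal sketch of the findall approach on a small hypothetical Series, showing the intermediate empty string for a digit-only value and one way (mask) to turn it into NaN to match the expected output.

```python
import pandas as pd

s = pd.Series(["123", "te12s3t", "test123"])

# findall() returns every run of letters as a list, e.g. ["te", "s", "t"];
# str.join("") concatenates each list into one string.
letters = s.str.findall(r"([a-zA-Z]+)").str.join("")
print(letters.tolist())  # ['', 'test', 'test']

# A digit-only value gives an empty list, which joins to "" (not NaN),
# so mask the empty strings to get NaN instead.
letters = letters.mask(letters.eq(""))
print(letters.tolist())
```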

Or simply use replace, which handles both cases:

df["char"] = df["char"].replace(r"\d+", "", regex=True).mask(lambda s: s.eq(""))
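A runnable sketch of the replace-plus-mask approach on the sample values, with "te12s3t" added as an assumed extra case. With regex=True and a string replacement value, Series.replace substitutes every match inside each string, and mask then converts the leftover empty strings to NaN. The raw string r"\d+" avoids Python's invalid-escape warning for "\d".

```python
import pandas as pd

s = pd.Series(["123", "test", "te123", "test123", "te12s3t"])

# Remove every run of digits inside each string.
no_digits = s.replace(r"\d+", "", regex=True)

# Values that were digit-only are now "", so turn them into NaN.
result = no_digits.mask(lambda x: x.eq(""))
print(result.tolist())
```

Note that this pattern only removes digits; any other non-letter characters (spaces, punctuation) would be kept.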

Or, following @Corralien's approach, use isdigit and replace:

df["char"] = df["char"].mask(df["char"].str.isdigit()).str.replace(r"\d+", "", regex=True)

Output:

print(df)

   char  count
0   NaN     24
1  test     25
2    te     26
3  test     26
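The variants agree on the question's sample data but differ when letters are interrupted by digits. A small comparison sketch, assuming a hypothetical two-value Series: the isdigit/mask variant strips every digit, so separated letters are joined, while extract stops at the first run.

```python
import pandas as pd

s = pd.Series(["123", "te12s3t"])

# isdigit() flags digit-only strings and mask() turns them into NaN;
# str.replace() then strips the remaining digits (NaN stays NaN).
via_isdigit = s.mask(s.str.isdigit()).str.replace(r"\d+", "", regex=True)

# extract() keeps only the first run of letters.
via_extract = s.str.extract(r"([a-zA-Z]+)", expand=False)

print(via_isdigit.tolist())
print(via_extract.tolist())
```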

Answered 2023-03-31

5sxhfpxr2#

You can do this with a regular expression.

import pandas as pd
import numpy as np
import re

data = {'char': ['123', 'test', 'te123', 'test123'], 'count': [24, 25, 26, 26]}
df = pd.DataFrame(data)

# keep only ASCII letters; values that contain no letter at all become NaN
df['char'] = df['char'].apply(lambda x: re.sub('[^a-zA-Z]+', '', x) if bool(re.search('[a-zA-Z]', x)) else np.nan)

print(df)

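Here, re.search('[a-zA-Z]', x) first checks whether the original string contains at least one letter. If it does, re.sub('[^a-zA-Z]+', '', x) removes every non-letter character from it; otherwise the value becomes NaN.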
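The lambda above raises TypeError if the column already contains NaN, because re.search expects a string. A sketch of the same logic as a named function (letters_only is a hypothetical name, not part of any library) that also passes non-string values through as NaN:

```python
import re

import numpy as np


def letters_only(x):
    """Return the ASCII letters in x, or NaN if x has none or is not a str."""
    # Hypothetical helper: same rule as the lambda, plus a type guard.
    if not isinstance(x, str) or re.search(r"[a-zA-Z]", x) is None:
        return np.nan
    # Drop every run of non-letter characters.
    return re.sub(r"[^a-zA-Z]+", "", x)


print([letters_only(v) for v in ["123", "test", "te123", "te12s3t"]])
```

It can be used the same way as the lambda: df['char'] = df['char'].apply(letters_only).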

Answered 2023-03-31

vi4fp9gy3#

We can use str.replace as follows (on its own, this turns digit-only rows into an empty string rather than NaN):

df["char"] = df["char"].str.replace(r'\d+', '', regex=True)
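A sketch of this answer on the question's sample data, assuming the DataFrame is rebuilt by hand. It shows the empty string left by the digit-only row and one way to convert it to NaN to match the expected output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"char": ["123", "test", "te123", "test123"],
                   "count": [24, 25, 26, 26]})

# Removing digits alone leaves "" (not NaN) for the digit-only row.
df["char"] = df["char"].str.replace(r"\d+", "", regex=True)
print(df["char"].tolist())  # ['', 'test', 'te', 'test']

# Replace exact empty strings with NaN to match the expected result.
df["char"] = df["char"].replace("", np.nan)
print(df)
```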

Answered 2023-03-31
