pandas: How to convert a currency column with $ and , to numbers

Posted by rm5edbpk on 2023-09-29 in Other


I have the following data in a pandas DataFrame:

import pandas as pd

data = {'state': ['California', 'New York', 'Florida', 'Texas'],
        '1st': ['$11,593,820', '$10,861,680', '$7,942,848', '$7,536,817'],
        '2nd': ['$109,264,246', '$45,336,041', '$69,369,589', '$61,830,712'],
        '3rd': ['$8,496,273', '$6,317,300', '$4,697,244', '$5,736,941']}

df = pd.DataFrame(data)

        state          1st           2nd         3rd
0  California  $11,593,820  $109,264,246  $8,496,273
1    New York  $10,861,680   $45,336,041  $6,317,300
2     Florida   $7,942,848   $69,369,589  $4,697,244
3       Texas   $7,536,817   $61,830,712  $5,736,941

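I want to run some simple analysis (e.g. sum, groupby) on the three columns (1st, 2nd, 3rd), but their data type is object (i.e. string).
So I tried the following code to convert the data: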

df = df.convert_objects(convert_numeric=True)

However, the conversion does not work, perhaps because of the dollar sign. Any suggestions?

pandas

Source: https://stackoverflow.com/questions/32464280/how-to-convert-currency-column-with-and-to-numbers

6 answers


2cmtqfgy1#

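pandas has three .replace methods:

pandas.Series.replace, for a single column
pandas.Series.str.replace, for a single string column
pandas.DataFrame.replace, for multiple columns, with no need for .apply

regex=False is the default, so set regex=True.
df[df.columns[1:]] selects the last three columns.
*Tested with Python 3.11.4, pandas 2.1.0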

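# replace values only in selected columns
df[df.columns[1:]] = df[df.columns[1:]].replace(r'[\$,]', '', regex=True).astype(float)

# replace values in all columns (only works when every column holds currency strings;
# with the 'state' column present, astype(float) raises a ValueError)
df = df.replace(r'[\$,]', '', regex=True).astype(float)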
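As a rough side-by-side of the three methods listed above, the following sketch applies each one to a cut-down copy of the question's data. The variable names `a`, `b`, and `c` are only for illustration. Note that `Series.str.replace` is given `regex=True` explicitly, because its default became `regex=False` in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['California', 'New York'],
    '1st': ['$11,593,820', '$10,861,680'],
    '2nd': ['$109,264,246', '$45,336,041'],
})

# Series.replace: regex=True is needed to replace parts of a string
a = df['1st'].replace(r'[\$,]', '', regex=True).astype(float)

# Series.str.replace: same pattern, regex=True passed explicitly
b = df['1st'].str.replace(r'[\$,]', '', regex=True).astype(float)

# DataFrame.replace: cleans several columns in one call, no .apply needed
c = df[['1st', '2nd']].replace(r'[\$,]', '', regex=True).astype(float)

print(a.tolist())
print(c.dtypes)
```

All three should give the same numbers for the '1st' column; the choice mostly depends on whether one column or several need cleaning.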

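Other patterns for cleaning currency columns:
'[^.0-9]': removes every character that is not a digit or a decimal point
'[^.0-9\-]': removes every character that is not a digit, a decimal point, or a minus sign
'\D': removes every non-digit, including decimal points and minus signs, so it only suits columns of positive integers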
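To make the difference between these patterns concrete, here is a small sketch on made-up values (the sample strings are not from the question) that include a decimal point and a minus sign:

```python
import pandas as pd

# Illustrative values only: one with cents, one negative, one plain integer
s = pd.Series(['$1,234.50', '-$99.95', '$7'])

keep_decimal = s.str.replace(r'[^.0-9]', '', regex=True)
keep_sign = s.str.replace(r'[^.0-9\-]', '', regex=True)
digits_only = s.str.replace(r'\D', '', regex=True)

print(keep_decimal.tolist())  # ['1234.50', '99.95', '7']
print(keep_sign.tolist())     # ['1234.50', '-99.95', '7']
print(digits_only.tolist())   # ['123450', '9995', '7']
```

The first pattern silently drops the minus sign, and '\D' also drops the decimal point, which changes the magnitude of any value with cents.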


k5ifujac2#

You can use the vectorized str methods to replace the unwanted characters and then cast the type to int:

In [81]:
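import numpy as np
# regex=False makes '$' a literal character rather than a regex end-of-string anchor
df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str.replace('$', '', regex=False)).apply(lambda x: x.str.replace(',', '', regex=False)).astype(np.int64)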
df

Out[81]:
            state       1st        2nd      3rd
index                                          
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941

The dtype change is now confirmed:

In [82]:

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
state    4 non-null object
1st      4 non-null int64
2nd      4 non-null int64
3rd      4 non-null int64
dtypes: int64(3), object(1)
memory usage: 160.0+ bytes

Another way:

In [108]:

df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str[1:].str.split(',').str.join('')).astype(np.int64)
df
Out[108]:
            state       1st        2nd      3rd
index                                          
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941


ztigrdn83#

You can also use locale, as follows:

import locale
import pandas as pd
locale.setlocale(locale.LC_ALL,'')
df['1st']=df.1st.map(lambda x: locale.atof(x.strip('$')))

Note that the code above was tested with Python 3 on Windows.


qzwqbdag4#

To convert to integers, use:

carSales["Price"] = carSales["Price"].replace("[$,]", "", regex=True).astype(int)


nzk0hqpo5#

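You can use the str.replace method with the regular expression '\D' to remove all non-digit characters, or with '[^-.0-9]' to keep minus signs, decimal points, and digits: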

for col in df.columns[1:]:
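    # regex=True is required: str.replace treats the pattern as a literal string by default since pandas 2.0
    df[col] = pd.to_numeric(df[col].str.replace(r'[^-.0-9]', '', regex=True))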
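If the column may contain placeholders or missing entries, one possible variation is to pass `errors='coerce'` to `pd.to_numeric` so that unparseable values become NaN instead of raising. The sample values below, including 'N/A', are assumptions for illustration and do not come from the question's data.

```python
import pandas as pd

# Illustrative data: a normal value, a negative with cents, a placeholder, a missing entry
s = pd.Series(['$11,593,820', '-$1,250.75', 'N/A', None])

# Keep only minus signs, decimal points and digits
cleaned = s.str.replace(r'[^-.0-9]', '', regex=True)

# Anything that still cannot be parsed becomes NaN
numbers = pd.to_numeric(cleaned, errors='coerce')

print(numbers)
```

The result is a float column, since NaN cannot be stored in a plain int64 column.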


3yhwsihp6#

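# strip '$' and ',' as literal characters, then drop anything after the decimal point
df['1st'] = df['1st'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).str.split('.', expand=True)[0].astype(int)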
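Splitting on '.' and keeping the first piece truncates any cents rather than rounding them. The sketch below, on made-up values, contrasts that with rounding through float; the names `truncated` and `rounded` are illustrative only.

```python
import pandas as pd

# Illustrative values: one with cents, one without
s = pd.Series(['$1,234.99', '$7,536,817'])
stripped = s.str.replace('$', '', regex=False).str.replace(',', '', regex=False)

# Approach from the answer: keep only the part before the decimal point
truncated = stripped.str.split('.', expand=True)[0].astype(int)

# Alternative: parse as float, round to the nearest whole number, then cast
rounded = stripped.astype(float).round().astype(int)

print(truncated.tolist())  # [1234, 7536817]
print(rounded.tolist())    # [1235, 7536817]
```

Which behavior is appropriate depends on whether the cents should be discarded or rounded.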
