pandas: Unable to parse string for Int64 in pd.read_csv

uqdfh47h, posted on 2023-09-29 in Other


Pandas 2.0.0 does not seem to take thousands=',' into account correctly when parsing a column as Int64:

import io
import pandas as pd
pd.read_csv(io.StringIO('''a\n22,922'''), sep='\t', dtype={'a': 'Int64'}, thousands=',')

The specific error is:

Traceback (most recent call last):    
  File pandas/_libs/lib.pyx:2280 in pandas._libs.lib.maybe_convert_numeric    
ValueError: Unable to parse string "22,922"

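Is there a workaround that does not involve going back to a non-nullable int or converting to float? I have confirmed that this works with the older dtypes dtype={'a': 'int'} and dtype={'a': 'float'}.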

pandas

Source: https://stackoverflow.com/questions/77180580/unable-to-parse-string-for-int64-in-pd-read-csv

2 Answers


pkmbmrz71#

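Actually, the error is triggered even if you do not specify the thousands parameter. This is an open issue (see GH52594). You cannot yet apply a nullable dtype when calling read_csv with the C engine.
A simple workaround is to use astype: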

(
    pd.read_csv(io.StringIO('''a\n22,922'''), sep='\t', thousands=',')
    .astype(dtype={'a': 'Int64'})  # add this line
)
       a
0  22922
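
To illustrate why the nullable dtype matters here, the following is a minimal sketch of the same astype workaround applied to data with a missing value. The sample text and the second column b are made up for illustration and are not from the question. Note that, in this sketch, the column is expected to pass through float64 before the cast, so it does not avoid an intermediate float when values are missing.

```python
import io
import pandas as pd

# Hypothetical sample: tab-separated, with an empty field in column "a".
raw = "a\tb\n22,922\tx\n\ty\n1,234,567\tz\n"

# Parse without a nullable dtype; thousands=',' strips the separators.
df = pd.read_csv(io.StringIO(raw), sep="\t", thousands=",")

# Cast afterwards: the missing entry becomes <NA>, the rest become integers.
df = df.astype({"a": "Int64"})
print(df.dtypes)
print(df)
```

Very large integers could lose precision on the way through float64, so this is only a reasonable fit when the values are comfortably below 2**53.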

Answered 2023-09-29

2skhul332#

The default engine is c; you want to use python:

pd.read_csv(io.StringIO('''a\n22,922'''), sep='\t', dtype={'a': 'Int64'}, thousands=',', engine='python')

Output

       a
0  22922
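
Since the behavior may differ between pandas versions and engines, one option is to try the nullable dtype first and fall back to casting afterwards. The sketch below assumes that approach; read_csv_nullable_int is a hypothetical helper name, not part of pandas, and catching ValueError and TypeError is an assumption based on the traceback in the question.

```python
import io
import pandas as pd


def read_csv_nullable_int(text, column, **kwargs):
    """Hypothetical helper: request Int64 directly, else cast after parsing."""
    try:
        return pd.read_csv(io.StringIO(text), dtype={column: "Int64"}, **kwargs)
    except (ValueError, TypeError):
        # Fallback: parse with default dtypes, then cast the column.
        return pd.read_csv(io.StringIO(text), **kwargs).astype({column: "Int64"})


result = read_csv_nullable_int(
    "a\n22,922", "a", sep="\t", thousands=",", engine="python"
)
print(result.dtypes)
```

Either path should leave column a with the Int64 dtype, so calling code does not need to know which engine or pandas version handled the parse.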

Answered 2023-09-29
