pandas read_excel: "General" column read as object

Posted by qlzsbp2j on 2023-05-19 in Other


I have an .xls file that looks like this:

col_a       col_b   col_c   col_d
5376594                     hello
12028432                    world
17735732    hello   12      hello
17736843    world           world

When I read the file with
test = pandas.read_excel('F:/test.xls')
the table is read with the following column types:

>>> test.dtypes
col_a       int64
col_b       object
col_c       float64
col_d       object

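My problem is that I would like col_b and col_d to be string columns. Since I am quite new to Python, could you tell me
1. what is happening behind the scenes, and
2. whether there is a parameter I can adjust to read these columns as strings?
Edit: types of the first row, as requested in the comments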

>>> type(test.iloc[0]['col_a'])
<class 'numpy.int64'>
>>> type(test.iloc[0]['col_b'])
<class 'float'>
>>> type(test.iloc[0]['col_c'])
<class 'numpy.float64'>
>>> type(test.iloc[0]['col_d'])
<class 'str'>

excel

Source: https://stackoverflow.com/questions/32458826/pandas-read-excel-general-column-as-object

2 Answers


mf98qq941#

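You can define dtype in pandas.read_csv.

dtype: Type name or dict of column -> type. If not specified, data types will be inferred. (Unsupported with engine='python')

Why is NaN a float?
The available dtypes are listed here (at the end of the page).
Test: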

import pandas
import io
import numpy

col_types = {"col_a": numpy.int32, "col_b": str, "col_c": str, "col_d": str}

temp=u"""col_a,col_b,col_c,col_d
5376594,,,hello
12028432,,,world
17735732,hello,12,hello
17736843,world,,world"""

test = pandas.read_csv(io.StringIO(temp), header=0, sep=",", dtype=col_types)


# First row: the empty cells in col_b and col_c are read as NaN, which is a float.
print(type(test.iloc[0]['col_a']))
print(type(test.iloc[0]['col_b']))
print(type(test.iloc[0]['col_c']))
print(type(test.iloc[0]['col_d']))
#
#<class 'numpy.int32'>
#<class 'float'>
#<class 'float'>
#<class 'str'>

# Third row: every cell has a value, so the str columns hold real strings.
print(type(test.iloc[2]['col_a']))
print(type(test.iloc[2]['col_b']))
print(type(test.iloc[2]['col_c']))
print(type(test.iloc[2]['col_d']))
#
#<class 'numpy.int32'>
#<class 'str'>
#<class 'str'>
#<class 'str'>

print(test)
print(test.dtypes)
#
#col_a     int32
#col_b    object
#col_c    object
#col_d    object
#dtype: object
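The test above shows that dtype=str alone still leaves empty cells as NaN (a float). If every value in those columns needs to be a real str, the following sketch shows two possible ways to get there. It reuses the same CSV sample as a stand-in for the .xls file. The variable names are made up for this illustration, and treating empty cells as empty strings is an assumption about what the asker wants.

```python
import io

import pandas

csv_text = """col_a,col_b,col_c,col_d
5376594,,,hello
12028432,,,world
17735732,hello,12,hello
17736843,world,,world"""

# Option 1: read as str, then replace the missing values (NaN) with "".
filled = pandas.read_csv(io.StringIO(csv_text), dtype={"col_b": str, "col_d": str})
filled[["col_b", "col_d"]] = filled[["col_b", "col_d"]].fillna("")

# Option 2: ask the parser not to turn empty fields into NaN in the first place.
no_na = pandas.read_csv(
    io.StringIO(csv_text),
    dtype={"col_b": str, "col_c": str, "col_d": str},
    keep_default_na=False,
)

print(type(filled.iloc[0]["col_b"]))
print(type(no_na.iloc[0]["col_b"]))
```

Option 1 only touches the columns you name. Option 2 changes how missing values are detected for the whole file, so numeric columns with empty cells would need to be read as str too.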


yduiuuwa2#

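From the pandas documentation for pd.read_excel, these are the relevant parameters:
dtype: Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
You could also use converters, as suggested in the other answer, but I don't think you really need them, because a converter applies some kind of transformation, as the documentation says:
converters: dict, default None. Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the Excel cell content, and return the transformed content.
So using a converter would, for example, change an int cell with value 1 into a float cell with value 1.1.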
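As a rough illustration of what a converter does, here is a minimal sketch. It uses pandas.read_csv on an in-memory sample as a stand-in for the .xls file, since read_csv accepts a converters argument as well. The function add_prefix and the "ID" prefix are hypothetical and exist only for this example.

```python
import io

import pandas

csv_text = """col_a,col_b,col_c,col_d
5376594,,,hello
12028432,,,world
17735732,hello,12,hello
17736843,world,,world"""


def add_prefix(cell):
    # Hypothetical transformation: called once per cell of the column.
    # str() guards against the cell already being a number, which can
    # happen when the same function is used with read_excel.
    return "ID" + str(cell)


converted = pandas.read_csv(io.StringIO(csv_text), converters={"col_a": add_prefix})

print(converted["col_a"].tolist())
```

The point is that a converter rewrites each value, whereas dtype only states which type the column should have.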
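object is a general-purpose dtype that is typically used for strings, although pandas has a more specific dtype for strings, StringDtype. See the documentation for details.
Finally, you can change your read_excel call as follows: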

test = pandas.read_excel('F:/test.xls',
    dtype={'col_a': int, 'col_b': str,'col_c': float,'col_d': str,})

That should work, although it may leave col_b and col_d as object dtype. If that happens, try:

test = pandas.read_excel('F:/test.xls',
    dtype={'col_a': int, 'col_b': pandas.StringDtype(), 'col_c': float, 'col_d': pandas.StringDtype()})

This should give you the str format you want.
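To see what StringDtype changes in practice, here is a small sketch. It assumes a pandas version that provides StringDtype (1.0 or later) and uses read_csv on an in-memory sample as a stand-in for the .xls file. The variable names are illustrative only.

```python
import io

import pandas

csv_text = """col_a,col_b,col_c,col_d
5376594,,,hello
12028432,,,world
17735732,hello,12,hello
17736843,world,,world"""

typed = pandas.read_csv(
    io.StringIO(csv_text),
    dtype={"col_b": pandas.StringDtype(), "col_d": pandas.StringDtype()},
)

# The columns now report a dedicated string dtype instead of object.
print(typed.dtypes)

# Empty cells are still missing values; they are not converted to "".
print(typed["col_b"].isna().tolist())
```

The column dtype becomes a string dtype, but empty cells stay missing, so code that expects a str in every row still has to handle them.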
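Edit: looking at the StringDtype documentation, I saw this warning, so be careful:
StringDtype is considered experimental. The implementation and parts of the API may change without warning.
You can also check this question for details on each data type that pandas accepts.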

