Pandas read_csv(): case-insensitive column names for converters/dtypes

Posted by ymdaylpp on 2023-06-20 in Other


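I'm using pd.read_csv() to load files whose column names may be in an unknown case. Using a lambda for the usecols parameter, as described here, I can choose which columns to load regardless of their case, and using the approach from here I can then access those columns like this: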

import pandas as pd

df = pd.read_csv(myfile, usecols=lambda x: x.lower() in ['foo', 'bar'])
df.columns = df.columns.str.lower()

print(df['foo'])  # Works no matter which column name case is in the file

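But is there a way to use the `dtype`/`converters` parameters in this situation?
I have two workaround ideas:
1. Load all the data as strings and convert it later in the code. This doesn't seem very convenient.
2. Open the file only to read the header, analyze it, and then open the file again now that the actual case of the column names is known (wrapped in a function).
Is there another way?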
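Workaround 1 could look roughly like the minimal sketch below. The sample data and the lower-case `dtypes` map are illustrative assumptions, not part of the question; `dtype=object` is used so every selected column stays as raw text until the header has been normalised.

```python
from io import StringIO

import pandas as pd

# Illustrative in-memory file with mixed-case headers (assumption)
data = StringIO("a,B,c,D\n1,2,3,4\n5,6,7,7\n")

# Target dtypes keyed by lower-case column name (assumption)
dtypes = {"a": "int64", "b": "int32", "c": "float64"}

# Select columns case-insensitively and keep their values as Python str objects
df = pd.read_csv(data, usecols=lambda x: x.lower() in dtypes, dtype=object)

# Normalise the header, then apply all real dtypes in one astype() call
df.columns = df.columns.str.lower()
df = df.astype(dtypes)

print(df.dtypes)
```

The drawback the question mentions still applies: type conversion happens in a second step, and `converters` cannot be expressed this way without calling each function manually.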

pandas

Source: https://stackoverflow.com/questions/76466236/pandas-read-csv-case-insensitive-column-names-for-converters-dtypes

1 answer


iyr7buue1#

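You can read in just the first line of the data to grab the column names. I'd recommend doing this with the built-in csv module, or even with pandas itself, since both handle quoting easily.
With pandas, you can do something like this: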
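For the csv-module route, reading only the header could be sketched as follows. This assumes an in-memory `StringIO` with a quoted header field purely for illustration; a real file opened with `open(path, newline="")` behaves the same way.

```python
import csv
from io import StringIO

# Illustrative file whose header contains a quoted field with a comma in it
data = StringIO('a,B,"c, quoted",D\n1,2,3,4\n5,6,7,7\n')

# csv.reader understands quoting; next() yields just the header row
header = next(csv.reader(data))

# Rewind so the same handle can be passed to pd.read_csv afterwards
data.seek(0)

print(header)  # ['a', 'B', 'c, quoted', 'D']
```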

from io import StringIO
from pandas import read_csv

data = StringIO('''
a,B,c,D
1,2,3,4
5,6,7,7
'''.strip())

# expected lower-case column names mapped to dtypes
# intentionally left out column 'D'
dtypes = {'a': 'Int64', 'b': 'int32', 'c': 'float64'}

# `nrows=0` will only read in the column names and an empty DataFrame
columns = read_csv(data, nrows=0).columns

# only need to do this since `data` acts as an open file-handle
data.seek(0)

# create mapping of actual column names → dtype based on a matching `.casefold()`
#  if you're unfamiliar with `str.casefold()`, you can think of it like `str.lower()`
new_dtypes = {
    col: dtypes[col.casefold()]
    for col in columns if col.casefold() in dtypes
}
df = read_csv(data, usecols=new_dtypes.keys(), dtype=new_dtypes)

print(df)
   a  B    c
0  1  2  3.0
1  5  6  7.0
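The same header-first idea also covers `converters`, which the answer above does not show. The helper below is a hypothetical wrapper (the name `read_csv_case_insensitive` and its signature are illustrative, not a pandas API) that accepts lower-case keys for both `dtype` and `converters`, maps them to the file's real column names, and loads only the matched columns. It assumes `source` is a path or a seekable file-like object, since the file is read twice.

```python
from io import StringIO

import pandas as pd


def read_csv_case_insensitive(source, dtype=None, converters=None, **kwargs):
    """Hypothetical helper: match dtype/converters keys to columns case-insensitively.

    Keys of `dtype` and `converters` must be lower-case (casefolded) names.
    `source` must be a path or a seekable file-like object.
    """
    dtype = dtype or {}
    converters = converters or {}

    # First pass: nrows=0 reads only the header into an empty DataFrame
    columns = pd.read_csv(source, nrows=0, **kwargs).columns
    if hasattr(source, "seek"):
        source.seek(0)  # rewind file-like objects for the second pass

    # Map real column names to the requested dtypes/converters
    real_dtype = {c: dtype[c.casefold()] for c in columns if c.casefold() in dtype}
    real_conv = {c: converters[c.casefold()] for c in columns if c.casefold() in converters}

    # Load only columns that matched at least one mapping
    usecols = [c for c in columns if c in real_dtype or c in real_conv]
    return pd.read_csv(
        source,
        usecols=usecols,
        dtype=real_dtype or None,
        converters=real_conv or None,
        **kwargs,
    )


# Usage with an illustrative in-memory file
data = StringIO("a,B,c,D\n1,2,3,4\n5,6,7,7\n")
df = read_csv_case_insensitive(
    data,
    dtype={"a": "Int64", "c": "float64"},
    converters={"b": lambda s: int(s) * 10},
)
print(df)
```

Giving the same column both a dtype and a converter makes pandas use only the converter (with a warning), so each lower-case key should appear in just one of the two maps.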

Answered on 2023-06-20
