numpy ValueError: could not convert string to float: 'A1' when using np.loadtxt

snz8szmq  posted on 2023-05-22  in Other

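I have a program that needs to process a CSV file and convert it into a dataset. The example I am working from is the iris data set from a popular Python tutorial. I am trying to replace datasets.load_iris() with a method that reads the CSV file 'A1-dm.csv'.
Expected:
The program processes the CSV file and loads the data.
Actual: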

Traceback (most recent call last):
  File ".\example.py", line 38, in <module>
    main()
  File ".\example.py", line 11, in main
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1134, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 768, in floatconv
    return float(x)
ValueError: could not convert string to float: 'A1'

The code for this implementation is:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from MDLP import MDLP_Discretizer

def main():

    ######### USE-CASE EXAMPLE #############

    #read dataset
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
    X, y = dataset['A1'], dataset['Class']
    # feature_names, class_names = dataset['feature_names'], dataset['target_names']
    # numeric_features = np.arange(X.shape[1])  # all features in this dataset are numeric. These will be discretized

    # #Split between training and test
    # X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    # #Initialize discretizer object and fit to training data
    # discretizer = MDLP_Discretizer(features=numeric_features)
    # discretizer.fit(X_train, y_train)
    # X_train_discretized = discretizer.transform(X_train)

    # #apply same discretization to test set
    # X_test_discretized = discretizer.transform(X_test)

    # #Print a slice of original and discretized data
    # print('Original dataset:\n%s' % str(X_train[0:5]))
    # print('Discretized dataset:\n%s' % str(X_train_discretized[0:5]))

    # #see how feature 0 was discretized
    # print('Feature: %s' % feature_names[0])
    # print('Interval cut-points: %s' % str(discretizer._cuts[0]))
    # print('Bin descriptions: %s' % str(discretizer._bin_descriptions[0]))

if __name__ == '__main__':
    main()

A sample of the CSV file:

A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3
2,0.53226533,1.5,2

How can I solve this?

k0pti3hp1#

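The first line of the CSV file is a header row containing text ('A1', 'A2', 'A3', 'Class'). np.loadtxt tries to convert every field to float by default, so it fails on 'A1'. Skip that line so that the string-to-float conversion only sees numeric rows.
See also: numpy loadtxt skip first row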
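A minimal sketch of skipping the header, assuming the file layout shown in the question: np.loadtxt accepts a skiprows argument, and skiprows=1 drops the first line before any float conversion takes place. The inline csv_text and the io.StringIO wrapper are stand-ins for the real file, used only to keep the example self-contained. With the actual file you would pass fname='A1-dm.csv' instead.

```python
import io

import numpy as np

# A few rows copied from the question's sample. StringIO stands in for
# the file on disk (assumption: the real file has the same layout).
csv_text = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
"""

# skiprows=1 drops the header line, so every remaining field parses as float.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)

# loadtxt returns a plain 2-D float array, so columns are selected
# by position rather than by name.
X = dataset[:, :-1]  # columns A1, A2, A3
y = dataset[:, -1]   # column Class

print(X.shape)
print(y)
```

Because the result is an ordinary array, an expression such as dataset['A1'] from the question's code would still fail after this fix. Positional slicing as above is one way around that.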
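The question's code also indexes the result by column name (dataset['A1'], dataset['Class']). If name-based access is what is wanted, one option is np.genfromtxt with names=True, which takes the field names from the header row and returns a structured array. This is a sketch under that assumption about intent, not part of the original answer. As before, csv_text and io.StringIO are illustrative stand-ins for the real 'A1-dm.csv'.

```python
import io

import numpy as np

# Stand-in for the CSV file from the question (same header, first rows only).
csv_text = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
"""

# names=True tells genfromtxt to use the first line as field names
# instead of trying to parse it as data.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)

# Columns can now be accessed by name, as the question's code attempts.
a1 = data['A1']
y = data['Class']

# Build an ordinary 2-D feature matrix from the named feature columns.
feature_names = ('A1', 'A2', 'A3')
X = np.column_stack([data[name] for name in feature_names])

print(data.dtype.names)
print(X.shape)
```

The structured array is convenient for name lookups. Code that expects a plain 2-D numeric array generally needs the stacked X form shown at the end.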
