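I have a program that needs to process a CSV file and turn it into a dataset. The example I'm working from is the popular Python tutorial that uses the iris dataset. I'm trying to replace datasets.load_iris() with a method that reads the CSV file 'A1-dm.csv'.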
Expected:
The program processes the CSV file and loads the data.
Actual:
Traceback (most recent call last):
  File ".\example.py", line 38, in <module>
    main()
  File ".\example.py", line 11, in main
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1134, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 768, in floatconv
    return float(x)
ValueError: could not convert string to float: 'A1'
The code for this is:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from MDLP import MDLP_Discretizer


def main():
    ######### USE-CASE EXAMPLE #############
    # read dataset
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
    X, y = dataset['A1'], dataset['Class']
    # feature_names, class_names = dataset['feature_names'], dataset['target_names']
    # numeric_features = np.arange(X.shape[1])  # all features in this dataset are numeric. These will be discretized
    # # Split between training and test
    # X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    # # Initialize discretizer object and fit to training data
    # discretizer = MDLP_Discretizer(features=numeric_features)
    # discretizer.fit(X_train, y_train)
    # X_train_discretized = discretizer.transform(X_train)
    # # Apply same discretization to test set
    # X_test_discretized = discretizer.transform(X_test)
    # # Print a slice of original and discretized data
    # print('Original dataset:\n%s' % str(X_train[0:5]))
    # print('Discretized dataset:\n%s' % str(X_train_discretized[0:5]))
    # # See how feature 0 was discretized
    # print('Feature: %s' % feature_names[0])
    # print('Interval cut-points: %s' % str(discretizer._cuts[0]))
    # print('Bin descriptions: %s' % str(discretizer._bin_descriptions[0]))


if __name__ == '__main__':
    main()
A sample of the CSV file:
A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3
2,0.53226533,1.5,2
How can I fix this?
1 answer
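The first line of your CSV file is a header row containing the column names as text. np.loadtxt tries to convert every field to float, so it fails on the first field of that row, 'A1'. Skip the header (for example with skiprows=1) so that only the numeric rows go through the string-to-float conversion.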
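A minimal sketch of that fix. It uses a few sample rows from the question in an in-memory buffer so it runs on its own; in the real program you would pass 'A1-dm.csv' instead. It assumes A1, A2 and A3 are the features and Class is the label, which is only a guess from the column names. Note that np.loadtxt returns a plain 2-D float array, not something indexable by column name, so dataset['A1'] would still fail after the header is skipped; columns are selected by position here:

```python
import io
import numpy as np

# A few rows copied from the sample in the question; in the real program
# pass the file name 'A1-dm.csv' instead of this in-memory buffer.
csv_text = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
"""

# skiprows=1 skips the header line, so only numeric fields reach float().
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)

# loadtxt returns a plain 2-D float array, so columns are selected by
# position: every column except the last is a feature, the last is the class.
X = dataset[:, :-1]
y = dataset[:, -1].astype(int)

print(X.shape)  # (3, 3)
print(y)        # [3 3 2]
```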
See also: numpy loadtxt skip first row
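Since the goal is a stand-in for datasets.load_iris(), here is a sketch of a hypothetical helper, load_csv_dataset (not part of NumPy or scikit-learn). Instead of skipping the header, it uses np.genfromtxt with names=True to read the column names from it, and returns a plain dict with the 'data', 'target', 'feature_names' and 'target_names' keys that the commented-out code indexes. The default target_column='Class' is an assumption based on the sample file. Unlike the Bunch object load_iris() returns, a dict only supports dataset['data'], not dataset.data.

```python
import io
import numpy as np


def load_csv_dataset(source, target_column='Class'):
    """Hypothetical helper: load a numeric CSV with a header row into a dict
    shaped like the object sklearn's datasets.load_iris() returns."""
    # names=True takes the field names from the header line, so the header
    # is used rather than being converted to float.
    table = np.genfromtxt(source, delimiter=',', names=True)
    feature_names = [n for n in table.dtype.names if n != target_column]
    # Stack the named feature columns into an (n_samples, n_features) array.
    data = np.column_stack([table[n] for n in feature_names])
    target = table[target_column].astype(int)
    return {
        'data': data,
        'target': target,
        'feature_names': feature_names,
        'target_names': np.unique(target),
    }


# Usage with rows from the question; pass 'A1-dm.csv' for the real file.
sample = io.StringIO("""A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
""")
dataset = load_csv_dataset(sample)
X, y = dataset['data'], dataset['target']
print(dataset['feature_names'])  # ['A1', 'A2', 'A3']
print(X.shape, y)                # (4, 3) [3 3 2 1]
```

With this, numeric_features = np.arange(X.shape[1]) in the commented-out code yields one index per feature column.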