I'm confused about how to implement this in Hadoop. I have already written code that predicts the class values:
import os

import pandas as pd
from sklearn import tree
from sklearn.metrics import accuracy_score

os.chdir('/home/PYTHON/')
data = pd.read_csv('wine.csv')
test_score = []
n_fold = 10
for i in range(n_fold):
    # Vary the seed per iteration; a fixed random_state would give the same split every time
    train_data = data.sample(frac=0.70, random_state=i)
    test_data = data.loc[~data.index.isin(train_data.index)]
    tree_model = tree.DecisionTreeClassifier()
    # Columns 0-12 are the features, column 13 is the class (positional indexing with .iloc)
    predictors = train_data.iloc[:, 0:13]
    train_y = train_data.iloc[:, 13]
    model = tree_model.fit(X=predictors, y=train_y)
    test_feat = test_data.iloc[:, 0:13]
    test_y = test_data.iloc[:, 13]
    # Find the class value of each test row and the accuracy of this iteration
    test_preds = model.predict(X=test_feat)
    test_score.append(accuracy_score(test_y, test_preds))
# Mean accuracy over all iterations
print("Accuracy by acc_score", sum(test_score) / len(test_score))
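The loop above draws random 70/30 splits (repeated random subsampling) rather than ten disjoint folds. If standard 10-fold cross-validation was the intent, scikit-learn's `cross_val_score` can compute it directly. This is a minimal sketch that assumes, as the question's code does, that columns 0-12 are the features and column 13 is the class; the helper name `ten_fold_accuracy` is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def ten_fold_accuracy(data: pd.DataFrame, n_fold: int = 10) -> float:
    """Mean decision-tree accuracy over n_fold folds (hypothetical helper).

    Assumes columns 0-12 are the features and column 13 is the class.
    """
    X = data.iloc[:, 0:13]
    y = data.iloc[:, 13]
    # With an integer cv and a classifier, scikit-learn uses stratified k-fold,
    # so every row appears in exactly one test fold.
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                             cv=n_fold, scoring="accuracy")
    return float(scores.mean())
```

It would be called as `ten_fold_accuracy(pd.read_csv('wine.csv'))`. Unlike the random 70/30 splits, each row is then tested exactly once.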
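I am a beginner in both Python and Hadoop, and I don't know how to split this program into a mapper and a reducer. I am using hadoop-2.7.3 with only 3 data nodes. Can I run this program on the Hadoop cluster to predict the class values and compute the accuracy? If not, what would the mapper and reducer look like for predicting the class and computing the accuracy?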
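One possible decomposition is to parallelise over the folds rather than over the rows, since a dataset like wine.csv easily fits in each task's memory. Hadoop Streaming runs any executable that reads records from stdin and writes tab-separated key/value lines to stdout, so the loop body can become the map step and the final average the reduce step. This is a sketch under stated assumptions, not a tested job: the input is a small text file with one fold number (0 to 9) per line, `wine.csv` is shipped to every task, pandas and scikit-learn are installed on all data nodes, and the script name `wine_mr.py` with its `map`/`reduce` arguments is hypothetical.

```python
import sys
from typing import Iterable, Iterator

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def run_fold(data: pd.DataFrame, fold: int, frac: float = 0.70) -> float:
    """Train on a random 70% sample seeded by the fold number, score on the rest."""
    train = data.sample(frac=frac, random_state=fold)
    test = data.loc[~data.index.isin(train.index)]
    # Assumption from the question: columns 0-12 are features, column 13 is the class.
    model = DecisionTreeClassifier(random_state=0)
    model.fit(train.iloc[:, 0:13], train.iloc[:, 13])
    preds = model.predict(test.iloc[:, 0:13])
    return float(accuracy_score(test.iloc[:, 13], preds))


def mapper(lines: Iterable[str], data: pd.DataFrame) -> Iterator[str]:
    """Map step: each input line is a fold number; emit 'accuracy<TAB>value'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield "accuracy\t%f" % run_fold(data, int(line))


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: average the values of each key (Hadoop sorts lines by key)."""
    current_key, total, count = None, 0.0, 0
    for line in lines:
        if not line.strip():
            continue
        key, value = line.rstrip("\n").split("\t", 1)
        if current_key is not None and key != current_key:
            yield "%s\t%f" % (current_key, total / count)
            total, count = 0.0, 0
        current_key = key
        total += float(value)
        count += 1
    if current_key is not None:
        yield "%s\t%f" % (current_key, total / count)


# Hypothetical entry point: "python wine_mr.py map" or "python wine_mr.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        # wine.csv is expected in the task's working directory.
        wine = pd.read_csv("wine.csv")
        for out in mapper(sys.stdin, wine):
            print(out)
    elif sys.argv[1] == "reduce":
        for out in reducer(sys.stdin):
            print(out)
```

Under these assumptions the job could be submitted with the streaming jar from `share/hadoop/tools/lib/` of the 2.7.3 distribution, passing `-files wine_mr.py,wine.csv`, `-mapper "python wine_mr.py map"`, `-reducer "python wine_mr.py reduce"`, and `-input`/`-output` paths. With a 10-line input file Hadoop may hand every fold to a single mapper; a per-line input format such as `NLineInputFormat` would spread the folds across the nodes. Parallelising over rows instead would require shipping an already trained model to each mapper, which this sketch does not attempt.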