How to predict values in MLlib

qnyhuwrf  posted on 2021-05-29 in Hadoop

Hi, I'm new to Spark MLlib. I already have a model in R, and I'm trying to build the same model with Spark MLlib. Here is the R model code.
R code:

  delhi <- read.delim("UItrain.txt", na.strings = "")
  delhi$lnprice <- log(delhi$price)
  heddel <- lm(lnprice ~ bedrooms + bathrooms + area, data = delhi)
  deltest <- read.delim("UItest.txt", na.strings = "")
  predict(heddel, deltest)
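
For reference when porting: the R model is fitted to `log(price)`, so `predict(heddel, deltest)` returns values on the log scale, and a price estimate comes from exponentiating them. A minimal Java sketch of that arithmetic, with no Spark involved. The class name `LogLinearPredict` and the coefficient values are made up for illustration; they are not fitted from UItrain.txt.

```java
public class LogLinearPredict {
    // Linear predictor: intercept + sum(coef[i] * x[i]).
    // In the R model above this quantity is ln(price).
    public static double predictLogPrice(double intercept, double[] coef, double[] x) {
        double sum = intercept;
        for (int i = 0; i < coef.length; i++) {
            sum += coef[i] * x[i];
        }
        return sum;
    }

    // Simple back-transform from the log scale to a price.
    public static double predictPrice(double intercept, double[] coef, double[] x) {
        return Math.exp(predictLogPrice(intercept, coef, x));
    }

    public static void main(String[] args) {
        double intercept = 10.0;             // hypothetical value
        double[] coef = {0.2, 0.1, 0.001};   // hypothetical: bedrooms, bathrooms, area
        double[] home = {3, 2, 1200};        // one hypothetical test row
        System.out.println("ln(price) = " + predictLogPrice(intercept, coef, home));
        System.out.println("price     = " + predictPrice(intercept, coef, home));
    }
}
```

Any Java port that should match the R output would need the same log transform on the training label.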

I'm trying to reproduce the same R code in Java with Spark MLlib.

  SparkConf conf = new SparkConf().setAppName("Linear Regression Example");
  JavaSparkContext sc = new JavaSparkContext(conf);
  String path = "UItrain.txt";
  JavaRDD<String> data = sc.textFile(path);
  JavaRDD<LabeledPoint> parsedData = data.map(
      new Function<String, LabeledPoint>() {
          public LabeledPoint call(String line) {
              String[] parts = line.split("\t");
              String[] features = parts[1].split("\t");
              double[] v = new double[features.length];
              for (int i = 0; i < features.length - 1; i++)
                  v[i] = Double.parseDouble(features[i]);
              return new LabeledPoint(Double.parseDouble(parts[0]), Vectors.dense(v));
          }
      }
  );
  parsedData.cache();
  // Building the model
  String input = "UItrain.txt";
  int data2 = "UItest.txt";
  int numIterations = 100;
  final LinearRegressionModel model =
      LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData), data2);
  // Evaluate model on training examples and compute training error
  JavaRDD<Tuple2<Double, Double>> valuesAndPreds = parsedData.map(
      new Function<LabeledPoint, Tuple2<Double, Double>>() {
          public Tuple2<Double, Double> call(LabeledPoint point) {
              double prediction = model.predict(point.features());
              return new Tuple2<Double, Double>(prediction, point.label());
          }
      }
  );
  double MSE = new JavaDoubleRDD(valuesAndPreds.map(
      new Function<Tuple2<Double, Double>, Object>() {
          public Object call(Tuple2<Double, Double> pair) {
              return Math.pow(pair._1() - pair._2(), 2.0);
          }
      }
  ).rdd()).mean();
  System.out.println("training Mean Squared Error = " + MSE);

I get an error while building the model. Any help would be greatly appreciated.
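
Separately from the build error, the parsing function in the question is worth a look: it splits each line on tabs and then splits `parts[1]` on tabs again, which can only ever yield a single feature, and the loop bound `features.length - 1` skips the last entry. Below is a minimal sketch of the parsing step alone, without Spark. It assumes each data line has the form `label<TAB>feature1<TAB>feature2...`; the real column order of UItrain.txt is not shown in the question, and `TsvLine` and `parse` are hypothetical names.

```java
import java.util.Arrays;

public class TsvLine {
    public final double label;
    public final double[] features;

    TsvLine(double label, double[] features) {
        this.label = label;
        this.features = features;
    }

    // Parse "label\tf1\tf2..." into a label and a feature array.
    // Assumption: first column is the label, all remaining columns are features.
    public static TsvLine parse(String line) {
        String[] parts = line.split("\t");
        double label = Double.parseDouble(parts[0]);
        double[] features = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i]);
        }
        return new TsvLine(label, features);
    }

    public static void main(String[] args) {
        TsvLine row = parse("12.5\t3\t2\t1200");
        System.out.println(row.label + " -> " + Arrays.toString(row.features));
    }
}
```

Two things to keep in mind if this is adapted: `read.delim` in R treats the first line as a header by default, so the file presumably has a header row that a Java parser would have to skip, and the R code regresses on `log(price)`, so the label would need the same transform to match.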

cu6pst1q  #1

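I think your error is here, in data2:

  final LinearRegressionModel model = LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData), data2);

The regression expects the number of iterations as its second argument, not text. This declaration assigns a string to an int, which does not compile, and the value is then passed where numIterations should go:

  int data2 = "UItest.txt";

If that is not the error, please edit the question and post the error message you get.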
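
To make it concrete why that argument has to be an integer, here is a dependency-free sketch of gradient-descent linear regression in which the iteration count drives the training loop. It is an illustration only, not MLlib's implementation: the class `TinyLinearRegression`, the single-feature model, the full-batch update, and the step size are all assumptions chosen for the example.

```java
public class TinyLinearRegression {
    // Fit y ~ w * x + b by full-batch gradient descent on mean squared error.
    // numIterations is how many update steps to run. Returns {w, b}.
    public static double[] train(double[] x, double[] y, int numIterations, double stepSize) {
        double w = 0.0;
        double b = 0.0;
        int n = x.length;
        for (int iter = 0; iter < numIterations; iter++) {
            double gradW = 0.0;
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double err = (w * x[i] + b) - y[i];
                gradW += 2.0 * err * x[i] / n;
                gradB += 2.0 * err / n;
            }
            w -= stepSize * gradW;
            b -= stepSize * gradB;
        }
        return new double[] {w, b};
    }

    // Mean squared error of the fitted line, the same metric the question prints.
    public static double meanSquaredError(double[] x, double[] y, double w, double b) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double err = (w * x[i] + b) - y[i];
            sum += err * err;
        }
        return sum / x.length;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3};
        double[] y = {1, 3, 5, 7};   // generated from y = 2x + 1
        double[] wb = train(x, y, 2000, 0.05);
        System.out.println("w = " + wb[0] + ", b = " + wb[1]);
        System.out.println("training MSE = " + meanSquaredError(x, y, wb[0], wb[1]));
    }
}
```

With zero iterations the weights stay at their initial values, which is why a file name makes no sense in that position. The test file would instead be read and parsed separately, and each parsed row passed to the trained model for prediction.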
