hadoop-rmr2-svm model: converting the result of class "list" back to the original class "svm.formula" "svm"

yfjy0ee7  posted on 2021-06-03  in  Hadoop

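I have the following R configuration:
OS: Linux, R version 3.0.1 (2013-05-16), rmr2 version 2.2.1, rhdfs version 1.0.6, Hadoop version 1.2.0
How can I convert the result of an SVM model built with Hadoop and the rmr2 package, so that I can use the model as usual, for example: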

  predict(svm1, "new data")

I have the following code:

  # set environment variables
  Sys.setenv(HADOOP_CMD="~/Downloads/hadoop-1.2.0/bin/hadoop")
  Sys.setenv(HADOOP_HOME="~/Downloads/hadoop-1.2.0/")
  # start hadoop
  # load libraries
  library(rmr2)
  library(rhdfs)
  library(e1071)
  # load sample data
  data(iris)
  # init hdfs
  hdfs.init()
  # push data to hdfs
  iris.dfs <- to.dfs(iris)
  # define map function
  iris.map <- function(k, v)
  {
  svm(v$Species ~ ., data=v)
  }
  # run MapReduce job
  iris.svm <- mapreduce(input=iris.dfs, map=iris.map)
  # get result back
  iris.res <- from.dfs(iris.svm)
  # fit the same model locally for comparison
  svm1 <- svm(iris$Species ~ ., data=iris)
  class(iris.res)
  class(svm1)

The class and structure of the two results are as follows:

  > class(iris.res)
  [1] "list"
  > class(svm1)
  [1] "svm.formula" "svm"
  > str(svm1)
  List of 30
  $ call : language svm(formula = iris$Species ~ ., data = iris)
  $ type : num 0
  $ kernel : num 2
  $ cost : num 1
  $ degree : num 3
  $ gamma : num 0,25
  $ coef0 : num 0
  $ nu : num 0,5
  $ epsilon : num 0,1
  $ sparse : logi FALSE
  $ scaled : logi [1:4] TRUE TRUE TRUE TRUE
  $ x.scale :List of 2
  ..$ scaled:center: Named num [1:4] 5,84 3,06 3,76 1,20
  .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  ..$ scaled:scale : Named num [1:4] 0,828 0,436 1,765 0,762
  .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  $ y.scale : NULL
  $ nclasses : int 3
  $ levels : chr [1:3] "setosa" "versicolor" "virginica"
  $ tot.nSV : int 51
  $ nSV : int [1:3] 8 22 21
  $ labels : int [1:3] 1 2 3
  $ SV : num [1:51, 1:4] -1,743 -1,864 -0,173 -0,535 -1,501 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:51] "9" "14" "16" "21" ...
  .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  $ index : int [1:51] 9 14 16 21 23 24 26 42 51 53 ...
  $ rho : num [1:3] -0,0203 0,1312 -0,0629
  $ compprob : logi FALSE
  $ probA : NULL
  $ probB : NULL
  $ sigma : NULL
  $ coefs : num [1:51, 1:2] 0,0891 0,0000 0,8652 0,0000 0,0000 ...
  $ na.action : NULL
  $ fitted : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
  $ decision.values: num [1:150, 1:3] 1,20 1,06 1,18 1,11 1,19 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:150] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "setosa/versicolor" "setosa/virginica" "versicolor/virginica"
  $ terms :Classes 'terms', 'formula' length 3 iris$Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
  .. ..- attr(*, "variables")= language list(iris$Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
  .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:5] "iris$Species" "Sepal.Length" "Sepal.Width" "Petal.Length" ...
  .. .. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  .. ..- attr(*, "term.labels")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  .. ..- attr(*, "order")= int [1:4] 1 1 1 1
  .. ..- attr(*, "intercept")= num 0
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
  .. ..- attr(*, "predvars")= language list(iris$Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
  .. ..- attr(*, "dataClasses")= Named chr [1:5] "factor" "numeric" "numeric" "numeric" ...
  .. .. ..- attr(*, "names")= chr [1:5] "iris$Species" "Sepal.Length" "Sepal.Width" "Petal.Length" ...
  - attr(*, "class")= chr [1:2] "svm.formula" "svm"
  > str(iris.res)
  List of 2
  $ key: NULL
  $ val:List of 30
  ..$ call : language svm(formula = v$Species ~ ., data = v)
  ..$ type : num 0
  ..$ kernel : num 2
  ..$ cost : num 1
  ..$ degree : num 3
  ..$ gamma : num 0,25
  ..$ coef0 : num 0
  ..$ nu : num 0,5
  ..$ epsilon : num 0,1
  ..$ sparse : logi FALSE
  ..$ scaled : logi [1:4] TRUE TRUE TRUE TRUE
  ..$ x.scale :List of 2
  .. ..$ scaled:center: Named num [1:4] 5,84 3,06 3,76 1,20
  .. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  .. ..$ scaled:scale : Named num [1:4] 0,828 0,436 1,765 0,762
  .. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  ..$ y.scale : NULL
  ..$ nclasses : int 3
  ..$ levels : chr [1:3] "setosa" "versicolor" "virginica"
  ..$ tot.nSV : int 51
  ..$ nSV : int [1:3] 8 22 21
  ..$ labels : int [1:3] 1 2 3
  ..$ SV : num [1:51, 1:4] -1,743 -1,864 -0,173 -0,535 -1,501 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:51] "9" "14" "16" "21" ...
  .. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  ..$ index : int [1:51] 9 14 16 21 23 24 26 42 51 53 ...
  ..$ rho : num [1:3] -0,0203 0,1312 -0,0629
  ..$ compprob : logi FALSE
  ..$ probA : NULL
  ..$ probB : NULL
  ..$ sigma : NULL
  ..$ coefs : num [1:51, 1:2] 0,0891 0,0000 0,8652 0,0000 0,0000 ...
  ..$ na.action : NULL
  ..$ fitted : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
  .. ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
  ..$ decision.values: num [1:150, 1:3] 1,20 1,06 1,18 1,11 1,19 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:150] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:3] "setosa/versicolor" "setosa/virginica" "versicolor/virginica"
  ..$ terms :Classes 'terms', 'formula' length 3 v$Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
  .. .. ..- attr(*, "variables")= language list(v$Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
  .. .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:5] "v$Species" "Sepal.Length" "Sepal.Width" "Petal.Length" ...
  .. .. .. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  .. .. ..- attr(*, "term.labels")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  .. .. ..- attr(*, "order")= int [1:4] 1 1 1 1
  .. .. ..- attr(*, "intercept")= num 0
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: 0xb639820>
  .. .. ..- attr(*, "predvars")= language list(v$Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:5] "factor" "numeric" "numeric" "numeric" ...
  .. .. .. ..- attr(*, "names")= chr [1:5] "v$Species" "Sepal.Length" "Sepal.Width" "Petal.Length" ...

But how do I convert the resulting list into the same class that an ordinary svm call returns?

Answer #1 (fgw7neuy)

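Just wrap svm(v$Species ~ ., data=v) in a list call inside the map function, as in list(svm(v$Species ~ ., data=v)). Map can only return lists, matrices, vectors and data frames. If you return a model (which is apparently not something I implemented on purpose), it gets coerced to a list. Maybe I could handle this defensively and slap a list around any return value that is not one of the four supported types, but I don't want to be too clever and make too many guesses.

The other problem with your approach is that, for larger data sets, the map function will be called on arbitrary subsets of the data, so you will get a list of models in the output (after you call values on the output). So now you have multiple SVMs, and what do you do with them? Do you treat them as an ensemble? But the subsets in the map phase are arbitrary; they don't have any statistical properties, such as randomization.

It seems to me that you think rmr has superpowers that make the svm function parallel and distributed, but it doesn't. It will just call svm in parallel across the cluster on different chunks of the data. In a small example there is only one chunk, but that is deceiving. Try rmr.options(keyval.length = 3) to see what happens with very small chunks (not for production).

Another approach is to build the model on the largest sample you can load on a single machine and then run predict in parallel. Of course this is not as scalable in the learning phase, but I know some large startups that do exactly this. Take a look at Uri Laserson's post on resampling on the Cloudera blog; I think it will give you some good ideas.

Finally, we have a dedicated forum for rmr and related packages on Google Groups, and you are very welcome to join our community.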
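
The wrapping suggested above can be sketched roughly as follows. This is only an illustration under a few assumptions that are not in the original post: it uses rmr2's local backend so that no running Hadoop cluster is needed, it writes the formula as Species ~ . rather than v$Species ~ ., and the names iris.out, models and pred are made up for the example. With a data set this small there should be a single chunk and therefore a single model; on real data there would be one model per chunk.

```r
# Sketch only: assumes rmr2 and e1071 are installed.
library(rmr2)
library(e1071)

# Assumption: run on the local backend instead of a Hadoop cluster.
rmr.options(backend = "local")

data(iris)
iris.dfs <- to.dfs(iris)

# Wrap the fitted model in a list so that map returns a supported type
# and the model object inside the list is passed through unchanged.
iris.map <- function(k, v) {
  list(svm(Species ~ ., data = v))
}

iris.out <- mapreduce(input = iris.dfs, map = iris.map)

# values() extracts the value part of the key-value result:
# a list holding one model per chunk that map was called on.
models <- values(from.dfs(iris.out))

# Inspect the class of the first model and use it like a normal svm fit.
class(models[[1]])
pred <- predict(models[[1]], iris)
```

If more than one model comes back, each element of models is a separate fit on an arbitrary chunk, which is exactly the caveat raised in the answer.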
