I have a question about Apache Spark, using Scala. I am trying to create a Spark application that prints an RDD built from user input. The input data is as follows:
List("aaaa","aaaa","dfddf","aaaa","aaaa","dfddf","aaaa","aaaa","dfddf","aaaa","aaaa","dfddf","aaaa","aaaa","dfddf")
The code is as follows:
val wSchemaString = "col1 col2 col3 col4";
val wSchema = StructType(wSchemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)));
val wRow = sc.parallelize(wInput.map(_.split(",")));
val wRowRDD = wRow.map(x => {
  val wNum = wSchemaString.split(" ").size;
  var out = new Array[String](wNum+1);
  for(i <- 0 to wNum)
  {
    out(i) = x(i);
  }
  Row(out);
});
wRowRDD.collect.foreach(println);
...
The result is as follows:
[[Ljava.lang.String;@5f1ec010]
[[Ljava.lang.String;@5bd38b39]
[[Ljava.lang.String;@5d6b1c05]
[[Ljava.lang.String;@7ea6404c]
[[Ljava.lang.String;@75447fda]
[[Ljava.lang.String;@6425fd5b]
[[Ljava.lang.String;@7a1c94ba]
[[Ljava.lang.String;@6a687df7]
[[Ljava.lang.String;@722619b4]
[[Ljava.lang.String;@117d1979]
[[Ljava.lang.String;@304a45f4]
[[Ljava.lang.String;@5c36aef0]
[[Ljava.lang.String;@a173ddc]
[[Ljava.lang.String;@7bde3bb0]
[[Ljava.lang.String;@3b20df58]
[[Ljava.lang.String;@981f1f2]
However, what I want is:
"aaaa","aaaa","dfddf","aaaa"
"aaaa","dfddf","aaaa","aaaa"
"dfddf","aaaa","aaaa","dfddf"
"aaaa","aaaa","dfddf","aaaa"
1 Answer
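A Row wraps an Array[Any], and here Row(out) stores the whole array as a single field, so the string that println prints is the one produced by the array's toString method.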
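The effect is easy to reproduce without Spark. The sketch below uses a plain Scala array with sample values chosen only for illustration (the object and value names are hypothetical) and contrasts toString with mkString:

```scala
object ArrayToStringDemo {
  // Hypothetical sample values, chosen only for illustration.
  val fields: Array[String] = Array("aaaa", "aaaa", "dfddf", "aaaa")

  // Arrays do not override toString, so this yields the JVM default:
  // the type tag "[Ljava.lang.String;" followed by "@" and a hash code.
  val viaToString: String = fields.toString

  // mkString joins the elements themselves; each one is quoted first.
  val viaMkString: String = fields.map(s => "\"" + s + "\"").mkString(",")

  def main(args: Array[String]): Unit = {
    println(viaToString)
    println(viaMkString)
  }
}
```

The first line printed has the same shape as the output in the question, and the second shows the element values.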
To get that output, you have to do something like the following:
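A minimal sketch of one way to do this, not necessarily the exact code this answer had in mind. It assumes each input record is a single comma-separated line (the sample values of wInput, the PrintRows object and the format helper are hypothetical) and that Spark runs in local mode. Row.fromSeq spreads the array into separate fields instead of wrapping it as one, and the fields are joined explicitly when printing:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row

object PrintRows {
  // Format one Row as comma-separated, double-quoted fields.
  def format(row: Row): String =
    row.toSeq.map(v => "\"" + v + "\"").mkString(",")

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, used here only so the sketch runs standalone.
    val conf = new SparkConf().setAppName("PrintRows").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical input: one comma-separated line per record.
    val wInput = List("aaaa,aaaa,dfddf,aaaa", "aaaa,dfddf,aaaa,aaaa")

    // Row.fromSeq turns every array element into its own field;
    // Row(out) would instead create a Row with the array as its only field.
    val wRowRDD = sc.parallelize(wInput).map(line => Row.fromSeq(line.split(",").toSeq))

    // Print the field values rather than relying on the array's toString.
    wRowRDD.collect().foreach(row => println(format(row)))

    sc.stop()
  }
}
```

If the quotes are not needed, Row also has its own mkString, so row.mkString(",") is enough.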
I hope this helps.