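I'm trying to split an RDD that I created from a DataFrame, and I don't know why I'm getting an error.
I haven't written out every column name here, but the real SQL lists all of them, so the SQL is not the problem.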
val df = sql("SELECT col1, col2, col3,... from tableName")
val rddF = df.toJavaRDD
rddF.take(1)
res46: Array[org.apache.spark.sql.Row] = Array([2017-02-26,100102-AF,100134402,119855,1004445,0.0000,0.0000,-3.3,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000]
scala> rddF.map(x => x.split(","))
<console>:31: error: missing parameter type
rdd3.map(x => x.split(","))
Do you know what causes this error? I'm using Spark 2.2.0.
1 Answer
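rddF holds Row objects, not strings. As you can see from res46: Array[org.apache.spark.sql.Row], take(1) gives you back an Array of Row.
But you can't split a Row: split is a method on String. You have to turn each Row into a String first (or read its fields directly), and then split that String.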
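A minimal sketch of that, assuming Spark 2.x with the Scala API (spark-sql on the classpath). It converts each Row to a String with Row.mkString and then splits it, and also shows reading the values directly with Row.toSeq. It uses df.rdd (a Scala RDD[Row]) rather than df.toJavaRDD: JavaRDD.map expects a Java Function interface, and under Scala 2.11 (the default build for Spark 2.2.0) passing a plain Scala lambda there may itself produce "missing parameter type". The object name SplitRowExample, the helpers splitRow and rowFields, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical example object; names are for illustration only.
object SplitRowExample {
  // Row has no split method, so render it as a comma-separated String first, then split.
  def splitRow(row: Row): Array[String] = row.mkString(",").split(",")

  // Alternative without the String round-trip: read the field values directly.
  def rowFields(row: Row): Seq[String] = row.toSeq.map(v => s"$v")

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[1]").appName("split-row-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for sql("SELECT col1, col2, col3,... from tableName").
    val df = Seq(("2017-02-26", "100102-AF", "-3.3")).toDF("col1", "col2", "col3")

    // df.rdd is a Scala RDD[Row], so map accepts an ordinary Scala lambda.
    val splitRdd = df.rdd.map(row => splitRow(row))
    splitRdd.take(1).foreach(fields => println(fields.mkString(" | ")))

    spark.stop()
  }
}
```

Splitting the mkString output breaks if a field value itself contains a comma (Row("a,b", 1) turns into three pieces), which is why reading the fields with Row.toSeq, as rowFields does, is usually the safer choice.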