kmeans pyspark org.apache.spark.SparkException: Job aborted due to stage failure

Posted by rseugnpd on 2021-05-27 in Spark

I want to run k-means on my DataFrame base (6.7 million rows and 22 variables). base.dtypes gives:
```
('anonimisation2', 'double'),
('anonimisation3', 'double'),
('anonimisation4', 'double'),
('anonimisation5', 'double'),
('anonimisation6', 'double'),
('anonimisation7', 'double'),
('anonimisation8', 'double'),
('anonimisation9', 'double'),
('anonimisation10', 'double'),
('anonimisation11', 'double'),
('anonimisation12', 'double'),
('anonimisation13', 'double'),
('anonimisation14', 'double'),
('anonimisation15', 'double'),
('anonimisation16', 'double'),
('anonimisation17', 'double'),
('anonimisation18', 'double'),
('anonimisation19', 'double'),
('anonimisation20', 'double'),
('anonimisation21', 'double'),
('anonimisation22', 'double')]
```

I read that I should use this code:

def transData(base):
    return base.rdd.map(lambda r: [Vectors.dense(r[:-1])]).toDF(['features'])

transformed = transData(base)
transformed.show(5, False)

Then I wrote this:

kmeans = KMeans().setK(2).setSeed(1)
model = kmeans.fit(transformed)

I get this error:

IllegalArgumentException: 'requirement failed: Column features must be of type equal to one of the following types: [struct<type:tinyint,size:int,indices:array<int>,values:array<double>>, array<double>, array<float>] but was actually of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'

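I don't know what to do about it. If you need more information, just ask. Thanks.
I also tried continuing in plain Python, but I ran into problems there as well.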

gfttwv5a #1

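Use from pyspark.ml.linalg import Vectors instead of from pyspark.mllib.linalg import Vectors. The KMeans estimator in pyspark.ml only accepts vectors from pyspark.ml.linalg. The older pyspark.mllib.linalg vector type has the same struct layout, which is why the expected and actual types in the error message look identical.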
