Apologies if this is a duplicate post; I'm opening a new question because the existing ones didn't solve my problem. I'm running ML regression on PySpark 3.0.1, on a cluster with 640 GB of memory and 32 worker nodes. My dataset has 33751 rows and 63 columns, and I'm preparing it for ML regression, so I wrote the following code:
from pyspark.ml.feature import VectorAssembler, StandardScaler

input_col = [...]  # list of feature column names

# Assemble the feature columns into a single vector column
vector_assembler = VectorAssembler(inputCols=input_col, outputCol='ss_feature')
temp_train = vector_assembler.transform(train)

# Standardize the assembled feature vector
standard_scaler = StandardScaler(inputCol='ss_feature', outputCol='scaled')
train = standard_scaler.fit(temp_train).transform(temp_train)
But the last line fails with this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 169 in stage 57.0 failed 4
times, most recent failure: Lost task 169.3 in stage 57.0 (TID 5522, 10.8.64.22, executor 11):
org.apache.spark.SparkException: Failed to execute user defined
function(VectorAssembler$$Lambda$6296/1890764576:
Can you suggest how I can fix this?
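
For context, this kind of VectorAssembler UDF failure is often caused by null or NaN values in the input columns, since VectorAssembler defaults to handleInvalid='error'. A minimal diagnostic sketch, assuming all columns in input_col are numeric and the DataFrame is still named train as above:

from pyspark.sql import functions as F
from pyspark.ml.feature import VectorAssembler

# Count nulls and NaNs per feature column; any non-zero count can make
# the default VectorAssembler (handleInvalid='error') throw at transform time.
train.select([
    F.count(F.when(F.col(c).isNull() | F.isnan(c), c)).alias(c)
    for c in input_col
]).show()

# If invalid values show up, handleInvalid='skip' drops those rows
# ('keep' would instead propagate NaNs into the assembled vector).
vector_assembler = VectorAssembler(inputCols=input_col,
                                   outputCol='ss_feature',
                                   handleInvalid='skip')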