I want to return two columns as a map, where one of them is an nd.array of floating point numbers. In PySpark, I cannot convert it to the correct return type.
import numpy as np
from pyspark.sql import functions as f, types as t

def get_vectors(feature_map):
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])
    # The vectors object will be returned by another method; this is just dummy code
    # to simulate the data. It is an nd.array of floating point numbers.
    vectors = []
    for item in inputs:
        vectors.append([1.0, 2.0, 3.0])
    vectors = np.array(vectors, float)
    return dict(zip(ids, list(vectors)))

gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.ArrayType(t.ArrayType(t.FloatType()))))
When I call this UDF, I get the following error:
expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Can someone help me understand how to convert an nd.array to a PySpark type?
On the other hand, if I convert the nd.array into a list of strings, it seems to work perfectly fine:
def get_vectors(feature_map):
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])
    # The vectors object will be returned by another method; this is just dummy code
    # to simulate the data. It is an nd.array of floating point numbers.
    vectors = []
    for item in inputs:
        vectors.append([1.0, 2.0, 3.0])
    vectors = np.array(vectors, float)
    output = [str(k) for k in vectors]
    return dict(zip(ids, output))

gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.StringType()))
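A likely cause of the error is that `list(vectors)` yields a list of NumPy row arrays rather than plain Python lists, and Spark cannot deserialize NumPy objects returned from a UDF. The string version works because `str(k)` produces built-in Python strings. A minimal sketch of one possible fix, assuming each map value is a single 1-D vector: call `tolist()` so the values become nested lists of built-in floats, and declare the value type as a single `ArrayType` instead of a nested one. The `try`/`except` guard is only there so the snippet also runs where PySpark is not installed.

```python
import numpy as np


def get_vectors(feature_map):
    """Map each key in a list of dicts to a plain Python list of floats."""
    ids = [k for d in feature_map for k in d]
    # Dummy data standing in for the real vectors (an ndarray of floats).
    vectors = np.array([[1.0, 2.0, 3.0] for _ in ids], dtype=float)
    # ndarray.tolist() converts rows AND elements to built-in Python types,
    # which Spark can serialize; list(vectors) would keep each row an ndarray.
    return dict(zip(ids, vectors.tolist()))


try:
    from pyspark.sql import functions as f, types as t

    # Each map value is a 1-D vector, so one ArrayType level is assumed here.
    gen_vectors_udf = f.udf(
        get_vectors,
        t.MapType(t.StringType(), t.ArrayType(t.FloatType())),
    )
except ImportError:
    gen_vectors_udf = None  # PySpark not available in this environment
```

If the full float64 precision matters, `t.DoubleType()` may be a better element type than `t.FloatType()`, since Python floats are 64-bit.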