Returning a list of lists from a UDF: expected zero arguments for construction of ClassDict (for numpy.core.multiarray.\u)

Posted by 3npbholx on 2021-05-27 in Spark

I want to return two columns as a map, where one of them is an nd.array of floating-point numbers. In PySpark, I cannot convert it to the correct return type.

import numpy as np
from pyspark.sql import functions as f
from pyspark.sql import types as t

def get_vectors(feature_map):
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])

    # The vectors object will be returned by another method; this is just dummy code
    # to simulate the data. It is an nd.array of floating-point numbers.
    vectors = []
    for item in inputs:
        vectors.append([1.0, 2.0, 3.0])
    vectors = np.array(vectors, float)
    return dict(zip(ids, list(vectors)))

gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.ArrayType(t.ArrayType(t.FloatType()))))
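A possible explanation: PySpark pickles the values a Python UDF returns and rebuilds them on the JVM side, which, as far as I know, only understands built-in Python types. The NumPy class named in the error therefore suggests that an ndarray is being sent back unconverted. Below is a minimal sketch that assumes the same dummy data as above; `get_vectors_native` is a hypothetical name. It converts each row with `ndarray.tolist()`. Note also that `list(vectors)` yields one 1-D row per key, so each map value has one level of nesting fewer than `ArrayType(ArrayType(FloatType()))` declares.

```python
import numpy as np


def get_vectors_native(feature_map):
    """Variant of get_vectors that returns only built-in Python types.

    feature_map is assumed to be a list of dicts, as in the question.
    """
    # Flatten the list of dicts into parallel tuples of keys and values.
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])

    # Dummy data standing in for the real vector source: one row per input.
    vectors = np.array([[1.0, 2.0, 3.0] for _ in inputs], dtype=float)

    # ndarray.tolist() turns each row into a plain list of Python floats,
    # so no NumPy object is left in the returned dict.
    return {key: row.tolist() for key, row in zip(ids, vectors)}


# Each map value is a 1-D list of floats, so a matching schema would be
# MapType(StringType(), ArrayType(DoubleType())) rather than a nested array:
# gen_vectors_udf = f.udf(
#     get_vectors_native,
#     t.MapType(t.StringType(), t.ArrayType(t.DoubleType())),
# )

if __name__ == "__main__":
    print(get_vectors_native([{"a": 10}, {"b": 20}]))
```

Calling `get_vectors_native([{"a": 10}, {"b": 20}])` returns `{'a': [1.0, 2.0, 3.0], 'b': [1.0, 2.0, 3.0]}`. The Spark registration is left as a comment so that the sketch runs with NumPy alone.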

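When I call this UDF, I get the following error:
expected zero arguments for construction of ClassDict (for numpy.core.multiarray.\u)
Can someone help me understand how to convert the nd.array into a PySpark type?
On the other hand, if I convert the nd.array into a list of strings, it seems to work perfectly fine: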

def get_vectors(feature_map):
    ids, inputs = zip(*[
        (k, v) for d in feature_map for k, v in d.items()
    ])

    # The vectors object will be returned by another method; this is just dummy code
    # to simulate the data. It is an nd.array of floating-point numbers.
    vectors = []
    for item in inputs:
        vectors.append([1.0, 2.0, 3.0])
    vectors = np.array(vectors, float)
    # Convert each row to its string representation, e.g. "[1. 2. 3.]"
    output = [str(k) for k in vectors]
    return dict(zip(ids, output))

gen_vectors_udf = f.udf(get_vectors, t.MapType(t.StringType(), t.StringType()))
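The string version likely works because `str(k)` produces ordinary Python `str` objects, which Spark can deserialize. The drawback is that the text (for example `[1. 2. 3.]`) would have to be parsed back into numbers. Below is a sketch of a more general approach, assuming the UDF output is built from nested dicts, lists, NumPy arrays and NumPy scalars. The helper `to_python` is hypothetical and is not a PySpark or NumPy function. It recursively replaces NumPy objects using `ndarray.tolist()` and `np.generic.item()`, so the values stay numeric.

```python
import numpy as np


def to_python(value):
    """Recursively replace NumPy arrays and scalars with built-in Python types.

    Hypothetical helper for illustration, not part of PySpark or NumPy.
    """
    if isinstance(value, np.ndarray):
        # tolist() also converts every element to a Python scalar.
        return value.tolist()
    if isinstance(value, np.generic):
        # NumPy scalar (np.float64, np.int32, ...) -> Python float/int/...
        return value.item()
    if isinstance(value, dict):
        return {to_python(k): to_python(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, which is what Spark's ArrayType expects anyway.
        return [to_python(v) for v in value]
    return value


row = np.array([1.0, 2.0, 3.0])
print(str(row))               # the string form used in the question
print(to_python({"a": row}))  # numeric form made of plain Python floats
```

Wrapping the original return statement as `return to_python(dict(zip(ids, list(vectors))))` should then pair with a schema such as `MapType(StringType(), ArrayType(DoubleType()))`.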
