Suppose I have some data in dfA, for example a key (pid) and an array-type column (category_ids_array):
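import org.apache.spark.sql.types._
// createDF is not part of core Spark; this assumes the spark-daria extension
import com.github.mrpowers.spark.daria.sql.SparkSessionExt._

val dfA = spark.createDF(
  Array(
    ("10009004", Array("10009004", "10348794", "546313", "546264", "2173952")),
    ("10086262", Array("10086262", "23009642", "3617058", "2173952"))
  ), List(
    ("pid", StringType, true),
    ("category_ids_array", ArrayType(StringType, true), true)
  )
)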
dfA:
+----------+---------------------------------------------------+
|pid |category_ids_array |
+----------+---------------------------------------------------+
|10009004 |[10009004, 10348794, 546313, 546264, 2173952] |
|10086262 |[10086262, 23009642, 3617058, 2173952] |
+----------+---------------------------------------------------+
I also have a DataFrame, dfB, that looks like:
+----------+------------+---------------------+
|pid |attribute_id|attribute_value |
+----------+------------+---------------------+
|10086262 |10002948 |Rabbit |
|10086262 |10002950 |Unconjugated |
|10009004 |10670938 |BCS207B |
|10086262 |10670938 |BP215734 |
|10009004 |10671048 |0000011756 |
|10086262 |10671048 |19397 |
|10086262 |10671049 |SCIENCE |
|10009004 |10671049 |SCIENCE, LLC |
|10009004 |10671050 |CRYO BLUE |
|10086262 |10671050 |CBR4 |
|10348794 |606921 |Green and Blue |
|23009642 |606921 |Purple and Yellow |
+----------+------------+---------------------+
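My question is: if possible, how can I iterate over each string value in the array-type column of every dfA row and pull the matching results from dfB, but flatten them in hierarchical order? dfA holds a unique list of PIDs as the "input", while dfB contains many rows for the same PID, each with a different attribute ID/value, and these need to be rolled up per input PID.

This is difficult for me because the result set for each input string in dfA must override the result set for the next input in the string array, since the array strings are in hierarchical order. For example, in the dfA row for 10009004, the result set for 10009004 must override the one for 10348794, and so on (where they exist) until the end of that row's array, while still retaining the previous results that differ by attribute_id. There can be hundreds of attribute IDs. I don't know how to approach this. Maybe with zipwith? Is there some kind of Map overwrite? Any ideas? The output would look something like: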
+----------+--------+-------------+-----------+----------+--------------+-----------+------------------+
|product_id|10002948|10002950 |10671048 |10670938 |10671049 |10671050 |606921 |
+----------+--------+-------------+-----------+----------+--------------+-----------+------------------+
|10086262 |Rabbit |Unconjugated |19397 |BP215734 |SCIENCE |CBR4 |Purple and Yellow |
|10009004 |[null] |[null] |0000011756 |BCS207B |SCIENCE, LLC |CRYO BLUE |Green and Blue |
+----------+--------+-------------+-----------+----------+--------------+-----------+------------------+
Thanks in advance.
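
One possible way to get the override behaviour described above, offered only as a rough sketch and not a confirmed answer: explode each array together with its position, join the exploded IDs to dfB, keep the lowest-position match per (product, attribute_id), and then pivot. The object name HierarchyPivot, the helper flattenByHierarchy, and the intermediate column names priority and lookup_pid are hypothetical and do not come from the post. The sketch also assumes that an earlier array position always wins, and it uses an inner join, so a product with no matching dfB rows would not appear in the result.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first, posexplode, row_number}

object HierarchyPivot {

  // Hypothetical helper (not from the original post).
  def flattenByHierarchy(dfA: DataFrame, dfB: DataFrame): DataFrame = {
    // One row per (product, array element); the array index becomes the priority.
    val exploded = dfA.select(
      col("pid").as("product_id"),
      posexplode(col("category_ids_array")).as(Seq("priority", "lookup_pid"))
    )

    // Attach every attribute row that belongs to any ID in the hierarchy.
    val joined = exploded.join(dfB, exploded("lookup_pid") === dfB("pid"))

    // For each (product, attribute) keep only the match with the lowest array index,
    // so earlier IDs override later ones while unseen attributes are still inherited.
    val byPriority = Window
      .partitionBy("product_id", "attribute_id")
      .orderBy("priority")

    val winners = joined
      .withColumn("rn", row_number().over(byPriority))
      .filter(col("rn") === 1)

    // One column per attribute_id, one row per product.
    winners
      .groupBy("product_id")
      .pivot("attribute_id")
      .agg(first("attribute_value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hierarchy-pivot-sketch")
      .getOrCreate()
    import spark.implicits._

    // A subset of the sample data from the question.
    val dfA = Seq(
      ("10009004", Seq("10009004", "10348794", "546313", "546264", "2173952")),
      ("10086262", Seq("10086262", "23009642", "3617058", "2173952"))
    ).toDF("pid", "category_ids_array")

    val dfB = Seq(
      ("10086262", "10002948", "Rabbit"),
      ("10009004", "10670938", "BCS207B"),
      ("10086262", "10670938", "BP215734"),
      ("10348794", "606921", "Green and Blue"),
      ("23009642", "606921", "Purple and Yellow")
    ).toDF("pid", "attribute_id", "attribute_value")

    flattenByHierarchy(dfA, dfB).show(false)
    spark.stop()
  }
}
```

With hundreds of attribute IDs, pivot first runs an extra job to discover the distinct values; passing the list explicitly, as in pivot("attribute_id", knownIds), avoids that, where knownIds is a hypothetical Seq of attribute IDs collected beforehand. The pivoted columns will probably not come out in the same order as the table in the question.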