I have an input DataFrame:
Input data format:
+--------------------------+-------------------+
|info (String)             |chars (Seq[String])|
+--------------------------+-------------------+
|weight=100,height=70      |[weight,height]    |
+--------------------------+-------------------+
|weight=92,skinCol=white   |[weight,skinCol]   |
+--------------------------+-------------------+
|hairCol=gray,skinCol=white|[hairCol,skinCol]  |
+--------------------------+-------------------+
How can I get the DataFrame below as output? I do not know in advance which strings the chars column contains.
Output data format:
+--------------------------+-------------------+-------+-------+-------+-------+
|info (String)             |chars (Seq[String])|weight |height |skinCol|hairCol|
+--------------------------+-------------------+-------+-------+-------+-------+
|weight=100,height=70      |[weight,height]    |100    |70     |null   |null   |
+--------------------------+-------------------+-------+-------+-------+-------+
|weight=92,skinCol=white   |[weight,skinCol]   |92     |null   |white  |null   |
+--------------------------+-------------------+-------+-------+-------+-------+
|hairCol=gray,skinCol=white|[hairCol,skinCol]  |null   |null   |white  |gray   |
+--------------------------+-------------------+-------+-------+-------+-------+
I would also like to save the following Seq[String] as a variable, without calling .collect() on the DataFrame.
val aVariable: Seq[String] = Seq("weight", "height", "skinCol", "hairCol")
1 Answer
thtygnil #1
Create another DataFrame pivoted on the keys of the info column, then join it back to the original using an id column.
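The code that accompanied this step is not preserved here, so the following is only a sketch of how the pivot-and-join could look, not the answer's original code. It assumes a local SparkSession, rebuilds the question's sample data by hand, and uses monotonically_increasing_id() to make up the id column (the question's data has none). The names withId, keyValues, pivoted and result are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, first, monotonically_increasing_id, split}

// Assumption: a local session; in spark-shell the existing session is reused.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-info-sketch")
  .getOrCreate()
import spark.implicits._

// Sample rows copied from the question.
val df = Seq(
  ("weight=100,height=70", Seq("weight", "height")),
  ("weight=92,skinCol=white", Seq("weight", "skinCol")),
  ("hairCol=gray,skinCol=white", Seq("hairCol", "skinCol"))
).toDF("info", "chars")

// Assumption: a surrogate id per row, since the data has no natural key.
// Cached as a precaution so both sides of the join see the same ids.
val withId = df.withColumn("id", monotonically_increasing_id()).cache()

// One row per "key=value" pair found in info.
val keyValues = withId
  .select($"id", explode(split($"info", ",")).as("pair"))
  .select(
    $"id",
    split($"pair", "=").getItem(0).as("key"),
    split($"pair", "=").getItem(1).as("value"))

// One column per distinct key; ids that lack a key get null in that column.
val pivoted = keyValues.groupBy("id").pivot("key").agg(first("value"))

// Join the new columns back onto the original rows and drop the helper id.
val result = withId.join(pivoted, Seq("id"), "left").drop("id")
result.show(false)
```

Two things to keep in mind with this sketch: the pivoted values stay strings ("100", not 100), and the new columns come out in whatever order pivot produces, which may not match the order shown in the question. Also note that calling pivot without an explicit list of values makes Spark run a job to find the distinct keys first.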
For the second question, you can get these values as rows of a DataFrame.
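The original snippet for this step is not preserved either. One way to do it, as a sketch under the same assumptions (local SparkSession, sample data rebuilt from the question), is to explode the chars column and keep the distinct values. The name keys is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Assumption: a local session; in spark-shell the existing session is reused.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("distinct-keys-sketch")
  .getOrCreate()
import spark.implicits._

// Sample rows copied from the question.
val df = Seq(
  ("weight=100,height=70", Seq("weight", "height")),
  ("weight=92,skinCol=white", Seq("weight", "skinCol")),
  ("hairCol=gray,skinCol=white", Seq("hairCol", "skinCol"))
).toDF("info", "chars")

// One row per array element, then de-duplicated; still a distributed DataFrame.
val keys = df.select(explode($"chars").as("key")).distinct()
keys.show()
```

The result stays on the cluster as a single-column DataFrame; the row order is not guaranteed.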
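However, there is no way to get them into a local Seq variable without collect (or an equivalent action that brings the data to the driver).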
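For completeness, here is a sketch of what materializing the variable looks like once you accept the collect. Only the distinct keys are brought to the driver, not the whole DataFrame, which is usually cheap. As before, the local SparkSession and the hand-built sample data are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Assumption: a local session; in spark-shell the existing session is reused.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("collect-keys-sketch")
  .getOrCreate()
import spark.implicits._

// Sample rows copied from the question.
val df = Seq(
  ("weight=100,height=70", Seq("weight", "height")),
  ("weight=92,skinCol=white", Seq("weight", "skinCol")),
  ("hairCol=gray,skinCol=white", Seq("hairCol", "skinCol"))
).toDF("info", "chars")

// collect() is the action that moves the distinct keys to the driver.
val aVariable: Seq[String] =
  df.select(explode($"chars").as("key"))
    .distinct()
    .as[String]
    .collect()
    .toSeq
```

The element order of aVariable is not guaranteed, so sort it if you need a stable order.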