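I have the Spark DataFrame below, in which every column except the primary-key column emp_id holds a Map whose keys 'from' and 'to' may have null values. For each column except emp_id, I want to compare 'from' and 'to' and add a new key named 'change' to the Map, whose value is:
a) 'insert' if the 'from' value is null and the 'to' value is not null
b) 'delete' if the 'to' value is null and the 'from' value is not null
c) 'update' if both 'from' and 'to' are not null and the 'from' value differs from the 'to' value
Note: columns whose value is null stay unchanged.
Important: the type of these columns is not Map[String, String] but Map[String, Any], which means the values can be other struct objects.
How can we achieve this in Scala?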
|emp_id|emp_city             |emp_name                    |emp_phone            |emp_sal                   |emp_site                          |
|1     |null                 |[from -> Will, to -> Watson]|null                 |[from -> 1000, to -> 8000]|[from ->, to -> Seattle]          |
|3     |null                 |[from -> Norman, to -> Nate]|null                 |[from -> 1000, to -> 8000]|[from -> CherryHill, to -> Newark]|
|4     |[from ->, to -> Iowa]|[from ->, to -> Ian]        |[from ->, to -> 1004]|[from ->, to -> 8000]     |[from ->, to -> Des Moines]       |
Expected:
|emp_id|emp_city                               |emp_name                                      |emp_phone                              |emp_sal                                     |emp_site                                            |
|1     |null                                   |[from -> Will, to -> Watson, change -> update]|null                                   |[from -> 1000, to -> 8000, change -> update]|[from ->, to -> Seattle, change -> insert]          |
|3     |null                                   |[from -> Norman, to -> Nate, change -> update]|null                                   |[from -> 1000, to -> 8000, change -> update]|[from -> CherryHill, to -> Newark, change -> update]|
|4     |[from ->, to -> Iowa, change -> insert]|[from ->, to -> Ian, change -> insert]        |[from ->, to -> 1004, change -> insert]|[from ->, to -> 8000, change -> insert]     |[from ->, to -> Des Moines, change -> insert]       |
1 answer
dojqjjoe1#
You can achieve this with the row mapper function below; the code is explained in inline comments.
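As a rough illustration of the rule such a mapper would apply to each column, here is a minimal sketch in plain Scala with no Spark dependency. The names `ChangeTagger`, `withChange`, and `tagColumn` are hypothetical and do not come from the answer. It also assumes that a map is left unchanged when 'from' and 'to' are both null or hold equal values, a case the question does not specify.

```scala
// Hypothetical helper (not the answerer's code): tags one column's map
// with a "change" key according to the rules stated in the question.
object ChangeTagger {

  /** Adds "change" to the map, or returns it unchanged when no rule applies. */
  def withChange(m: Map[String, Any]): Map[String, Any] = {
    // Option(...) maps a null value to None; a missing key is also None.
    val from = m.get("from").flatMap(Option(_))
    val to   = m.get("to").flatMap(Option(_))

    val change: Option[String] = (from, to) match {
      case (None, Some(_))              => Some("insert")
      case (Some(_), None)              => Some("delete")
      case (Some(f), Some(t)) if f != t => Some("update")
      case _                            => None // assumption: both null or equal, no tag
    }
    change.fold(m)(c => m + ("change" -> c))
  }

  /** A null column (no map at all) stays null, as the question requires. */
  def tagColumn(col: Map[String, Any]): Map[String, Any] =
    if (col == null) null else withChange(col)
}

object ChangeTaggerDemo extends App {
  // Values mirror rows 1 and 4 of the sample data.
  println(ChangeTagger.tagColumn(Map("from" -> "Will", "to" -> "Watson"))) // tagged as update
  println(ChangeTagger.tagColumn(Map("from" -> null, "to" -> "Seattle")))  // tagged as insert
  println(ChangeTagger.tagColumn(null))                                    // stays null
}
```

Because the comparison uses `!=` on `Any`, nested values such as case-class instances are compared with their own `equals`. Inside a Spark job, a function like this could be called for every non-key column from a row mapper or UDF; that wiring depends on the actual column schema and is not shown here.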