我正在使用scala的spark。2.4.3
我的salesperson数据框是这样的:它总共有54salesperson,我只举了3列的例子
Schema of SalesPerson table.
root
|-- col: struct (nullable = false)
| |-- SalesPerson_1: string (nullable = true)
| |-- SalesPerson_2: string (nullable = true)
| |-- SalesPerson_3: string (nullable = true)
销售人员视图的数据。
SalesPerson_1|SalesPerson_2|SalesPerson_3
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[Customer_1793, Customer_202, Customer_2461]
[Customer_2424, Customer_130, Customer_787]
[Customer_1061, Customer_318, Customer_706]
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
我的salesplace数据框看起来像
Schema of salesplace
root
|-- Place: string (nullable = true)
|-- Customer: string (nullable = true)
Data of salesplace
Place|Customer
Online| Customer_1793
Retail| Customer_1793
Retail| Customer_130
Online| Customer_130
Online| Customer_2461
Retail| Customer_2461
Online| Customer_2461
我试图检查salesperson表中的哪个客户在salesplace表中可用。有两个 additional column shows customer belong to salesperson
以及salesplace表中的客户发生计数
预期产量:
CustomerBelongstoSalesperson|Customer |occurance|
SalesPerson_1 |Customer_1793|2
SalesPerson_2 |Customer_130 |2
SalesPerson_3 |Customer_2461|3
SalesPerson_2 |Customer_202 |0
SalesPerson_1 |Customer_2424|0
SalesPerson_1 |Customer_1061|0
SalesPerson_2 |Customer_318 |0
SalesPerson_3 |Customer_787 |0
代码:
Error:
The number of aliases supplied in the AS clause does not match the number of columns output by the UDTF expected 54 aliases but got Salesperson,Customer ;
在spark中似乎没有什么关键。我不确定是否可以将columnname作为值放入列中。。。。有人能帮我想一想怎么做吗。。。。。。。。谢谢
1条答案
按热度按时间muk1a3rh1#
试试这个-
加载提供的测试数据
取消PIVOT和左连接