sales\u shift-wrong是我当前代码的输出,sales\u shift right是所需的输出:
电流(不正确)输出和期望输出的数据和示例
数据:
dateyearmonthweekitemdepartmentstorestates销售修正01/01/2020200111111592tx$149674.59null02/01/20202001111592tx$101260.73$149674.59null03/01/2020200111111592tx$119931.46$101260.73null04/01/20202001111592tx$209863.86$119931.46null05/01/2020200111111592tx$426471.36$209863.86null06/01/202020012111592tx$377,860.85$426471.36$149674.5907/01/202020012111592tx$127632.41$377860.85$101260.7308/01/202020012111592tx$80207.47$127632.41$119931.4609/01/202020012111592tx$62149.50$80207.47$209863.8610/01/202020012111592tx$67399.59$62149.50$426471.3611/01/202020012111592tx$90867.58$67399.5912/01/202020012111592tx$113,211.99$90867.5816/01/202020013111592tx$67096.92$113211.99$377860.8517/01/202020013111592tx$68440.03$67096.92$127632.4118/01/202020013111592tx$91116.59$68440.03$80207.4719/01/202020013111592tx$119986.25$91116.59$62149.5013/01/202020013111592tx$72911.67$119986.25$67399.5914/01/202020013111592tx$87,993.66美元72911.67美元90867.5815/01/202020013111592TX美元69015.89美元87993.66美元113211.99
我认为正确的解决方法( sales_shift_right
)是通过使用一个窗口函数,但是,我还没有找到参数的组合来获得我想要的结果。
partitions = ['store', 'state_prov_cd', 'department', 'item', 'year', 'month', 'week']
date_col = ['week']
w = (
Window
.partitionBy(partitions)
.orderBy(date_col)
)
df_sales_shifted = (
data
.withColumn('sales_shifted', f.lag('sales', 1).over(w))
.sort(partitions)
)
有人能提出更好的方法或发现错误吗?
1条答案
按热度按时间2guxujil1#
您可以再添加一列
row_number
为了便于分区: