pyspark: converting month-level records to quarter-level records based on a single column

yyhrrdl8 asked on 2021-05-27 in Spark

I have a DataFrame in the following format:

+---------+-------+------------+------------------+-------+-------+
| quarter | month | month_rank | unique_customers | units | sales |
+---------+-------+------------+------------------+-------+-------+
|    1    |   1   |     1      |        15        |  30   | 1000  |
|    1    |   2   |     2      |        20        |  35   | 1200  |
|    1    |   3   |     3      |        18        |  40   | 1500  |
|    2    |   4   |     1      |        10        |  25   |  800  |
|    2    |   5   |     2      |        25        |  50   | 2000  |
|    2    |   6   |     3      |        28        |  45   | 1800  |
+---------+-------+------------+------------------+-------+-------+
...

I am trying to group by quarter and track the monthly figures as columns, like this:

+---------+-------------+------------------------+-------------+-------------+-------------+------------------------+-------------+-------------+-------------+------------------------+-------------+-------------+
| quarter | month_rank1 | rank1_unique_customers | rank1_units | rank1_sales | month_rank2 | rank2_unique_customers | rank2_units | rank2_sales | month_rank3 | rank3_unique_customers | rank3_units | rank3_sales |
+---------+-------------+------------------------+-------------+-------------+-------------+------------------------+-------------+-------------+-------------+------------------------+-------------+-------------+
|    1    |      1      |           15           |     30      |    1000     |      2      |           20           |     35      |    1200     |      3      |           18           |     40      |    1500     |
|    2    |      4      |           10           |     25      |     800     |      5      |           25           |     50      |    2000     |      6      |           28           |     45      |    1800     |
+---------+-------------+------------------------+-------------+-------------+-------------+------------------------+-------------+-------------+-------------+------------------------+-------------+-------------+

Is this achievable with multiple pivots? I have had no luck creating multiple columns from a single pivot. I was also thinking I might be able to get this result with window functions, but if anyone has run into a similar problem, any suggestions would be greatly appreciated. Thank you!

n7taea2i  1#

Pivot on the `month_rank` column, then `agg` the other columns. Example:

```
from pyspark.sql.functions import col, first

df = spark.createDataFrame(
    [(1, 1, 1, 15, 30, 1000), (1, 2, 2, 20, 35, 1200), (1, 3, 3, 18, 40, 1500),
     (2, 4, 1, 10, 25, 800), (2, 5, 2, 25, 50, 2000), (2, 6, 3, 28, 45, 1800)],
    ["quarter", "month", "month_rank", "unique_customers", "units", "sales"])

df.show()
# +-------+-----+----------+----------------+-----+-----+
# |quarter|month|month_rank|unique_customers|units|sales|
# +-------+-----+----------+----------------+-----+-----+
# |      1|    1|         1|              15|   30| 1000|
# |      1|    2|         2|              20|   35| 1200|
# |      1|    3|         3|              18|   40| 1500|
# |      2|    4|         1|              10|   25|  800|
# |      2|    5|         2|              25|   50| 2000|
# |      2|    6|         3|              28|   45| 1800|
# +-------+-----+----------+----------------+-----+-----+

# group by quarter, pivot on month_rank, and take the single value of each
# metric for every (quarter, month_rank) cell
df1 = (df
       .groupBy("quarter")
       .pivot("month_rank")
       .agg(first(col("month")),
            first(col("unique_customers")),
            first(col("units")),
            first(col("sales"))))

# rename the auto-generated pivot columns to friendlier names
cols = ["quarter",
        "month_rank1", "rank1_unique_customers", "rank1_units", "rank1_sales",
        "month_rank2", "rank2_unique_customers", "rank2_units", "rank2_sales",
        "month_rank3", "rank3_unique_customers", "rank3_units", "rank3_sales"]

df1.toDF(*cols).show()
# +-------+-----------+----------------------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+----------------------+-----------+-----------+
# |quarter|month_rank1|rank1_unique_customers|rank1_units|rank1_sales|month_rank2|rank2_unique_customers|rank2_units|rank2_sales|month_rank3|rank3_unique_customers|rank3_units|rank3_sales|
# +-------+-----------+----------------------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+----------------------+-----------+-----------+
# |      1|          1|                    15|         30|       1000|          2|                    20|         35|       1200|          3|                    18|         40|       1500|
# |      2|          4|                    10|         25|        800|          5|                    25|         50|       2000|          6|                    28|         45|       1800|
# +-------+-----------+----------------------+-----------+-----------+-----------+----------------------+-----------+-----------+-----------+----------------------+-----------+-----------+
```
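If you would rather not rely on the positional order of the pivoted columns, a variation is to give each aggregation an alias; with multiple aggregations, Spark names the pivoted columns `<month_rank value>_<alias>` (e.g. `1_unique_customers`), which can then be renamed with a comprehension. The sketch below is only an illustration of that idea, reusing the `df` defined above; the `df2`/`renamed`/`ranks` names are placeholders, and you may want to check `df2.columns` on your Spark version to confirm the generated names.

```
from pyspark.sql.functions import col, first

# aliasing each aggregation makes Spark name the pivoted columns
# "<month_rank value>_<alias>", e.g. "1_unique_customers"
df2 = (df
       .groupBy("quarter")
       .pivot("month_rank")
       .agg(first("month").alias("month"),
            first("unique_customers").alias("unique_customers"),
            first("units").alias("units"),
            first("sales").alias("sales")))

# collect the rank values actually present instead of hard-coding 1..3
ranks = [r["month_rank"] for r in
         df.select("month_rank").distinct().orderBy("month_rank").collect()]

# build the target column names from the rank values and metric names
renamed = df2.select(
    "quarter",
    *[col(f"{rk}_{m}").alias(f"month_rank{rk}" if m == "month" else f"rank{rk}_{m}")
      for rk in ranks
      for m in ["month", "unique_customers", "units", "sales"]])

renamed.show()
```

The rename here is driven by the rank values and metric names rather than a hand-typed 13-element list, so it adapts if more metrics or ranks are added.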
