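I am writing a SQL statement for spark.sql, but I cannot get WEEKOFYEAR to return the week of the year: the WeekNumber column is null in the output below. The expression I used is shown further down.
Input data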
InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,01-12-2010 8.26,2.55,17850,United Kingdom
536365,71053,WHITE METAL LANTERN,6,01-12-2010 8.26,3.39,17850,United Kingdom
536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,01-12-2010 8.26,2.75,17850,United Kingdom
536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,01-12-2010 8.26,3.39,17850,United Kingdom
SQL code
val summarySQlTest = spark.sql(
"""
|select Country,WEEKOFYEAR(InvoiceDate)as WeekNumber,
|count(distinct(InvoiceNo)) as NumInvoices,
|sum(Quantity) as TotalQuantity,
|round(sum(Quantity*UnitPrice),2) as InvoiceValue
|from sales
|group by Country,WeekNumber
|""".stripMargin
).show()
Expected output
+--------------+----------+-----------+-------------+------------+
| Country|WeekNumber|NumInvoices|TotalQuantity|InvoiceValue|
+--------------+----------+-----------+-------------+------------+
| Spain| 49| 1| 67| 174.72|
| Germany| 48| 11| 1795| 3309.75|
Output I get
+--------------+----------+-----------+-------------+------------+
| Country|WeekNumber|NumInvoices|TotalQuantity|InvoiceValue|
+--------------+----------+-----------+-------------+------------+
| Spain| null| 1| 67| 174.72|
| Germany| null| 11| 1795| 3309.75|
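To get the expected output I used the DataFrame code below, but I would like to solve the same problem in spark.sql.
It would also be great if someone could explain what exactly is happening in to_date(col("InvoiceDate"), "dd-MM-yyyy H.mm").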
val knowFunc= invoicesDF.withColumn("InvoiceDate",to_date(col("InvoiceDate"),"dd-MM-yyyy H.mm"))
.where("year(InvoiceDate) == 2010")
.withColumn("WeekNumber",weekofyear(col("InvoiceDate")))
.groupBy("Country","WeekNumber")
.agg(sum("Quantity").as("TotalQuantity"),
round(sum(expr("Quantity*UnitPrice")),2).as("InvoiceValue")).show()
1 Answer
You need to convert the InvoiceDate column to a date type first (using to_date) before you can call weekofyear. I think this also answers your last question.
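As a rough sketch of what that could look like in spark.sql: the query below parses the string with to_date inside a CTE and only then calls weekofyear. It assumes InvoiceDate is stored as a string in the sales view (which would explain the null, since a value like 01-12-2010 8.26 is not in the yyyy-MM-dd form that an implicit string-to-date cast expects under default, non-ANSI settings). The object name WeekOfYearSql, the helper run, the column alias InvoiceDay, and the in-memory sample rows are made up for illustration; the year filter mirrors the DataFrame version in the question. In the pattern, dd is the day of month, MM the month, yyyy the year, H the hour of day (0-23) and mm the minute; lowercase mm always means minutes, so it cannot be used for the month. to_date keeps only the date part of the parsed value.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object WeekOfYearSql {

  // Parse the string column once in a CTE, then aggregate on the parsed date.
  // InvoiceDay is an illustrative alias, not a column from the original data.
  val query: String =
    """
      |with parsed as (
      |  select *, to_date(InvoiceDate, 'dd-MM-yyyy H.mm') as InvoiceDay
      |  from sales
      |)
      |select Country,
      |       weekofyear(InvoiceDay) as WeekNumber,
      |       count(distinct InvoiceNo) as NumInvoices,
      |       sum(Quantity) as TotalQuantity,
      |       round(sum(Quantity * UnitPrice), 2) as InvoiceValue
      |from parsed
      |where year(InvoiceDay) = 2010
      |group by Country, weekofyear(InvoiceDay)
      |""".stripMargin

  // Registers the four sample rows from the question as the "sales" view
  // (InvoiceDate deliberately kept as a string) and runs the query.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val invoicesDF = Seq(
      ("536365", "85123A", "WHITE HANGING HEART T-LIGHT HOLDER", 6, "01-12-2010 8.26", 2.55, "17850", "United Kingdom"),
      ("536365", "71053", "WHITE METAL LANTERN", 6, "01-12-2010 8.26", 3.39, "17850", "United Kingdom"),
      ("536365", "84406B", "CREAM CUPID HEARTS COAT HANGER", 8, "01-12-2010 8.26", 2.75, "17850", "United Kingdom"),
      ("536365", "84029G", "KNITTED UNION FLAG HOT WATER BOTTLE", 6, "01-12-2010 8.26", 3.39, "17850", "United Kingdom")
    ).toDF("InvoiceNo", "StockCode", "Description", "Quantity",
           "InvoiceDate", "UnitPrice", "CustomerID", "Country")
    invoicesDF.createOrReplaceTempView("sales")
    spark.sql(query)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("weekofyear-sql")
      .getOrCreate()
    run(spark).show()
    spark.stop()
  }
}
```

With your real data you would skip the sample rows and just run spark.sql(query) against your existing sales view. Doing the parsing in a CTE keeps the format string in one place instead of repeating to_date in the select, where, and group by clauses.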