我正在创建一个时间间隔列,并将其添加到pyspark中的现有Dataframe中。理想情况下,时间间隔将采用“hhmm”格式,将分钟向下舍入到最接近的15分钟标记(815、830、845、900等)。
我有sparksql代码来为我做逻辑,但是我如何将这个值作为字符串列连接起来并插入到现有的Dataframe中呢?
time_interval = sqlContext.sql("select extract(hour from current_timestamp())||floor(extract(minute from current_timestamp())/15)*15")
time_interval.show()
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|concat(CAST(hour(current_timestamp()) AS STRING), CAST((FLOOR((CAST(minute(current_timestamp()) AS DOUBLE) / CAST(15 AS DOUBLE))) * CAST(15 AS BIGINT)) AS STRING))|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1045|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
baseDF = sqlContext.sql("select * from test_table")
newBase = baseDF.withColumn("time_interval", lit(str(time_interval)))
newBase.select("time_interval").show()
+--------------------+
| time_interval|
+--------------------+
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
|DataFrame[concat(...|
+--------------------+
only showing top 20 rows
因此,实际的预期结果应该只是在我创建的新列中显示实际的字符串值,而不是Dataframe中的串联值。如下所示:
newBase.select("time_interval").show(1)
+-------------+
|time_interval|
+-------------+
| 1045 |
+-------------+
1条答案
按热度按时间rpppsulh1#
作为
time_interval
是Dataframe类型,对于这种情况需要collect
以及extract the required value out from dataframe
.请这样做:
(或)
通过使用
select(expr())
功能:正如pault在评论中提到的,使用
selectExpr()
功能:例子: