我们正在获取Kafka的滴答声数据,我们正在将其流到ApacheSpark。我们需要从流数据中创建蜡烛数据。
我想到的第一个选项是创建dataframe并从中运行sql查询,如
SELECT t1.price AS open,
m.high,
m.low,
t2.price as close,
open_time
FROM (SELECT MIN(timeInMilliseconds) AS min_time,
MAX(timeInMilliseconds) AS max_time,
MIN(price) as low,
MAX(price) as high,
FLOOR(timeInMilliseconds/(1000*60)) as open_time
FROM ticks
GROUP BY open_time) m
JOIN ticks t1 ON t1.timeInMilliseconds = min_time
JOIN ticks t2 ON t2.timeInMilliseconds = max_time
但我不确定它是否能够获取旧蜱虫的数据
有没有可能用spark库的一些方法来创建类似的呢?
1条答案
按热度按时间eimct9ow1#
请看一下活动时间窗口操作https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#window-事件时间上的操作正是您所需要的。这是一个scatch代码