Creating candle data from tick data with Apache Spark Streaming in Java

xdnvmnnf · posted 2021-07-14 · in Spark

We are receiving tick data from Kafka and streaming it into Apache Spark. We need to build candle (OHLC) data from the streamed ticks.
The first option that came to mind was to create a DataFrame and run a SQL query on it, such as:

SELECT t1.price AS open,
       m.high,
       m.low,
       t2.price AS close,
       m.open_time
FROM (SELECT MIN(timeInMilliseconds) AS min_time,
             MAX(timeInMilliseconds) AS max_time,
             MIN(price) AS low,
             MAX(price) AS high,
             FLOOR(timeInMilliseconds/(1000*60)) AS open_time
      FROM ticks
      GROUP BY open_time) m
JOIN ticks t1 ON t1.timeInMilliseconds = m.min_time
JOIN ticks t2 ON t2.timeInMilliseconds = m.max_time

But I am not sure whether this approach will still be able to access data for older ticks once the stream has moved on.
Is there a way to build candles like this using built-in methods of the Spark library?
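To make the candle construction concrete outside of Spark, here is a plain-Java sketch (hypothetical `Tick` and `Candle` records, no Spark dependency) that buckets ticks by minute exactly as the `FLOOR(timeInMilliseconds/(1000*60))` expression above does, assuming ticks arrive in time order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CandleSketch {
    // Hypothetical tick record: timestamp in milliseconds and trade price.
    record Tick(long timeInMilliseconds, double price) {}

    // OHLC candle for one minute bucket (openTime = minute index).
    record Candle(long openTime, double open, double high, double low, double close) {}

    // Group ticks into 1-minute buckets (t / 60000, as in the SQL's FLOOR)
    // and fold each bucket into an OHLC candle.
    static List<Candle> toCandles(List<Tick> ticks) {
        Map<Long, Candle> buckets = new TreeMap<>();
        for (Tick t : ticks) {
            long bucket = t.timeInMilliseconds() / 60_000L;
            buckets.merge(bucket,
                new Candle(bucket, t.price(), t.price(), t.price(), t.price()),
                (prev, cur) -> new Candle(prev.openTime(), prev.open(),
                        Math.max(prev.high(), cur.high()),
                        Math.min(prev.low(), cur.low()),
                        cur.close()));
        }
        return new ArrayList<>(buckets.values());
    }
}
```

This only works for batch data already in memory; it does not answer the streaming part of the question, which is what the accepted answer below addresses.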

eimct9ow


Take a look at window operations on event time (https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time). That is exactly what you need. Here is a sketch:

// Note: window() expects a timestamp column; if timeInMilliseconds is a
// raw long, cast it first, e.g. ($"timeInMilliseconds" / 1000).cast("timestamp").
val windowedCandles = tickStream.groupBy(
    window($"timeInMilliseconds", "1 minute")
  ).agg(
    first("price").alias("open"),
    min("price").alias("low"),
    max("price").alias("high"),
    last("price").alias("close"))
