Creating candle (OHLC) data from tick data using Apache Spark Streaming in Java

Asked by xdnvmnnf on 2021-07-14 in Spark

We are receiving tick data from Kafka and streaming it into Apache Spark. We need to build candle (OHLC) data from the stream.
The first option that came to mind was to create a DataFrame and run a SQL query against it, such as:

  SELECT t1.price AS open,
         m.high,
         m.low,
         t2.price AS close,
         m.open_time
  FROM (SELECT MIN(timeInMilliseconds) AS min_time,
               MAX(timeInMilliseconds) AS max_time,
               MIN(price) AS low,
               MAX(price) AS high,
               FLOOR(timeInMilliseconds / (1000 * 60)) AS open_time
        FROM ticks
        GROUP BY open_time) m
  JOIN ticks t1 ON t1.timeInMilliseconds = m.min_time
  JOIN ticks t2 ON t2.timeInMilliseconds = m.max_time

But I'm not sure this approach will be able to pick up data for older ticks.
Is there a way to build something like this using methods from the Spark library instead?
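For intuition, the bucketing the SQL above performs — one candle per `FLOOR(timeInMilliseconds / (1000*60))` bucket — can be sketched in plain Java without Spark. This is only an illustration of the grouping logic (class and method names here are made up, and ticks are assumed to be processed in time order):

```java
import java.util.Map;
import java.util.TreeMap;

public class CandleBuilder {
    // One OHLC candle per one-minute time bucket.
    static final class Candle {
        double open, high, low, close;
        Candle(double p) { open = high = low = close = p; }
        void add(double p) {
            high = Math.max(high, p);
            low = Math.min(low, p);
            close = p; // valid because ticks arrive in time order
        }
    }

    // Groups ticks into 1-minute buckets, mirroring
    // FLOOR(timeInMilliseconds / (1000*60)) from the SQL query.
    static Map<Long, Candle> toCandles(long[] times, double[] prices) {
        Map<Long, Candle> candles = new TreeMap<>();
        for (int i = 0; i < times.length; i++) {
            long bucket = times[i] / 60_000L;
            Candle c = candles.get(bucket);
            if (c == null) candles.put(bucket, new Candle(prices[i]));
            else c.add(prices[i]);
        }
        return candles;
    }

    public static void main(String[] args) {
        long[] t = {0, 15_000, 45_000, 60_000, 90_000};
        double[] p = {100, 103, 99, 101, 104};
        toCandles(t, p).forEach((bucket, c) ->
            System.out.println(bucket + ": O=" + c.open + " H=" + c.high
                + " L=" + c.low + " C=" + c.close));
        // bucket 0 -> O=100.0 H=103.0 L=99.0 C=99.0
        // bucket 1 -> O=101.0 H=104.0 L=101.0 C=104.0
    }
}
```

Note this sketch, like the SQL, only works cleanly for a bounded batch of ticks; the streaming case is what the answer below addresses.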


eimct9ow1#

Take a look at window operations on event time — https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time — it is exactly what you need. Here is sketch code:

  val windowedCounts = tickStream.groupBy(
      // window() expects a timestamp column; a long epoch-millis column
      // like timeInMilliseconds first needs a cast to TimestampType
      window($"timeInMilliseconds", "1 minute")
    ).agg(
      first("price").alias("open"),
      min("price").alias("min"),
      max("price").alias("max"),
      last("price").alias("close"))
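One caveat with the sketch above: `first`/`last` only yield the true open and close if rows within each window are seen in event-time order, which a stream with late ticks does not guarantee. As a plain-Java illustration of an order-tolerant alternative (no Spark here; the class and method names are hypothetical), a per-window accumulator can track the event times of the open and close explicitly, so a late tick still lands in the right place:

```java
import java.util.HashMap;
import java.util.Map;

public class StreamingCandles {
    static final class Candle {
        long openTime, closeTime;
        double open, high, low, close;
        Candle(long t, double p) {
            openTime = closeTime = t;
            open = high = low = close = p;
        }
    }

    private final Map<Long, Candle> byBucket = new HashMap<>();

    // Processes one tick; out-of-order (late) ticks still update the
    // correct bucket and the correct open/close fields.
    void onTick(long timeMs, double price) {
        long bucket = timeMs / 60_000L;
        Candle c = byBucket.get(bucket);
        if (c == null) {
            byBucket.put(bucket, new Candle(timeMs, price));
            return;
        }
        c.high = Math.max(c.high, price);
        c.low = Math.min(c.low, price);
        if (timeMs < c.openTime) { c.openTime = timeMs; c.open = price; }
        if (timeMs >= c.closeTime) { c.closeTime = timeMs; c.close = price; }
    }

    Candle candle(long bucket) { return byBucket.get(bucket); }

    public static void main(String[] args) {
        StreamingCandles s = new StreamingCandles();
        s.onTick(30_000, 100);  // mid-minute tick arrives first
        s.onTick(5_000, 97);    // late tick: becomes the true open
        s.onTick(59_000, 102);
        Candle c = s.candle(0L);
        System.out.println("O=" + c.open + " H=" + c.high
            + " L=" + c.low + " C=" + c.close);
        // O=97.0 H=102.0 L=97.0 C=102.0
    }
}
```

In Spark itself, the analogous concern (how long to keep accepting late ticks for a window) is handled with a watermark on the event-time column; see the same Structured Streaming guide section linked above.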
