R: fast joins between data tables on time series

li9yvcax, posted on 2023-02-17 in Other

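I have two datasets containing currency price data (high, low, open and close) on two different timeframes (1 hour and 5 minutes).
For every row of the 1-hour data I have a target price and a target trade direction (1 or -1).
For trades with direction "1", I am trying to find the first point on the 5-minute chart where "low" <= the target level; conversely, for trades with direction "-1", I want the first point on the 5-minute chart where "high" >= the target level.
The code below demonstrates what I am after.
The problem in practice is that I am working with periods spanning 10 to 20 years, which makes the joins very slow. The example below runs from 2016 to 2022 so that it is not too slow on my computer, but once you extend it beyond 10 years it becomes a real pain.
This may be something I just have to live with, but I am looking for guidance on two things:
1. Is there a faster or more efficient way to achieve the goal outlined below?
2. As you can see, at the end I split the buy and sell trades into two joins. Is there a way to merge them into a single join? It is not important, since it does not take up much "space" in my code, but I am interested for educational purposes.
Thanks in advance, Phil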

# Load required packages
library(data.table)
library(dplyr)

# Define timeframes for the data
Start <- as.POSIXct("2016-01-01 00:00:00")
End <- as.POSIXct("2022-01-01 23:55:00")

Hours <- floor(as.numeric(difftime(End,Start,units = "hours"))) + 1
Minutes <- floor(as.numeric(difftime(End,Start,units = "mins")) / 5) + 1

# Create the Hourly data
set.seed(123)
hourly_prices <- data.table(
  datetime = seq(Start, End, by = "hour"),
  open = rnorm(Hours, mean = 100, sd = 1),
  high = rnorm(Hours, mean = 101, sd = 1),
  low = rnorm(Hours, mean = 99, sd = 1),
  close = rnorm(Hours, mean = 100, sd = 1),
  Direction = sample(c(1,-1),Hours,replace = TRUE)) %>%
  .[,Target_price := ifelse(Direction == -1,rnorm(.N, mean = 104, sd = 1),rnorm(.N,mean = 97,sd = 1))]

# Create the 5-minute data
set.seed(456)
minute_prices <- data.table(
  datetime = seq(Start, End, by = "5 min"),
  open = rnorm(Minutes, mean = 100, sd = 1),
  high = rnorm(Minutes, mean = 101, sd = 1),
  low = rnorm(Minutes, mean = 99, sd = 1),
  close = rnorm(Minutes, mean = 100, sd = 1),
  Position = seq_len(Minutes))

# Join the two data.tables to find the first point at which price passes the target levels
hourly_prices[(Direction == 1),Location := minute_prices[.SD, on = .(datetime > datetime, low <= Target_price),mult = "first",x.Position]]
hourly_prices[(Direction == -1),Location := minute_prices[.SD, on = .(datetime > datetime, high >= Target_price),mult = "first",x.Position]]
Answer #1 (w3nuxt5m)

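A simple speedup of roughly 3x is to subset the minute_prices data.table so that it only contains rows that are a cumulative minimum/maximum within the current hour (minus 1 second, because the join is datetime > datetime):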

dtM <- copy(minute_prices)
dtH <- copy(hourly_prices)

system.time({
  dtH[(Direction == 1),Location := dtM[dtM[, low == cummin(low), as.integer(datetime - 1)%/%3600L][[2]]][.SD, on = .(datetime > datetime, low <= Target_price),mult = "first",x.Position]]
  dtH[(Direction == -1),Location := dtM[dtM[, high == cummax(high), as.integer(datetime - 1)%/%3600L][[2]]][.SD, on = .(datetime > datetime, high >= Target_price),mult = "first",x.Position]]
})
#>    user  system elapsed 
#>    6.77    0.01    6.81
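
For intuition about the filter used in the joins above, here is a minimal sketch on a made-up table (the names toy, hrs and kept and all the values are hypothetical, not part of the original data). Within one hourly bucket, a bar whose low is not a new running minimum always has an earlier bar in the same bucket with an equal or lower low, so it can never be the first match. The sketch assumes UTC timestamps, or at least a time zone offset of a whole number of hours, so that the integer division lines up with the hourly bars.

```r
library(data.table)

# Made-up 5-minute bars from 00:05 to 02:00, i.e. two hourly buckets.
toy <- data.table(
  datetime = as.POSIXct("2022-01-01 00:00:00", tz = "UTC") + 300 * (1:24),
  low = c(99, 98, 98.5, 97, 97.5, 99, 96, 98, 99, 97, 96.5, 98,
          98, 99, 97, 97.2, 96, 99, 98, 95, 97, 99, 96, 98)
)
toy[, Position := .I]

# Bars in (H, H + 1h] share a bucket; the "- 1" second moves the bar that
# falls exactly on the hour into the preceding bucket.
toy[, bucket := as.integer(datetime - 1) %/% 3600L]

# Keep only bars that set a new running minimum within their bucket.
kept <- toy[toy[, low == cummin(low), by = bucket][[2]]]

# Two hypothetical hourly rows, both treated as Direction == 1.
hrs <- data.table(
  datetime = as.POSIXct("2022-01-01 00:00:00", tz = "UTC") + 3600 * (0:1),
  Target_price = c(95.5, 97)
)

# Same join against the full and the filtered table.
full <- toy[hrs, on = .(datetime > datetime, low <= Target_price),
            mult = "first", x.Position]
fast <- kept[hrs, on = .(datetime > datetime, low <= Target_price),
             mult = "first", x.Position]

print(kept$Position)
print(identical(full, fast))
```

The filtered table is much smaller than the full one, which is presumably where the speedup comes from; the Direction == -1 case works the same way with high and cummax.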

# Baseline for comparison: the original two joins on the unfiltered table
system.time({
  hourly_prices[(Direction == 1),Location := minute_prices[.SD, on = .(datetime > datetime, low <= Target_price),mult = "first",x.Position]]
  hourly_prices[(Direction == -1),Location := minute_prices[.SD, on = .(datetime > datetime, high >= Target_price),mult = "first",x.Position]]
})
#>    user  system elapsed 
#>   22.97    0.00   23.04

identical(dtH, hourly_prices)
#> [1] TRUE
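
The second question (merging the two joins into one) is not covered above. One possible sketch, offered as an assumption rather than a tested recommendation: stack the 5-minute data once per direction with a sign-flipped price, so that both conditions read "signed <= signed_target", and add Direction as an equi-join column. The names m, h, long, signed and signed_target are hypothetical and the data is made up. It relies on the same mult = "first" behaviour as the original joins and on each direction's rows being in chronological order.

```r
library(data.table)

# Made-up 5-minute bars.
m <- data.table(
  datetime = as.POSIXct("2022-01-01 00:00:00", tz = "UTC") + 300 * (1:6),
  high = c(101, 102, 103.5, 102, 104.5, 101),
  low  = c(99, 98, 97.5, 96, 98, 95)
)
m[, Position := .I]

# Made-up signal rows (standing in for the hourly table).
h <- data.table(
  datetime = as.POSIXct("2022-01-01 00:00:00", tz = "UTC") + 600 * (0:2),
  Direction = c(1, -1, 1),
  Target_price = c(97.5, 104, 94)
)

# Long form: low <= target stays as is; high >= target becomes
# -high <= -target, so one "<=" condition serves both directions.
long <- rbind(
  m[, .(datetime, Direction = 1, signed = low, Position)],
  m[, .(datetime, Direction = -1, signed = -high, Position)]
)
h[, signed_target := Direction * Target_price]

# Single join: equi-join on Direction plus the two non-equi conditions.
h[, Location := long[.SD,
                     on = .(Direction, datetime > datetime, signed <= signed_target),
                     mult = "first", x.Position]]

# The original two-join version, for comparison.
h[Direction == 1, Location2 := m[.SD, on = .(datetime > datetime, low <= Target_price),
                                 mult = "first", x.Position]]
h[Direction == -1, Location2 := m[.SD, on = .(datetime > datetime, high >= Target_price),
                                  mult = "first", x.Position]]

print(h[, .(Direction, Target_price, Location, Location2)])
```

This doubles the number of rows on the 5-minute side, so it is not necessarily faster than two separate joins; it mainly shows that the two conditions can be expressed as one.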
