基于.net的sparkDataframe递归计算

cwxwcias  于 2021-07-09  发布在  Spark
关注(0)|答案(1)|浏览(494)

我想用.net计算spark的rsi。
rsi公式为:

  1. RSI = 100 - 100/(1 + R)S
  2. RS = Average Gain / Average Loss

第一个平均收益和平均损失是14个时期的平均值:

  1. First Average Gain = Sum of Gains over the past 14 periods / 14.
  2. First Average Loss = Sum of Losses over the past 14 periods / 14

接下来的所有计算均基于先前的平均值和当前的增益损失:

  1. Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.
  2. Average Loss = [(previous Average Loss) x 13 + current Loss] / 14.

数据在 DataFrame rsiCalcPos5 看起来是这样的:

  1. +--------------------+-----+------+----+-----+------------------+-------------------+----------+------------------+--------------------+-------------------+-------------------+------------------+
  2. | TimeSeriesType|Year0|Month0|Day0|Hour0| avg(Value)| Timestamp| UnixTime| nextValue| deltaValue| gain| loss| gain1|
  3. +--------------------+-----+------+----+-----+------------------+-------------------+----------+------------------+--------------------+-------------------+-------------------+------------------+
  4. |Current Available...| 2021| 3| 3| 9| 219.8235294117647|2021-03-03 09:00:00|1614758400|218.59733449857987| -1.2261949131848269| 0.0| 1.2261949131848269| 0.0|
  5. |Current Available...| 2021| 3| 3| 10|218.59733449857987|2021-03-03 10:00:00|1614762000|185.59442632671212| -33.002908171867745| 0.0| 33.002908171867745| 0.0|
  6. |Current Available...| 2021| 3| 3| 11|185.59442632671212|2021-03-03 11:00:00|1614765600| 190.5523781944545| 4.957951867742366| 4.957951867742366| 0.0|1.6526506225807889|
  7. |Current Available...| 2021| 3| 3| 12| 190.5523781944545|2021-03-03 12:00:00|1614769200|187.88173813444055| -2.6706400600139375| 0.0| 2.6706400600139375|1.2394879669355916|
  8. |Current Available...| 2021| 3| 3| 13|187.88173813444055|2021-03-03 13:00:00|1614772800| 187.6245558053521|-0.25718232908846517| 0.0|0.25718232908846517|0.9915903735484732|
  9. |Current Available...| 2021| 3| 3| 14| 187.6245558053521|2021-03-03 14:00:00|1614776400|186.56644553819817| -1.0581102671539213| 0.0| 1.0581102671539213|0.8263253112903944|
  10. |Current Available...| 2021| 3| 3| 15|186.56644553819817|2021-03-03 15:00:00|1614780000|186.66761484852796| 0.10116931032979437|0.10116931032979437| 0.0|0.7227315968674516|
  11. |Current Available...| 2021| 3| 3| 16|186.66761484852796|2021-03-03 16:00:00|1614783600|165.79466929911155| -20.872945549416414| 0.0| 20.872945549416414|0.6323901472590201|
  12. |Current Available...| 2021| 3| 3| 17|165.79466929911155|2021-03-03 17:00:00|1614787200|178.60478239401849| 12.810113094906939| 12.810113094906939| 0.0|1.9854704747754555|
  13. |Current Available...| 2021| 3| 3| 18|178.60478239401849|2021-03-03 18:00:00|1614790800| 215.3916108565386| 36.786828462520106| 36.786828462520106| 0.0| 5.465606273549921|
  14. |Current Available...| 2021| 3| 3| 19| 215.3916108565386|2021-03-03 19:00:00|1614794400|221.27369459516595| 5.882083738627358| 5.882083738627358| 0.0| 5.503467861284233|
  15. |Current Available...| 2021| 3| 3| 20|221.27369459516595|2021-03-03 20:00:00|1614798000|231.88854705635575| 10.614852461189798| 10.614852461189798| 0.0| 5.92941657794303|
  16. |Current Available...| 2021| 3| 3| 21|231.88854705635575|2021-03-03 21:00:00|1614801600|238.82354991634134| 6.9350028599855875| 6.9350028599855875| 0.0| 6.006769368869381|
  17. |Current Available...| 2021| 3| 3| 22|238.82354991634134|2021-03-03 22:00:00|1614805200|240.02948909258865| 1.2059391762473126| 1.2059391762473126| 0.0| 5.663852926539233|
  18. |Current Available...| 2021| 3| 3| 23|240.02948909258865|2021-03-03 23:00:00|1614808800|240.92351533915001| 0.8940262465613671| 0.8940262465613671| 0.0| null|
  19. |Current Available...| 2021| 3| 4| 0|240.92351533915001|2021-03-04 00:00:00|1614812400|239.63160854893138| -1.2919067902186328| 0.0| 1.2919067902186328| null|
  20. |Current Available...| 2021| 3| 4| 1|239.63160854893138|2021-03-04 01:00:00|1614816000|240.48959521094642| 0.8579866620150369| 0.8579866620150369| 0.0| null|
  21. |Current Available...| 2021| 3| 4| 2|240.48959521094642|2021-03-04 02:00:00|1614819600|192.37784787942516| -48.11174733152126| 0.0| 48.11174733152126| null|
  22. |Current Available...| 2021| 3| 4| 3|192.37784787942516|2021-03-04 03:00:00|1614823200|192.96993537510536| 0.5920874956802038| 0.5920874956802038| 0.0| null|
  23. |Current Available...| 2021| 3| 4| 4|192.96993537510536|2021-03-04 04:00:00|1614826800|193.60104726861024| 0.6311118935048796| 0.6311118935048796| 0.0| null|
  24. +--------------------+-----+------+----+-----+------------------+-------------------+----------+------------------+--------------------+-------------------+-------------------+------------------+

我已经计算过了 gain 以及 loss 第一平均增益( gain1 = 5.663852926539233 ,因为计算rsi的时间间隔是14)。
现在我在计算其他平均收益时遇到了问题,从第15行开始。这个公式是递归的,我不知道如何实现它。到目前为止,我尝试了窗口函数,但没有得到正确的结果。

  1. WindowSpec windowRSI3 = Microsoft.Spark.Sql.Expressions.Window
  2. .PartitionBy("TimeSeriesType")
  3. .OrderBy("Year0", "Month0", "Day0", "Hour0");
  4. DataFrame rsiCalcPos6 = rsiCalcPos5.WithColumn("avgGainj", When(Col("gain1").IsNull(),
  5. (Lag(Col("gain1"), 1, 0).Multiply(13 / 14).Minus((Col("gain").Multiply(-1 / 14))
  6. .Over(windowRSI3)))).Otherwise(Col("gain1")));

在这里,我得到一个例外:
org.apache.spark.sql.analysisexception:在窗口函数中不支持表达式'(gain#175*cast(0为double))'。
我要使用的递归公式需要计算一个 avgGainj 并在计算下一个时使用此结果 avgGain(j+1) .
如有任何建议,我们将不胜感激。谢谢!

rdlzhqv9

rdlzhqv91#

我不确定我的公式是否完全正确,但我会这样处理:

  1. using System;
  2. using System.Collections.Generic;
  3. using Microsoft.Spark.Sql;
  4. using Microsoft.Spark.Sql.Expressions;
  5. using Microsoft.Spark.Sql.Types;
  6. namespace StackOverflow
  7. {
  8. class Program
  9. {
  10. static void Main(string[] args)
  11. {
  12. var spark = SparkSession.Builder().GetOrCreate();
  13. var df = spark.CreateDataFrame(new List<GenericRow>()
  14. {
  15. new GenericRow(new object[]{1, 0.0, 1.226}),
  16. new GenericRow(new object[]{2, 0.0, 33.09}),
  17. new GenericRow(new object[]{3, 3.3, 0.0}),
  18. new GenericRow(new object[]{4, 0.0, 2.67}),
  19. new GenericRow(new object[]{5, 0.0, 2.67}),
  20. new GenericRow(new object[]{6, 0.0, 2.67}),
  21. new GenericRow(new object[]{7, 7.7, 0.0}),
  22. new GenericRow(new object[]{8, 0.0, 2.67}),
  23. new GenericRow(new object[]{9, 9.9, 0.0}),
  24. new GenericRow(new object[]{10, 10.1, 0.0}),
  25. new GenericRow(new object[]{11, 11.11, 0.0}),
  26. new GenericRow(new object[]{12, 12.12, 0.0}),
  27. new GenericRow(new object[]{13, 13.13, 0.0}),
  28. new GenericRow(new object[]{14, 14.14, 0.0}),
  29. new GenericRow(new object[]{15, 15.15, 0.0}),
  30. new GenericRow(new object[]{16, 16.16, 0.0}),
  31. new GenericRow(new object[]{17, 17.17, 0.0}),
  32. new GenericRow(new object[]{18, 18.18, 0.0}),
  33. new GenericRow(new object[]{19, 19.19, 0.0})
  34. }, new StructType(new List<StructField>()
  35. {
  36. new StructField("Row", new IntegerType()),
  37. new StructField("Gain", new DoubleType()),
  38. new StructField("Loss", new DoubleType()),
  39. }));
  40. df.Show();
  41. //First use a window of the last 14 rows
  42. var lastFourteenRowsWindow = Window.OrderBy(Functions.Desc("Row")).RowsBetween(0, 14);
  43. //Save the sum of the last fourteen rows
  44. var lastFourteenGains = df.WithColumn("LastFourteenGains", Functions.Sum("Gain").Over(lastFourteenRowsWindow));
  45. //calculate the average of those (there is also an avg function you could use instead of sum/14)
  46. var averageGain =
  47. lastFourteenGains.WithColumn("AverageGain", Functions.Col("LastFourteenGains") / 14);
  48. //create second window that doesn't have the 14 requirement
  49. var rowWindow = Window.OrderBy(Functions.Desc("Row"));
  50. //use the new window to retrieve the previous gain
  51. var previousGains = averageGain.WithColumn("PreviousAverageGain",
  52. Functions.Lead("AverageGain", 1).Over(rowWindow));
  53. //Previous Gain / 13 + (Sum(Last 14 Gains)/14)
  54. var result = previousGains.WithColumn("CurrentAverageGains",
  55. ((Functions.Col("PreviousAverageGain") / 13) + Functions.Col("AverageGain")) / 14);
  56. result.Show();
  57. }
  58. }
  59. }

如果在每个阶段之间执行.show(),则可以验证它是否正确。
预计起飞时间

展开查看全部

相关问题