python-3.x VowpalWabbit contextual bandit model not converging as expected

sczxawaw  posted on 2023-10-21 in Python

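I am simulating a scenario with two options (sports/politics) and two conversion rates (c_0, c_1). To decide which option to show a customer, I use a contextual bandit model.
I generated 100 data points with a fixed context (user=Tom), in the following format: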

  • 'shared |Context user=Tom\n0:{cost}:0.5 |Action choice=sports \n|Action choice=politics '
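
A minimal sketch of how one such example string could be assembled, assuming the label (`0:{cost}:{probability}`) sits on the line of the action that was actually shown, as in the training set listed in this post. The helper name `make_example` and its parameters are hypothetical and not part of the original code.

```python
ACTIONS = ["sports", "politics"]

def make_example(chosen: str, cost: float, prob: float = 0.5, user: str = "Tom") -> str:
    """Build one multi-line example; only the chosen action's line carries a label."""
    lines = [f"shared |Context user={user}"]
    for action in ACTIONS:
        label = f"0:{cost}:{prob} " if action == chosen else ""
        lines.append(f"{label}|Action choice={action} ")
    return "\n".join(lines)

# Example: politics was shown and the customer converted (cost -1.0).
example = make_example("politics", -1.0)
print(repr(example))
```

Keeping the string construction in one place makes it easier to check that the label really lands on the intended action line.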

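One thing to note is that the cost is a randomly generated value. In this case:

  • P(cost = -1 | choice = sports) = 0.6
  • P(cost = -1 | choice = politics) = 0.7
  • P(cost = 0.2 | choice = sports) = 0.4
  • P(cost = 0.2 | choice = politics) = 0.3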
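
As a quick check on these probabilities, the expected cost of each action can be worked out directly. This is only a sketch: the names `P_CONVERT`, `expected_cost` and `sample_cost` are illustrative assumptions, and the sampling function shows one plausible way the costs could be drawn, not the original generator.

```python
import random

# P(cost = -1 | action), taken from the list above; otherwise the cost is 0.2.
P_CONVERT = {"sports": 0.6, "politics": 0.7}

def expected_cost(p_convert: float) -> float:
    """Expected cost of an action that yields -1 with probability p_convert, else 0.2."""
    return p_convert * -1.0 + (1.0 - p_convert) * 0.2

def sample_cost(action: str, rng: random.Random) -> float:
    """Draw one cost for the given action from the distribution above."""
    return -1.0 if rng.random() < P_CONVERT[action] else 0.2

for action, p in P_CONVERT.items():
    print(action, round(expected_cost(p), 2))
```

Under these probabilities the expected costs are about -0.52 for sports and -0.64 for politics, so politics has the lower expected cost. Sample averages over roughly 100 draws will scatter around these values.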

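In this training dataset, the average cost of option "sports" is -0.52, while that of "politics" is -0.65. I would therefore expect the model to prefer option B (politics) after training on these 100 samples. However, after training, running predict on 'shared |Context user=Tom \n|Action choice=sports \n|Action choice=politics ' gives me the PMF [0.9, 0.1].
This is concerning for several reasons:
1. I expected the opposite output, with P(B) > P(A).
2. The model not only considers option A better, it is also very confident about it. I would expect probabilities of around 40-60%, but it converges to 90% (!!).
I have tried tuning the model. Changing the parameters can make it behave better on a particular dataset, but regenerating the dataset readily produces a case where the model behaves as described above.
The model is run as follows: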

    import vowpalwabbit as vw

    # Contextual bandit with action-dependent features and epsilon-greedy exploration.
    model = vw.Workspace(
        "--cb_explore_adf --passes 1000 -l 0.2 --cb_type ips --holdout_off --epsilon 0.2 --cache -k"
    )

    # total_training is the list of multi-line example strings shown below.
    for sample in total_training:
        x = model.parse(sample, vw.LabelType.CONTEXTUAL_BANDIT)
        model.learn(x)
        model.finish_example(x)  # release the parsed examples back to VW

    model.predict('shared |Context user=Tom \n|Action choice=sports \n|Action choice=politics ')

The complete training set is:

  1. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  2. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  3. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  4. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  5. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  6. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  7. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  8. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  9. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  10. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  11. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  12. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  13. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  14. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  15. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  16. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  17. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  18. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  19. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  20. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  21. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  22. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  23. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  24. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  25. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  26. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  27. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  28. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  29. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  30. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  31. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  32. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  33. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  34. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  35. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  36. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  37. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  38. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  39. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  40. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  41. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  42. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  43. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  44. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  45. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  46. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  47. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  48. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  49. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  50. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  51. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  52. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  53. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  54. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  55. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  56. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  57. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  58. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  59. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  60. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  61. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  62. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  63. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  64. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  65. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  66. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  67. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  68. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  69. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  70. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  71. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  72. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  73. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  74. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  75. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  76. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  77. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  78. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  79. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  80. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  81. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  82. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  83. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  84. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  85. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  86. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  87. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  88. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  89. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  90. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  91. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  92. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  93. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  94. 'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  95. 'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  96. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  97. 'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  98. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  99. 'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics '
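
The per-action averages quoted in the question can be checked directly from strings in this format. The sketch below assumes that the label always precedes the `|Action` namespace on the chosen action's line; `per_action_costs` is a hypothetical helper, and `sample` holds only the first three strings of the listing for illustration.

```python
from collections import defaultdict

def per_action_costs(examples):
    """Group observed costs by the action whose line carries the label."""
    costs = defaultdict(list)
    for example in examples:
        for line in example.split("\n")[1:]:       # skip the shared context line
            label, _, features = line.partition("|")
            label = label.strip()
            if label:                               # only the chosen action is labeled
                _, cost, _prob = label.split(":")
                action = features.split("choice=")[1].strip()
                costs[action].append(float(cost))
    return costs

sample = [
    'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
    'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
    'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
]
for action, values in per_action_costs(sample).items():
    print(action, len(values), sum(values) / len(values))
```

Passing the full training list instead of `sample` gives the count and mean cost per action, which can then be compared with the figures stated in the question.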
mutmk8jj  #1

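From your question, I infer that sports is the better, lowest-cost choice, so the model is doing what it should. If that is a typo, it may still be that the samples you generated happen to favor the other option. The thing to check is whether the learned policy reflects the "sample average" performance of the two actions (which, depending on the underlying parameters, may differ from the optimal policy).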
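
On the [0.9, 0.1] PMF mentioned in the question, one possible reading is offered here as an assumption about epsilon-greedy exploration; it is not something the answer states. With `--epsilon 0.2` and two actions, an epsilon-greedy policy gives each action epsilon/2 = 0.1 and adds the remaining 0.8 to whichever action currently has the lowest estimated cost. The PMF would then describe the exploration schedule, not how confident the model is. The function name and the estimated costs below are made up for illustration.

```python
def epsilon_greedy_pmf(estimated_costs, epsilon):
    """Spread epsilon uniformly over all actions; give the rest to the lowest-cost action."""
    k = len(estimated_costs)
    best = min(range(k), key=lambda i: estimated_costs[i])
    pmf = [epsilon / k] * k
    pmf[best] += 1.0 - epsilon
    return pmf

# A tiny gap and a huge gap between the estimates lead to the same PMF.
print(epsilon_greedy_pmf([-0.60, -0.59], epsilon=0.2))
print(epsilon_greedy_pmf([-0.99, -0.01], epsilon=0.2))
```

Under this assumption, the size of the gap between the two estimates never shows up in the PMF; only the identity of the currently preferred action does.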
