MySQL query returns multiple records

zzlelutf  posted on 2021-06-25  in Mysql

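I have the following problem with a tf/idf measure. My goal is to get the top 10 words for each date (top 10 for 20160401, top 10 for 20160402, and so on), but my query returns more than 10 rows per date. As the output shows, the row numbers do not come out right: each date should have exactly one 1, one 2, one 3, and so on, that is, at most 10 rows per date.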

    date      word  rowNumber
    20160401  word  1
    20160401  word  2
    ..
    20160401  word  10
    -------------------
    20160402  word  1
    20160402  word  2
    ..
    20160402  word  10

Below is my query for the tf-idf measure. The newsdetails table has columns and values like the following:

    date      word       wordCount
    20160401  yürürlük   1
    20160401  mülteci    1
    20160401  anlaşma    2
    20160401  pazartesi  1
    20160401  itibaren   1
    20160401  uygula     2

    SELECT t.newsCreatedTime, t.word, t.tf_idf as tweetRatio, rn
    FROM (
        select w.newsCreatedTime, w.word, w.wordCount, t1.topic_sum, w.wordCount/t1.topic_sum as tf,
               t2.idf, (w.wordCount/t1.topic_sum)*(t2.idf) as tf_idf,
               (@rn := if(@t = w.newsCreatedTime, @rn + 1,
                          if(@t := w.newsCreatedTime, 1, 1)
                         )
               ) as rn
        FROM newsdetails w JOIN
             ( select newsCreatedTime, sum(wordCount) as topic_sum
               from newsdetails where trim(word)<>'limit' and length(trim(word))>3 and newsCreatedTime>=20160401
               group by newsCreatedTime
             ) t1
             on w.newsCreatedTime=t1.newsCreatedTime and trim(w.word)<>'limit' and length(trim(w.word))>3 and w.newsCreatedTime>=20160401
             join (
               select w.word, log10(t_cnts.cnts/count(*)) as idf
               from newsdetails w,
                    (select count(distinct newsCreatedTime) as cnts from newsdetails where trim(word)<>'limit' and length(trim(word))>3 and newsCreatedTime>=20160401) t_cnts
               group by w.word
             ) t2
             on w.word=t2.word CROSS JOIN
             (SELECT @t := '', @rn := 0) params
        ORDER BY tf_idf desc
    ) t
    WHERE rn <=10

The output is:

    Date      word       ratio                      rowNumber
    20160401  kadın      -0.0005375613007471833     1
    20160401  beyaz      -0.00006167450024777626    1
    20160401  alışveriş  -0.000015563777946653284   1
    20160401  ösym       0                          1
    20160401  yürürlük   0.00004586823255826781     1
    20160401  haber      -0.0003304889014365445     2
    20160401  taraf      -0.00019041987363584725    2
    20160401  merkezi    -0.00013800691483414033    2
    20160401  mülteci    -0.000056636107108483694   2
    20160401  perde      0.00004941253963307586     2
    20160401  sınav      -0.00017806007150496475    3

But I only want the top 10 words for each date, together with their ratios.
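A likely reason the counter repeats is that the `@rn`/`@t` expression is evaluated as rows come out of the join, while `ORDER BY tf_idf desc` is applied only afterwards, so rows of one date are not necessarily adjacent when they get numbered. The Python sketch below only imitates that counter logic on made-up rows. The `rows` data and the function name `number_rows` are hypothetical, and this is not a model of how MySQL executes the query internally. It shows that the same logic yields 1..n per date only when the input is already grouped by date.

```python
def number_rows(rows):
    """Imitate: @rn := if(@t = date, @rn + 1, if(@t := date, 1, 1))."""
    last_date, rn = None, 0
    numbered = []
    for date, word, score in rows:
        if date == last_date:
            rn += 1           # same date as the previous row: keep counting
        else:
            last_date = date  # date changed: restart the counter at 1
            rn = 1
        numbered.append((date, word, score, rn))
    return numbered


# Hypothetical rows, arriving with the two dates interleaved.
rows = [
    (20160401, "a", 0.3),
    (20160402, "b", 0.9),
    (20160401, "c", 0.5),
    (20160402, "d", 0.1),
    (20160401, "e", 0.4),
]

# Numbering in arrival order: the date changes on every row.
interleaved = number_rows(rows)

# Numbering after sorting by date, then by score descending.
grouped = number_rows(sorted(rows, key=lambda r: (r[0], -r[2])))

print([r[3] for r in interleaved])            # [1, 1, 1, 1, 1]
print([(r[0], r[1], r[3]) for r in grouped])  # 1, 2, 3 for 20160401, then 1, 2 for 20160402
```

In arrival order every row gets 1, which resembles the repeated row numbers in the output above. Once the rows are sorted by date first, the counter behaves as intended.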

bqf10yzr  1#

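Something like this works in SQL Server; the syntax may need adjusting for MySQL (window functions such as row_number() require MySQL 8.0 or later).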

    select *
    from (
        -- Number the rows within each date, highest tf-idf first.
        select a.*,
               row_number() over (partition by a.newsCreatedTime order by a.tweetRatio desc) as rownum
        from (
            -- row_number() does the per-date numbering, so no @rn/@t user variables are needed here.
            SELECT t.newsCreatedTime, t.word, t.tf_idf as tweetRatio
            FROM (
                select w.newsCreatedTime, w.word, w.wordCount, t1.topic_sum, w.wordCount/t1.topic_sum as tf,
                       t2.idf, (w.wordCount/t1.topic_sum)*(t2.idf) as tf_idf
                FROM newsdetails w JOIN
                     ( select newsCreatedTime, sum(wordCount) as topic_sum
                       from newsdetails where trim(word)<>'limit' and length(trim(word))>3 and newsCreatedTime>=20160401
                       group by newsCreatedTime
                     ) t1
                     on w.newsCreatedTime=t1.newsCreatedTime and trim(w.word)<>'limit' and length(trim(w.word))>3 and w.newsCreatedTime>=20160401
                     join (
                       select w.word, log10(t_cnts.cnts/count(*)) as idf
                       from newsdetails w,
                            (select count(distinct newsCreatedTime) as cnts from newsdetails where trim(word)<>'limit' and length(trim(word))>3 and newsCreatedTime>=20160401) t_cnts
                       group by w.word
                     ) t2
                     on w.word=t2.word
            ) t
        ) a
    ) b
    where rownum <= 10
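The per-date numbering can be tried in isolation on a tiny in-memory table. The sketch below uses Python's `sqlite3` module rather than MySQL, and assumes the bundled SQLite is 3.25 or newer (the first version with window functions). The table `scores` and its rows are hypothetical stand-ins for the result of the tf-idf subquery, and `TOP_N` is set to 2 instead of 10 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the tf-idf subquery result.
conn.execute(
    "CREATE TABLE scores (newsCreatedTime INTEGER, word TEXT, tf_idf REAL)"
)
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (20160401, "a", 0.30), (20160401, "b", 0.10), (20160401, "c", 0.20),
        (20160402, "d", 0.05), (20160402, "e", 0.40), (20160402, "f", 0.25),
    ],
)

TOP_N = 2  # the question asks for 10; 2 keeps the example short

# Number rows within each date by descending score, then keep the first TOP_N.
query = """
SELECT newsCreatedTime, word, tf_idf, rn
FROM (
    SELECT newsCreatedTime, word, tf_idf,
           ROW_NUMBER() OVER (
               PARTITION BY newsCreatedTime
               ORDER BY tf_idf DESC
           ) AS rn
    FROM scores
) ranked
WHERE rn <= ?
ORDER BY newsCreatedTime, rn
"""
top_rows = conn.execute(query, (TOP_N,)).fetchall()

for row in top_rows:
    print(row)
```

Each date contributes at most `TOP_N` rows, numbered from 1. The filter on `rn` has to sit in an outer query because a window function's result cannot be referenced in the `WHERE` clause of the same `SELECT`.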