在hive中查找十大趋势推文

klh5stk1  于 2021-06-28  发布在  Hive
关注(0)|答案(1)|浏览(438)

我发现在Hive的前10个趋势推特转发计数的基础上,即推特有最高的转发计数将是第一,以此类推。。。。
这是选举表的细节

  1. id bigint from deserializer
  2. created_at string from deserializer
  3. source string from deserializer
  4. favorited boolean from deserializer
  5. retweeted_status struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int> from deserializer
  6. entities struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>> from deserializer
  7. text string from deserializer
  8. user struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string,location:string> from deserializer
  9. in_reply_to_screen_name string from deserializer

我的问题

  1. select text
  2. from election
  3. where retweeted_status.retweet_count IN
  4. (select retweeted_status.retweet_count as zz
  5. from election
  6. order by zz desc
  7. limit 10);

它给了我10次同样的微博(推送,推送,推送。推特()
所以当我运行内部查询时,我所做的就是打破嵌套查询

  1. select retweeted_status.retweet_count as zz
  2. from election
  3. order by zz desc
  4. limit 10

它返回10个不同的值(12101209120812071206,…1201)
之后当我运行外部查询时

  1. select text
  2. from election
  3. where retweeted_status.retweet_count
  4. IN (1210,1209,1208,1207,1206,....1201 );

结果是相同的10条推文(推文,推文,推文。推特()
我的查询逻辑有什么问题?

sz81bmfz

sz81bmfz1#

不要使用count,你应该使用id,这是因为如果你有100条tweet的计数相同,不管限制10,你将得到100条记录。

  1. select text
  2. from election
  3. where id IN
  4. (select id as zz
  5. from election
  6. order by retweeted_status.retweet_count desc
  7. limit 10);

但仍然不知道为什么你会得到错误的结果。
编辑(在我的评论之后):
如果我的意见是正确的,那么你将有相同的身份证十次。在这种情况下,换成

  1. (select distinct id as zz
  2. from election
  3. order by retweeted_status.retweet_count desc
  4. limit 10);
展开查看全部

相关问题