多个列和行的配置单元/sql计数出现次数

li9yvcax 于 2021-05-29 发布在 Hadoop

关注(0)|答案(3)|浏览(465)

我在找一个聪明的方法来统计事件。
举个例子：

UserID     CityID    CountryID   TagID
 100000      1         30        5
 100001      1         30        6
 100000      2         20        7
 100000      2         40        8
 100001      1         40        6
 100002      1         40        5
 100002      1         20        6

我想做的是：
我要按列和为每个用户计算值的出现次数。最后，我想要一个表格，其中显示了多少用户有超过不同的特点。
结果应该是这样的-或多或少

Different_CityID    Different_CountryIDs   Different_TagIDs
1                   3                      2

说明：
不同的cityid:just userid 100000有不同的cityid
不同的国家ID：所有用户的国家都有不同的ID
不同的标签号：userid100000和100002都有不同的标签号。用户100001只有“6”作为tagid。
我挣扎着为列和分组计数，但最终没有成功。有聪明的解决办法吗？
谢谢

sql hadoop Hive hiveql

来源：https://stackoverflow.com/questions/43628806/hive-sql-count-occurrences-for-multiple-columns-and-rows

3条答案

按热度按时间

omjgkv6w1#

select  count(case when pos=0 and count_distinct_ID>1 then 1 end) as different_cityid
       ,count(case when pos=1 and count_distinct_ID>1 then 1 end) as different_countryid
       ,count(case when pos=2 and count_distinct_ID>1 then 1 end) as different_tagid
from   (select      pe.pos
                   ,count (distinct pe.ID) as count_distinct_ID
        from        mytable t
                    lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID
        group by    t.UserID
                   ,pe.pos        
        ) t          
;

+------------------+---------------------+-----------------+
| different_cityid | different_countryid | different_tagid |
+------------------+---------------------+-----------------+
|                1 |                   3 |               2 |
+------------------+---------------------+-----------------+

这是另一个避免 count(distinct ...) ```
select count (case when pos=0 and not is_distinct_ID then 1 end) as different_cityid
,count (case when pos=1 and not is_distinct_ID then 1 end) as different_countryid
,count (case when pos=2 and not is_distinct_ID then 1 end) as different_tagid

from (select pe.pos
,min(pe.ID)<=>max(pe.ID) as is_distinct_ID
from mytable t
lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID

    group by    t.UserID
               ,pe.pos        
    ) t

;

…还有另一个变种

select count (case when not is_distinct_CityID then 1 end) as different_cityid
,count (case when not is_distinct_CountryID then 1 end) as different_countryid
,count (case when not is_distinct_TagID then 1 end) as different_tagid

from (select min (CityID) <=> max (CityID) as is_distinct_CityID
,min (CountryID) <=> max (CountryID) as is_distinct_CountryID
,min (TagID) <=> max (TagID) as is_distinct_TagID

    from        mytable
    group by    UserID     
    ) t

;

展开查看全部

赞(0）回复(0）举报 2021-05-29

h4cxqtbf2#

select uid,cid,count(c),count(g) from(select cid,uid,count(coid) over(partition by cid,uid) as c,count(tagid) over(partition by cid,tagid) as g from citydata)e group by cid,uid; 这里uid=userid，cid=cityid，coid=countryid，tagid Total MapReduce CPU Time Spent: 0 msec OK uid cid coid tagid 100000 1 1 1 100001 1 2 2 100002 1 2 2 100000 2 2 2 Time taken: 3.865 seconds, Fetched: 4 row(s) 基于 userid 我希望这会有帮助

赞(0）回复(0）举报 2021-05-29

zvokhttg3#

使用下面的代码我想它可以帮助你，

SELECT COUNT(DISTINCT (CountryID)) AS CountryID,
COUNT(DISTINCT(CityID)) AS CityID,
COUNT(DISTINCT(TagID)) AS TagID
FROM test GROUP BY UserID

结果是这样的，

CountryID   CityID  TagID
2   3   3
1   2   1
1   2   2

你好，维努

赞(0）回复(0）举报 2021-05-29

我来回答

多个列和行的配置单元/sql计数出现次数

3条答案

相关问题

热门标签

最新问答