Displaying the top 5 keywords for every category using the group_by function

Posted by xcitsw88 on 2023-11-14 in Other


I am trying to find the top 5 review keywords for each product category. I have the following code:

# Group by category and count keyword frequencies
keyword_counts <- filtered_data %>%
  group_by(category, keyword) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

# Find the top 5 keywords in each category
top_keywords_by_category <- keyword_counts %>%
  group_by(category) %>%
  top_n(5, wt = n) %>%
  ungroup()  # Ungroup the data

# Print the table
print(top_keywords_by_category)

It produces this output:

category                                                        keyword     n
   <chr>                                                           <chr>   <int>
 1 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… product   354
 2 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… cable     277
 3 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… chargi…   200
 4 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… quality   179
 5 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… nice      147
 6 Electronics|WearableTechnology|SmartWatches                     watch     129
 7 Electronics|Mobiles&Accessories|Smartphones&BasicMobiles|Smart… phone     127
 8 Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions   tv        117
 9 Electronics|WearableTechnology|SmartWatches                     product   102
10 Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions   product    80

whereas the result I want is:

Category Computers&Accessories
Keyword             n
1 Product          354
2 Cable            277
3 Chargi...        200
4 Quality          179
5 Nice             147


Source: https://stackoverflow.com/questions/77431456/displaying-top-5-keywords-for-every-category-using-group-by-function

1 Answer


1sbrub3j1#

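This sample data isn't very interesting, but it should show you how to use tidyr::separate_rows to split the "|"-delimited category paths into one row per level.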

quux <- structure(list(category = c("Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Electronics|WearableTechnology|SmartWatches", "Electronics|Mobiles&Accessories|Smartphones&BasicMobiles|Smart…", "Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions",  "Electronics|WearableTechnology|SmartWatches", "Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions"),
                       keyword = c("product", "cable", "chargi…", "quality", "nice", "watch", "phone", "tv", "product", "product"), 
                       n = c(354L, 277L, 200L, 179L, 147L, 129L, 127L, 117L, 102L, 80L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

library(dplyr)
quux %>%
  tidyr::separate_rows(category, sep = "\\|") %>%
  count(category, keyword) %>%
  arrange(desc(n))
# # A tibble: 32 × 3
#    category                keyword     n
#    <chr>                   <chr>   <int>
#  1 Electronics             product     2
#  2 Accessories&Peripherals cable       1
#  3 Accessories&Peripherals chargi…     1
#  4 Accessories&Peripherals nice        1
#  5 Accessories&Peripherals product     1
#  6 Accessories&Peripherals quality     1
#  7 Cables&Accessori…       cable       1
#  8 Cables&Accessori…       chargi…     1
#  9 Cables&Accessori…       nice        1
# 10 Cables&Accessori…       product     1
# # ℹ 22 more rows
# # ℹ Use `print(n = ...)` to see more rows
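
To make the effect of tidyr::separate_rows easier to see, here is a minimal sketch on a toy data frame. The names `toy`, `toy_long` and `toy_counts` and the values in them are made up for illustration and are not from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per (category path, keyword) pair
toy <- data.frame(
  category = c("A|B", "A|C"),
  keyword  = c("cable", "cable"),
  stringsAsFactors = FALSE
)

# Each "|"-separated level becomes its own row; the keyword is repeated
toy_long <- toy %>%
  separate_rows(category, sep = "\\|")

# count() then tallies how many rows each (level, keyword) pair has
toy_counts <- toy_long %>%
  count(category, keyword)

print(toy_counts)
```

Level "A" appears in both paths, so it ends up with a count of 2, while "B" and "C" each get 1. This is the same reason Electronics/product shows n = 2 in the output above.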

From here, you can do your top-5 filtering and pivoting:

quux %>%
  tidyr::separate_rows(category, sep = "\\|") %>%
  count(category, keyword) %>%
  slice_max(n = 5, order_by = n, with_ties = FALSE) %>%
  tidyr::pivot_wider(names_from = category, values_from = n, values_fill = list(n = 0))
# # A tibble: 4 × 3
#   keyword Electronics `Accessories&Peripherals`
#   <chr>         <int>                     <int>
# 1 product           2                         1
# 2 cable             0                         1
# 3 chargi…           0                         1
# 4 nice              0                         1
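
Note that the pipeline above calls slice_max() on ungrouped data, so it keeps five rows overall, and count() without `wt` counts rows rather than summing the existing `n`. If the goal is the top 5 keywords within each top-level category, as in the question's desired output, something along these lines might be closer. This is only a sketch under assumptions: the data is a cut-down copy of the question's printed output, `top_category` is a name introduced here, and treating the text before the first "|" as the category is a guess at what the asker wants.

```r
library(dplyr)

# Cut-down copy of the question's printed output (truncated "…" kept as-is)
keyword_counts <- data.frame(
  category = c(
    rep("Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", 5),
    "Electronics|WearableTechnology|SmartWatches",
    "Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions",
    "Electronics|WearableTechnology|SmartWatches",
    "Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions"
  ),
  keyword = c("product", "cable", "chargi…", "quality", "nice",
              "watch", "tv", "product", "product"),
  n = c(354L, 277L, 200L, 179L, 147L, 129L, 117L, 102L, 80L),
  stringsAsFactors = FALSE
)

top5 <- keyword_counts %>%
  # Assumption: the top-level category is the text before the first "|"
  mutate(top_category = sub("\\|.*$", "", category)) %>%
  # Sum the existing counts so sub-categories are combined per keyword
  count(top_category, keyword, wt = n, name = "n") %>%
  # Rank within each top-level category and keep at most 5 rows per group
  group_by(top_category) %>%
  slice_max(order_by = n, n = 5, with_ties = FALSE) %>%
  ungroup()

# One small table per category, similar in shape to the desired output
print(split(top5[c("keyword", "n")], top5$top_category))
```

The key difference is the group_by() before slice_max(), which makes the "top 5" apply per category instead of to the whole table. Using `wt = n` keeps the original review counts, so "product" under Electronics becomes 102 + 80 = 182 rather than 2.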
