R: Count occurrences across multiple columns and group by year

Posted by nfeuvbwi on 2023-02-10 in Other


I have a movie dataset with one year column and three genre columns.
Here is an example:

genre_structure<-structure(
  list(
    year = c(
      "2008",
      "2003",
      "2010",
      "2001",
      "2002",
      "1999",
      "1980",
      "2020",
      "1977",
      "1991",
      "1954",
      "2022",
      "1962",
      "2000",
      "1994",
      "2019",
      "2019",
      "1981",
      "2012",
      "2003"
    ),
    genre1 = c(
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action"
    ),
    genre2 = c(
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Adventure",
      "SciFi",
      "Adventure",
      "Drama",
      "Adventure",
      "SciFi",
      "Drama",
      "Drama",
      "Drama",
      "Adventure",
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Drama",
      "Drama"
    ),
    genre3 = c(
      "Drama",
      "Drama",
      "SciFi",
      "Drama",
      "Drama",
      "",
      "Fantasy",
      "",
      "Fantasy",
      "",
      "",
      "Mystery",
      "Mystery",
      "Drama",
      "Drama",
      "Crime",
      "Drama",
      "",
      "",
      "Mystery"
    )
  ),
  row.names = c(NA,-20L),
  class =  "data.frame"
  )

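I am trying to count the occurrences of each genre across all three genre columns for each year. The expected result is (for example):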

genre | year| count
Action |2008| 1
Comedy | 2008 | 3
Drama | 2008 | 4
...

I tried:

genre_years_test<-genre_structure %>% 
  group_by(genre1, genre2, genre3, year) %>% 
  summarise(total=n(), .groups = "drop")

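But this repeats the year whenever a new combination of genres was released in that year, instead of giving one count per genre per year.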

Tags: r

Source: https://stackoverflow.com/questions/75366193/count-occurrences-across-multiple-columns-and-group-by-year

2 Answers

sr4lhrrt1#

We can reshape the data to "long" format and then get the count:

library(dplyr)
library(tidyr)
genre_structure %>% 
  pivot_longer(cols = -year, values_to = 'genre') %>%
  count(year, genre, name = 'count')
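One detail worth noting: genre3 contains empty strings in the example data, and the pipeline above would count "" as a genre of its own. A minimal sketch of one way to drop those blanks before counting is shown here. The data frame `movies` and the result name `genre_counts` are hypothetical stand-ins introduced only for illustration; `movies` has the same shape as genre_structure.

```r
library(dplyr)
library(tidyr)

# Hypothetical three-row sample with the same columns as genre_structure
movies <- data.frame(
  year   = c("2008", "2008", "2003"),
  genre1 = c("Action", "Action", "Action"),
  genre2 = c("Crime", "Drama", "Adventure"),
  genre3 = c("Drama", "", "Drama")
)

genre_counts <- movies %>%
  # Stack genre1..genre3 into a single 'genre' column
  pivot_longer(cols = -year, values_to = "genre") %>%
  # Drop the blank entries so "" is not counted as a genre
  filter(genre != "") %>%
  # One row per genre/year pair with its number of occurrences
  count(genre, year, name = "count")

print(genre_counts)
```

If the blanks are stored as NA rather than "", `pivot_longer(..., values_drop_na = TRUE)` should serve the same purpose.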

Posted 2023-02-10

jqjz2hbq2#

Here is a solution in base R, just for laughs:

subset(
  as.data.frame(
    table(cbind(genre_structure[1], stack(genre_structure[-1]))[-3])
  ),
  Freq != 0
)
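The one-liner above is compact but hard to read. As a rough sketch of the same idea in separate steps (not the answerer's code), the genre columns can be stacked with unlist(), paired with a repeated year vector, and tabulated. The names `movies`, `genre_cols`, `long`, and `counts` are hypothetical and used only for illustration; blanks are dropped here as an extra assumption.

```r
# Hypothetical three-row sample with the same columns as genre_structure
movies <- data.frame(
  year   = c("2008", "2008", "2003"),
  genre1 = c("Action", "Action", "Action"),
  genre2 = c("Crime", "Drama", "Adventure"),
  genre3 = c("Drama", "", "Drama")
)

genre_cols <- c("genre1", "genre2", "genre3")

# unlist() concatenates the genre columns one after another, so the
# year vector is repeated once per genre column to stay aligned
long <- data.frame(
  year  = rep(movies$year, times = length(genre_cols)),
  genre = unlist(movies[genre_cols], use.names = FALSE)
)

# Drop blank genres, cross-tabulate year by genre, keep non-zero cells
long   <- long[long$genre != "", ]
counts <- as.data.frame(table(long), stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

print(counts)
```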

In data.table:

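library(data.table)

# Melt the genre columns to long format, then count rows per year and genre.
# Note: setDT() converts genre_structure to a data.table by reference.
melt(setDT(genre_structure), id.vars = "year",
     value.name = "genre")[, .(Freq = .N), by = .(year, genre)]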

Posted 2023-02-10
