R语言 跨多列计数事件并按年份分组

nfeuvbwi  于 2023-02-10  发布在  其他
关注(0)|答案(2)|浏览(167)

我有一个电影数据集,其中有一列是年份,三列是流派。
下面是一个例子:

genre_structure<-structure(
  list(
    year = c(
      "2008",
      "2003",
      "2010",
      "2001",
      "2002",
      "1999",
      "1980",
      "2020",
      "1977",
      "1991",
      "1954",
      "2022",
      "1962",
      "2000",
      "1994",
      "2019",
      "2019",
      "1981",
      "2012",
      "2003"
    ),
    genre1 = c(
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action",
      "Action"
    ),
    genre2 = c(
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Adventure",
      "SciFi",
      "Adventure",
      "Drama",
      "Adventure",
      "SciFi",
      "Drama",
      "Drama",
      "Drama",
      "Adventure",
      "Crime",
      "Adventure",
      "Adventure",
      "Adventure",
      "Drama",
      "Drama"
    ),
    genre3 = c(
      "Drama",
      "Drama",
      "SciFi",
      "Drama",
      "Drama",
      "",
      "Fantasy",
      "",
      "Fantasy",
      "",
      "",
      "Mystery",
      "Mystery",
      "Drama",
      "Drama",
      "Crime",
      "Drama",
      "",
      "",
      "Mystery"
    )
  ),
  row.names = c(NA,-20L),
  class =  "data.frame"
  )

我试图计算所有3个流派为每年。预期结果是(例如):

genre | year| count
Action |2008| 1
Comedy | 2008 | 3
Drama | 2008 | 4
...

我试过:

genre_years_test<-genre_structure %>% 
  group_by(genre1, genre2, genre3, year) %>% 
  summarise(total=n(), .groups = "drop")

但每当一个新的流派在那一年发行时,它都在重复这几年。

sr4lhrrt

sr4lhrrt1#

我们可以将其整形为“long”,并得到count

library(dplyr)
library(tidyr)
genre_structure %>% 
  pivot_longer(cols = -year, values_to = 'genre') %>%
  count(year, genre, name = 'count')
jqjz2hbq

jqjz2hbq2#

下面是base中的一个解决方案,仅供lafs使用:

subset(as.data.frame(
        table(cbind(genre_structure[1], stack(genre_structure[-1]))[-3])
                    ), Freq != 0)

data.table中:

library(data.table)

melt(setDT(genre_structure), id.vars = c("year"),
                             variable.name = "genre")[, list(Freq =.N), 
                                                       .(year, value)]

相关问题