I have a movie dataset with one year column and three genre columns.
Here is an example:
genre_structure<-structure(
list(
year = c(
"2008",
"2003",
"2010",
"2001",
"2002",
"1999",
"1980",
"2020",
"1977",
"1991",
"1954",
"2022",
"1962",
"2000",
"1994",
"2019",
"2019",
"1981",
"2012",
"2003"
),
genre1 = c(
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action",
"Action"
),
genre2 = c(
"Crime",
"Adventure",
"Adventure",
"Adventure",
"Adventure",
"SciFi",
"Adventure",
"Drama",
"Adventure",
"SciFi",
"Drama",
"Drama",
"Drama",
"Adventure",
"Crime",
"Adventure",
"Adventure",
"Adventure",
"Drama",
"Drama"
),
genre3 = c(
"Drama",
"Drama",
"SciFi",
"Drama",
"Drama",
"",
"Fantasy",
"",
"Fantasy",
"",
"",
"Mystery",
"Mystery",
"Drama",
"Drama",
"Crime",
"Drama",
"",
"",
"Mystery"
)
),
row.names = c(NA,-20L),
class = "data.frame"
)
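I am trying to count how often each genre occurs per year, across all three genre columns. The expected result is (for example):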
genre  | year | count
Action | 2008 | 1
Comedy | 2008 | 3
Drama  | 2008 | 4
...
I tried:
library(dplyr)
genre_years_test <- genre_structure %>%
  group_by(genre1, genre2, genre3, year) %>%
  summarise(total = n(), .groups = "drop")
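But this repeats a year every time a new genre combination appears in that year, instead of giving one row per genre and year.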
2 answers
sr4lhrrt1#
We can reshape the data to "long" format and then get the count.
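A minimal sketch of that reshape-then-count idea, assuming the tidyr and dplyr packages; it is not necessarily the code this answer originally showed. The movies data frame is a trimmed copy of four rows from the question's sample, and the names slot and genre_counts are made up for illustration. Empty strings in genre3 are dropped before counting.

```r
library(dplyr)
library(tidyr)

# Trimmed copy of the question's sample data (rows 1, 2, 20 and 6)
movies <- data.frame(
  year   = c("2008", "2003", "2003", "1999"),
  genre1 = c("Action", "Action", "Action", "Action"),
  genre2 = c("Crime", "Adventure", "Drama", "SciFi"),
  genre3 = c("Drama", "Drama", "Mystery", "")
)

genre_counts <- movies %>%
  # Stack genre1..genre3 into a single "genre" column (one row per movie/slot)
  pivot_longer(cols = starts_with("genre"),
               names_to = "slot", values_to = "genre") %>%
  # Drop the empty placeholders used when a movie has fewer than three genres
  filter(genre != "") %>%
  # One row per genre/year pair, with the number of occurrences
  count(genre, year, name = "count")

print(genre_counts)
```

Each genre/year pair now appears once, regardless of which of the three columns the genre came from.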
jqjz2hbq2#
Here is a solution in base R, just for laughs:
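One way this could look in base R, as a sketch only and not necessarily the code the answer originally contained. The movies data frame is a trimmed copy of four rows from the question's sample; genre_cols, long and genre_counts are illustrative names.

```r
# Trimmed copy of the question's sample data (rows 1, 2, 20 and 6)
movies <- data.frame(
  year   = c("2008", "2003", "2003", "1999"),
  genre1 = c("Action", "Action", "Action", "Action"),
  genre2 = c("Crime", "Adventure", "Drama", "SciFi"),
  genre3 = c("Drama", "Drama", "Mystery", "")
)

genre_cols <- c("genre1", "genre2", "genre3")

# unlist() concatenates the genre columns one after another,
# so the year vector is repeated once per genre column to match
long <- data.frame(
  genre = unlist(movies[genre_cols], use.names = FALSE),
  year  = rep(movies$year, times = length(genre_cols))
)

# Drop the empty placeholders
long <- long[long$genre != "", ]

# Cross-tabulate genre by year, then flatten the table to a data frame
genre_counts <- as.data.frame(
  table(genre = long$genre, year = long$year),
  responseName = "count",
  stringsAsFactors = FALSE
)

# table() also reports combinations that never occur; keep only real ones
genre_counts <- genre_counts[genre_counts$count > 0, ]

print(genre_counts)
```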
In data.table:
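A comparable sketch with the data.table package, again an assumption about what the answer may have shown rather than its actual code. It melts the three genre columns into one and counts rows per genre and year with .N. The movies data frame is a trimmed copy of four rows from the question's sample; dt, long and genre_counts are illustrative names.

```r
library(data.table)

# Trimmed copy of the question's sample data (rows 1, 2, 20 and 6)
movies <- data.frame(
  year   = c("2008", "2003", "2003", "1999"),
  genre1 = c("Action", "Action", "Action", "Action"),
  genre2 = c("Crime", "Adventure", "Drama", "SciFi"),
  genre3 = c("Drama", "Drama", "Mystery", "")
)

dt <- as.data.table(movies)

# Stack the three genre columns into a single "genre" column
long <- melt(
  dt,
  id.vars      = "year",
  measure.vars = c("genre1", "genre2", "genre3"),
  value.name   = "genre"
)

# Skip empty placeholders and count rows per genre/year pair
genre_counts <- long[genre != "", .(count = .N), by = .(genre, year)]

print(genre_counts)
```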