R语言 根据多列中的值对术语列表进行排序

enxuqcxy  于 2023-09-27  发布在  其他
关注(0)|答案(1)|浏览(122)

我管理着一个电影节,每年举办一次“最佳电影”比赛。每年可以有多达50部电影和任何数量的评委。
我计划使用R来分析各个评委的分数,然后在Markdown中发布一份报告,我会分发给他们。
下面是带有示例原始数据的嵌套框架的样子:

Film = c("Vertigo", "Fargo", "Platoon", "Beetlejuice", "Zodiac")
Mean = c( 7.45,6.98,7.5,7.2,5.82)
TrimmedMean = c(7.42, 7.04,7.36,7.05,6.26)
Judge1 = c(5.56,6.00,7.00,6.22,5.33)
Judge2 = c(6.89,8.81,8.83,7.22,6.78)
Judge3 = c(9.25,6.38,6.71,6.88,6.50)
Judge4 = c(8.00,7.67,7.44,6.89,7.22)
Judge5 = c(5.11,4.89,7.11,8.78,6.44)
Judge6 = c(9.89,8.11,7.89,7.22,2.67)
RawData = data.frame(Film,Mean,TrimmedMean,Judge1,Judge2,Judge3,Judge4,Judge5,Judge6)

“平均数”和“修整平均数”栏可视为本练习的额外“判断”。
我想根据每个评委的评分对电影进行排名,然后删除数值,留下一个排序后的电影列表,如下所示:

Mean = c('Platoon','Vertigo','Beetlejuice','Fargo','Zodiac')
TrimmedMean = c('Vertigo','Platoon','Beetlejuice','Fargo','Zodiac')
Judge1 = c('Platoon','Beetlejuice','Fargo','Vertigo','Zodiac')
Judge2 = c('Platoon','Fargo','Beetlejuice','Vertigo','Zodiac')
Judge3 = c('Vertigo','Beetlejuice','Platoon','Zodiac','Fargo')
Judge4 = c('Vertigo','Fargo','Platoon','Zodiac','Beetlejuice')
Judge5 = c('Beetlejuice','Platoon','Zodiac','Vertigo','Fargo')
Judge6 = c('Vertigo','Fargo','Platoon','Beetlejuice','Zodiac')
SortedData = data.frame(Mean,TrimmedMean,Judge1,Judge2,Judge3,Judge4,Judge5,Judge6)

我生成第一个表的主数据集有5年前的数据,我可以按年份划分子集。我希望能够分析来自不同年份的数据,而不必对每一年的代码进行更改,而只需指定要子集的年份。该守则应响应不同年份的数据集,这些数据集将有不同数量的法官和电影。
有什么想法吗?
下面是我分析的R codeCSV file。为了隐私,我把法官的名字改成了法官1,法官2等。

py49o6xq

py49o6xq1#

我想你要找的函数是order()。它显示了向量中值的相对顺序。

order(c(1.65, 6.92, 7.98, 3.42, 5.75), decreasing=TRUE)
# [1] 3 2 5 4 1

我们可以使用这个顺序,为每个法官,查找Film列中的名字。

data.frame(lapply(RawData[, -1], function(x) RawData$Film[order(x, decreasing=TRUE)]))
#          Mean TrimmedMean      Judge1      Judge2      Judge3      Judge4      Judge5      Judge6
# 1     Platoon     Vertigo     Platoon     Platoon     Vertigo     Vertigo Beetlejuice     Vertigo
# 2     Vertigo     Platoon Beetlejuice       Fargo Beetlejuice       Fargo     Platoon       Fargo
# 3 Beetlejuice Beetlejuice       Fargo Beetlejuice     Platoon     Platoon      Zodiac     Platoon
# 4       Fargo       Fargo     Vertigo     Vertigo      Zodiac      Zodiac     Vertigo Beetlejuice
# 5      Zodiac      Zodiac      Zodiac      Zodiac       Fargo Beetlejuice       Fargo      Zodiac

同样的,但有点更优雅(我认为)和更快

SortedData <- RawData[, -1]
SortedData[] <- RawData$Film[sapply(RawData[, -1], order, decreasing=TRUE)]

SortedData
#          Mean TrimmedMean      Judge1      Judge2      Judge3      Judge4      Judge5      Judge6
# 1     Platoon     Vertigo     Platoon     Platoon     Vertigo     Vertigo Beetlejuice     Vertigo
# 2     Vertigo     Platoon Beetlejuice       Fargo Beetlejuice       Fargo     Platoon       Fargo
# 3 Beetlejuice Beetlejuice       Fargo Beetlejuice     Platoon     Platoon      Zodiac     Platoon
# 4       Fargo       Fargo     Vertigo     Vertigo      Zodiac      Zodiac     Vertigo Beetlejuice
# 5      Zodiac      Zodiac      Zodiac      Zodiac       Fargo Beetlejuice       Fargo      Zodiac

使用链接的数据

dtf <- structure(list(Judge=rep(paste("Judge", 1:5), each=14),
  Year=rep(2023, 70), Film=rep(c("And Then It Rained",
  "Catching Climate", "Compact", "Crumble", "Cyanosis",
  "Don't Leave Me", "Love Cut", "Lungs", "Mute", "Ragdoll",
  "Room", "The Demon Within", "The Ghost Hunter",
  "The Outside Dunny"), 5), Originality.Creativity=c(7, 6, 7, 7, 6,
  6, 6, 7, 6, 7, 7, 5, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 7, 6,
  6, 7, 5, 5, 5, 8, 3, 8, 7, 3, 3, 3, 5, 3, 7, 7, 8, 8, 9, 7, 9,
  5, 9, 7, 10, 8, 9, 10, 9, 8, 7, 7, 7, 6, 8, 8, 6, 8, 6, 6, 7,
  8, 8, 8), Direction=c(6, 5, 6, 6, 6, 6, 7, 6, 6, 7, 6, 6, 7, 5,
  6, 7, 7, 7, 8, 7, 7, 7, 8, 7, 8, 7, 7, 8, 7, 5, 6, 8, 4, 8, 6,
  3, 4, 5, 7, 4, 8, 6, 9, 8, 8, 9, 9, 4, 9, 7, 10, 8, 8, 9, 9, 9,
  8, 6, 7, 8, 8, 9, 6, 7, 7, 7, 8, 8, 8, 7), Writing=c(6, 5, 6,
  6, 6, 6, 6, 5, 6, 7, 6, 6, 7, 6, 7, 7, 7, 7, 7, 6, 7, 6, 8, 8,
  8, 7, 7, 8, 7, 5, 8, 9, 4, 8, 7, 4, 4, 4, 8, 3, 8, 7, 8, 9, 9,
  9, 8, 3, 8, 6, 9, 7, 8, 9, 9, 9, 8, 6, 5, 6, 8, 8, 6, 7, 6, 7,
  8, 7, 8, 7), Cinematography=c(6, 5, 6, 6, 6, 6, 6, 5, 6, 8, 6,
  6, 7, 5, 7, 7, 8, 7, 8, 7, 7, 6, 8, 8, 8, 7, 8, 8, 8, 5, 6, 9,
  6, 8, 5, 3, 5, 6, 5, 5, 7, 5, 9, 8, 10, 10, 10, 4, 8, 7, 9, 10,
  9, 9, 10, 9, 8, 6, 6, 7, 9, 8, 7, 7, 7, 8, 7, 8, 9, 6),
  Performance=c(6, 5, 5, 6, 6, 6, 7, 5, 6, 8, 5, 6, 7, 5, 7, 7,
  8, 7, 8, 7, 7, 7, 9, 8, 8, 7, 8, 8, 6, 5, 8, 9, 5, 8, 5, 2, 6,
  6, 4, 4, 8, 5, 9, 9, 10, 10, 10, 4, 9, 10, 10, 10, 9, 10, 10,
  9, 8, 6, 6, 7, 9, 8, 6, 8, 8, 7, 7, 7, 8, 6),
  ProductionValue=c(6, 5, 5, 6, 6, 6, 6, 5, 6, 8, 5, 6, 7, 5, 7,
  8, 7, 7, 8, 7, 6, 7, 8, 7, 8, 7, 8, 8, 7, 5, 6, 8, 5, 8, 6, 3,
  4, 4, 6, 5, 8, 5, 8, 8, 8, 9, 9, 3, 8, 7, 10, 8, 8, 9, 9, 8, 8,
  6, 6, 8, 9, 8, 7, 7, 7, 8, 7, 8, 7, 6), Pacing=c(6, 5, 5, 5, 6,
  6, 6, 5, 6, 7, 5, 6, 7, 5, 7, 7, 7, 6, 8, 7, 7, 7, 8, 8, 8, 7,
  8, 8, 7, 5, 6, 8, 5, 8, 6, 3, 5, 3, 6, 5, 8, 5, 7, 9, 7, 9, 8,
  2, 9, 8, 9, 8, 9, 10, 9, 9, 7, 7, 6, 7, 8, 8, 7, 7, 6, 7, 8, 7,
  6, 7), Structure=c(6, 5, 6, 5, 6, 6, 6, 5, 6, 6, 5, 6, 7, 5, 7,
  7, 7, 7, 8, 7, 7, 6, 8, 7, 8, 7, 8, 8, 7, 5, 6, 9, 5, 8, 7, 3,
  4, 4, 6, 6, 8, 4, 7, 9, 8, 8, 8, 1, 8, 8, 9, 7, 8, 9, 10, 9, 7,
  7, 6, 7, 8, 8, 7, 7, 7, 7, 8, 7, 7, 7), Sound.Music=c(6, 5, 6,
  6, 6, 6, 6, 6, 6, 7, 6, 7, 7, 5, 6, 5, 6, 8, 7, 7, 7, 6, 8, 6,
  8, 8, 7, 7, 7, 5, 6, 9, 5, 8, 7, 3, 4, 6, 5, 7, 8, 5, 8, 8, 9,
  9, 9, 1, 8, 8, 9, 8, 9, 9, 9, 8, 6, 6, 6, 8, 8, 7, 8, 7, 6, 9,
  7, 8, 9, 7)), row.names=c(NA, -70L), class="data.frame")

# Order the data.frame, just in case
dtf <- dtf[with(dtf, order(Judge, Film, Year)),]
rownames(dtf) <- NULL

# Split the data.frame based on Judge
dtf.s <- split(dtf, ~ Judge)

# Check if all the films are the same
s <- sapply(dtf.s, function(x) x$Film)
stopifnot(all(s[, 1] == s[, -1]))

# Calculate average scores per judge
ascores <- sapply(dtf.s, 
  function(x) rowMeans(x[,-(1:3)])
)
ascores
#      Judge 1   Judge 2   Judge 3   Judge 4   Judge 5
# 1  6.1111111 6.6666667 6.7777778 8.1111111 7.4444444
# 2  5.1111111 6.7777778 5.0000000 8.4444444 6.3333333
# 3  5.7777778 7.0000000 6.3333333 8.6666667 6.1111111
# 4  5.8888889 6.8888889 8.5555556 8.8888889 7.1111111
# 5  6.0000000 7.5555556 4.6666667 8.8888889 8.3333333
# 6  6.0000000 6.7777778 8.0000000 3.0000000 8.0000000
# 7  6.2222222 6.7777778 6.2222222 8.4444444 6.6666667
# 8  5.4444444 6.4444444 3.0000000 7.5555556 7.2222222
# 9  6.0000000 8.0000000 4.3333333 9.4444444 6.6666667
# 10 7.2222222 7.2222222 4.5555556 8.2222222 7.3333333
# 11 5.6666667 7.8888889 5.7777778 8.5555556 7.4444444
# 12 6.0000000 7.0000000 4.6666667 9.3333333 7.5555556
# 13 7.0000000 7.4444444 7.7777778 9.3333333 7.7777778
# 14 5.3333333 7.7777778 5.4444444 8.6666667 6.7777778

# Calculate average score across judges
avg.score <- rowMeans(ascores)

# Calculate olympic average score across judges
# Discard highest and lowest values
olympic <- function(x) {
    (sum(x)-sum(range(x)))/(length(x)-2)
}

oavg.score <- apply(ascores, 1, olympic)

# Calculate winsorized average score across judges
# Clamp values to second highest and second lowest
winsor <- function(x) {
    xtr <- x[-c(which.max(x), which.min(x))]
    (sum(xtr)+sum(range(xtr)))/length(x)
}

wavg.score <- apply(ascores, 1, winsor)

# Combine scores into a matrix
scores <- cbind(mean=avg.score, olympic=oavg.score, winsor=wavg.score, ascores)

# Get film rankings, highest scoring at the top
ranks <- scores
ranks[] <- dtf.s[[c(1, 3)]][apply(scores, 2, order, decreasing=TRUE)]

ranks
#    mean                 olympic              winsor               Judge 1             
# 1  "The Ghost Hunter"   "The Ghost Hunter"   "The Ghost Hunter"   "Ragdoll"           
# 2  "Crumble"            "Crumble"            "Crumble"            "The Ghost Hunter"  
# 3  "Cyanosis"           "Cyanosis"           "Ragdoll"            "Love Cut"          
# 4  "Room"               "Ragdoll"            "Cyanosis"           "And Then It Rained"
# 5  "And Then It Rained" "Room"               "And Then It Rained" "Cyanosis"          
# 6  "Ragdoll"            "And Then It Rained" "Room"               "Don't Leave Me"    
# 7  "The Demon Within"   "Don't Leave Me"     "Don't Leave Me"     "Mute"              
# 8  "Mute"               "Mute"               "Mute"               "The Demon Within"  
# 9  "Love Cut"           "The Demon Within"   "The Demon Within"   "Crumble"           
# 10 "The Outside Dunny"  "The Outside Dunny"  "The Outside Dunny"  "Compact"           
# 11 "Compact"            "Love Cut"           "Love Cut"           "Room"              
# 12 "Don't Leave Me"     "Compact"            "Compact"            "Lungs"             
# 13 "Catching Climate"   "Lungs"              "Lungs"              "The Outside Dunny" 
# 14 "Lungs"              "Catching Climate"   "Catching Climate"   "Catching Climate"  
#    Judge 2              Judge 3              Judge 4              Judge 5             
# 1  "Mute"               "Crumble"            "Mute"               "Cyanosis"          
# 2  "Room"               "Don't Leave Me"     "The Demon Within"   "Don't Leave Me"    
# 3  "The Outside Dunny"  "The Ghost Hunter"   "The Ghost Hunter"   "The Ghost Hunter"  
# 4  "Cyanosis"           "And Then It Rained" "Crumble"            "The Demon Within"  
# 5  "The Ghost Hunter"   "Compact"            "Cyanosis"           "And Then It Rained"
# 6  "Ragdoll"            "Love Cut"           "Compact"            "Room"              
# 7  "Compact"            "Room"               "The Outside Dunny"  "Ragdoll"           
# 8  "The Demon Within"   "The Outside Dunny"  "Room"               "Lungs"             
# 9  "Crumble"            "Catching Climate"   "Catching Climate"   "Crumble"           
# 10 "Catching Climate"   "Cyanosis"           "Love Cut"           "The Outside Dunny" 
# 11 "Don't Leave Me"     "The Demon Within"   "Ragdoll"            "Love Cut"          
# 12 "Love Cut"           "Ragdoll"            "And Then It Rained" "Mute"              
# 13 "And Then It Rained" "Mute"               "Lungs"              "Catching Climate"  
# 14 "Lungs"              "Lungs"              "Don't Leave Me"     "Compact"

相关问题