R: Calculating percentiles for multiple groups

svgewumm  posted on 2022-12-27 in Other

I am working in the R programming language.
I have the following dataset:

set.seed(123)

library(dplyr)
var1 = rnorm(10000, 100,100)
var2 = rnorm(10000, 100,100)
var3 = rnorm(10000, 100,100)
var4 = rnorm(10000, 100,100)
id = 1:10000

final = data.frame(id, var1, var2, var3, var4)

final = final %>%
  mutate(class1 = case_when(var1 < mean(var1) ~ "A",
                            TRUE ~ "B")) %>%
  mutate(class2 = case_when(var2 < mean(var2) ~ "C",
                            TRUE ~ "D"))

I want to compute the deciles of var3 and var4 within each unique combination of class1 and class2.
As I understand it, this means:

  • For all rows where class1 = A and class2 = C, compute/assign the deciles of var3 and var4
  • For all rows where class1 = A and class2 = D, compute/assign the deciles of var3 and var4
  • For all rows where class1 = B and class2 = C, compute/assign the deciles of var3 and var4
  • For all rows where class1 = B and class2 = D, compute/assign the deciles of var3 and var4
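To make the "within each combination" idea concrete, here is a minimal sketch on a made-up toy data frame (the names `toy`, `g`, `x`, `overall`, and `within` are hypothetical and not part of the dataset above). It uses two buckets instead of ten so the result is easy to read: without `group_by()` the buckets are computed over all rows, while after `group_by()` they restart inside each group.

```r
library(dplyr)

# Hypothetical toy data: two groups of four rows each
toy <- data.frame(
  g = rep(c("A", "B"), each = 4),
  x = c(10, 20, 30, 40, 1000, 2000, 3000, 4000)
)

toy <- toy %>%
  mutate(overall = ntile(x, 2)) %>%   # halves computed over all 8 rows
  group_by(g) %>%
  mutate(within = ntile(x, 2)) %>%    # halves computed separately per group
  ungroup()

print(toy)
```

In the `overall` column every row of group A lands in bucket 1 and every row of group B in bucket 2, whereas `within` splits each group into its own lower and upper half.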

Here is the R code I wrote for this:

final = final %>%
  group_by(class1, class2) %>%
  mutate(class3 = case_when(ntile(var3, 10) == 1 ~ "one",
                             ntile(var3, 10) == 2 ~ "two",
                             ntile(var3, 10) == 3 ~ "three",
                             ntile(var3, 10) == 4 ~ "four",
                             ntile(var3, 10) == 5 ~ "five",
                             ntile(var3, 10) == 6 ~ "six",
                             ntile(var3, 10) == 7 ~ "seven",
                             ntile(var3, 10) == 8 ~ "eight",
                             ntile(var3, 10) == 9 ~ "nine",
                             ntile(var3, 10) == 10 ~ "ten")) %>%
  mutate(class4 = case_when(ntile(var4, 10) == 1 ~ "one",
                             ntile(var4, 10) == 2 ~ "two",
                             ntile(var4, 10) == 3 ~ "three",
                             ntile(var4, 10) == 4 ~ "four",
                             ntile(var4, 10) == 5 ~ "five",
                             ntile(var4, 10) == 6 ~ "six",
                             ntile(var4, 10) == 7 ~ "seven",
                             ntile(var4, 10) == 8 ~ "eight",
                             ntile(var4, 10) == 9 ~ "nine",
                             ntile(var4, 10) == 10 ~ "ten"))

Can someone tell me whether I am doing this correctly?

Thanks!

wgx48brx  #1

This can be done more easily with the english package instead of case_when:

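library(dplyr)
library(stringr)
# requires the english package (called below as english::english)
final %>%
   # compute deciles separately within each class1/class2 combination
   group_by(class1, class2) %>% 
   # turn each decile number (1-10) into its English word; outputs are named class3, class4
   mutate(across(var3:var4, 
         ~ as.character(english::english(ntile(.x, 10))),
       .names = "{str_replace(.col, 'var', 'class')}")) %>% 
   ungroup()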
  • Output
# A tibble: 10,000 × 9
      id  var1  var2    var3  var4 class1 class2 class3 class4
   <int> <dbl> <dbl>   <dbl> <dbl> <chr>  <chr>  <chr>  <chr> 
 1     1  44.0 337.    16.4   80.6 A      D      three  five  
 2     2  77.0  83.3   77.9  126.  A      C      five   six   
 3     3 256.  193.  -110.    46.2 B      D      one    four  
 4     4 107.   43.2  -66.8  -17.9 B      C      one    two   
 5     5 113.  123.    -9.80 190.  B      D      two    nine  
 6     6 272.  213.   -66.6   98.4 B      D      one    six   
 7     7 146.  238.    95.0  118.  B      D      five   six   
 8     8 -26.5  76.7  256.   160.  A      C      ten    eight 
 9     9  31.3 -60.1   59.5  126.  A      C      four   six   
10    10  55.4  70.2  179.   130.  A      C      eight  seven 
# … with 9,990 more rows
# ℹ Use `print(n = ...)` to see more rows
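
As a rough way to check that the deciles really are assigned per group, the sketch below rebuilds the data from the question, labels the deciles with a plain lookup vector (a swapped-in alternative to the english package, not what the answer above uses), and counts rows per group and decile. The names `decile_labels`, `result`, and `check` are hypothetical and introduced only for this illustration.

```r
library(dplyr)

set.seed(123)
n <- 10000
final <- data.frame(
  id   = 1:n,
  var1 = rnorm(n, 100, 100),
  var2 = rnorm(n, 100, 100),
  var3 = rnorm(n, 100, 100),
  var4 = rnorm(n, 100, 100)
)

final <- final %>%
  mutate(class1 = if_else(var1 < mean(var1), "A", "B"),
         class2 = if_else(var2 < mean(var2), "C", "D"))

# Lookup vector: position i holds the word for decile i (assumption: no english package)
decile_labels <- c("one", "two", "three", "four", "five",
                   "six", "seven", "eight", "nine", "ten")

result <- final %>%
  group_by(class1, class2) %>%
  mutate(class3 = decile_labels[ntile(var3, 10)],
         class4 = decile_labels[ntile(var4, 10)]) %>%
  ungroup()

# Each class1/class2 combination should contain all ten deciles,
# with bucket sizes that differ by at most one row
check <- result %>%
  count(class1, class2, class3) %>%
  group_by(class1, class2) %>%
  summarise(n_deciles = n(), spread = max(n) - min(n), .groups = "drop")

print(check)
```

If `check` shows ten deciles and a spread of 0 or 1 for each of the four combinations, the grouping works as intended, which suggests the `case_when` version in the question is also correct, only more verbose. Note that character labels sort alphabetically; wrapping them in `factor(..., levels = decile_labels)` would be one way to keep the natural order.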
