How to use pivot_wider if the dataframe has any NA values

Posted by rseugnpd on 2023-03-15

I have a dataframe like this:

df = data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
                Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
                Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))

Rows 1:5 belong to day 1 and rows 6:9 to day 2. Whenever a value repeats the one above it in the same column, it is stored as NA.
I tried using

df %>% 
  pivot_wider(names_from = Problem, values_from = duration)

but it doesn't work. The df I expect looks like this:

df1 = data.frame(day = c("1", "1", "2", "2"),
                 Unit = c("unit1", "unit2", "unit1", "unit2"),
                 Oil = c(2, 0, 1.5, 0),
                 Engine = c(5, 0, 0, 0),
                 Electric = c(1, 0, 0, 0),
                 Power = c(0, 0, 3, 0),
                 NoProblem = c(0, 0, 0, 0))
Answer #1 (flvtvl50)

We fill the NAs with the previous non-NA element, keep the distinct rows, and reshape to wide with pivot_wider:

library(dplyr)
library(tidyr)
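df %>% 
  fill(day, Unit, Problem) %>%                        # replace NAs with the previous non-NA value
  distinct(day, Unit, Problem, .keep_all = TRUE) %>%  # drop the duplicate rows created by the fill
  mutate(duration = replace_na(duration, 0)) %>%      # remaining missing durations become 0
  pivot_wider(names_from = Problem, values_from = duration,
              values_fill = 0) %>%                    # absent day/Unit/Problem combinations become 0
  mutate(NoProblem = 0)                               # add the constant NoProblem column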
Output:
# A tibble: 4 × 7
  day   Unit    Oil Engine Electric Power NoProblem
  <chr> <chr> <dbl>  <dbl>    <dbl> <dbl>     <dbl>
1 1     unit1   2        5        1     0         0
2 1     unit2   0        0        0     0         0
3 2     unit1   1.5      0        0     3         0
4 2     unit2   0        0        0     0         0
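To see why unit2 ends up with columns at all, it can help to stop the pipeline just before pivot_wider and inspect the long data. This is a sketch that reuses the answer's own steps; the variable name `long` is only for illustration. Note that `fill()` also carries `Problem` forward into the unit2 rows (Electric on day 1, Power on day 2), and `distinct()` keeps the first row of each day/Unit/Problem combination, so unit1's Electric keeps duration 1 rather than the filled NA.

```r
library(dplyr)
library(tidyr)

df <- data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
                 Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
                 Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                 duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))

# Same steps as the answer, stopped before pivot_wider
long <- df %>%
  fill(day, Unit, Problem) %>%                        # carry the last non-NA value down
  distinct(day, Unit, Problem, .keep_all = TRUE) %>%  # keep the first row per combination
  mutate(duration = replace_na(duration, 0))          # unit2's missing durations become 0

print(long)
```

The zeros under Electric and Power for unit2 therefore come from the carried-forward Problem value, while the other zeros come from `values_fill = 0`; in the wide result both look the same.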
Answer #2 (toiithl6)

Basically, first fill day and Unit with zoo::na.locf, then use reshape.

df[1:2] <- zoo::na.locf(df[1:2])
(res <- reshape(df, direction='wide', idvar=c('Unit', 'day'), timevar='Problem'))
#   day  Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1   1 unit1          2.0               5                 1          NA             NA
# 5   1 unit2           NA              NA                NA          NA             NA
# 6   2 unit1          1.5              NA                NA          NA              3
# 9   2 unit2           NA              NA                NA          NA             NA

A warning is raised here because unit2 has no problem.
The NAs can be replaced with 0:

replace(res, is.na(res), 0)
#   day  Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1   1 unit1          2.0               5                 1           0              0
# 5   1 unit2          0.0               0                 0           0              0
# 6   2 unit1          1.5               0                 0           0              3
# 9   2 unit2          0.0               0                 0           0              0

However, this isn't really correct, and the NAs don't actually matter much, since you can often use na.rm when doing calculations such as:

by(res[3:7], res$Unit, colSums, na.rm=TRUE)
# res$Unit: unit1
#     duration.Oil   duration.Engine duration.Electric       duration.NA    duration.Power 
#              3.5               5.0               1.0               0.0               3.0 
# ------------------------------------------------------------------------------------------------------ 
# res$Unit: unit2
#     duration.Oil   duration.Engine duration.Electric       duration.NA    duration.Power 
#                0                 0                 0                 0                 0
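The same na.rm idea works row-wise. The sketch below, which is not part of the original answer, computes the total duration per day and unit without replacing any NA. To keep it self-contained without zoo, it writes out the day and Unit columns as they look after `zoo::na.locf(df[1:2])`; the column name `total` and the helper `dur_cols` are illustrative assumptions.

```r
# Long data after filling day and Unit (what zoo::na.locf(df[1:2]) gives above)
df <- data.frame(day = c("1", "1", "1", "1", "1", "2", "2", "2", "2"),
                 Unit = c("unit1", "unit1", "unit1", "unit1", "unit2",
                          "unit1", "unit1", "unit1", "unit2"),
                 Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                 duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))

# Same reshape call as the answer (it warns about the NA Problem values)
res <- reshape(df, direction = "wide", idvar = c("Unit", "day"), timevar = "Problem")

# Select the duration.* columns by name, then sum each row, skipping NAs
dur_cols <- startsWith(names(res), "duration.")
res$total <- rowSums(res[dur_cols], na.rm = TRUE)
res[c("day", "Unit", "total")]
```

A row where every duration is NA (unit2 here) sums to 0 under `na.rm = TRUE`, so no prior replacement is needed for totals like this.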
