How do I select the last n values of each row in R?

mwg9r5ms, posted on 2023-01-03 in Other

I would like to select the last 3 non-empty values of each row of a data frame:

df <- structure(list(V1 = c("Johannes Gutenberg University of Mainz", 
"Eldagsener Str. 38", "Linneper Weg 1", "Gohrstraße 74", "Düppelstraße 36", 
"Blutspende: Haus A3"), V2 = c(" Gebäude 900", " 31832 Springe", 
" 40885 Ratingen", " 47475 Kamp-Lintfort", " 12163 Berlin", " Ebene -3"
), V3 = c(" Augustuspl. 4", " Germany", " Germany", " Germany", 
" Germany", " Zentrum Innere Medizin (ZIM Blutbank / Immunhämatologisches Labor Haus A1"
), V4 = c(" 55131 Mainz", "", "", "", "", " Zentrum Operative Medizin (ZOM"
), V5 = c(" Germany", "", "", "", "", " Oberdürrbacher Str. 6"
), V6 = c("", "", "", "", "", " 97080 Würzburg"), V7 = c("", 
"", "", "", "", " Germany")), row.names = 24:29, class = "data.frame")

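Some rows start with irrelevant text and some end with empty cells, but the key information is always in the last 3 non-empty entries of each row.
I would like to use the tidyverse, but other solutions are welcome too.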

shyt4zoc #1

Loop over the rows with apply, drop the empty strings, and take the last 3 values:

data.frame(t(apply(df, 1, function(i){ tail(i[ i != "" ], 3) })))
#                        X1                   X2       X3
# 24          Augustuspl. 4          55131 Mainz  Germany
# 25     Eldagsener Str. 38        31832 Springe  Germany
# 26         Linneper Weg 1       40885 Ratingen  Germany
# 27          Gohrstraße 74  47475 Kamp-Lintfort  Germany
# 28        Düppelstraße 36         12163 Berlin  Germany
# 29  Oberdürrbacher Str. 6       97080 Würzburg  Germany

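**Note:** if there are empty cells between the non-empty values, they are dropped as well, so the columns may no longer line up. For example, compare the first row after blanking one cell: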

df[1, 4] <- ""
data.frame(t(apply(df, 1, function(i){ tail(i[ i != "" ], 3)})))
#                        X1                   X2       X3
# 24            Gebäude 900        Augustuspl. 4  Germany
# 25     Eldagsener Str. 38        31832 Springe  Germany
# 26         Linneper Weg 1       40885 Ratingen  Germany
# 27          Gohrstraße 74  47475 Kamp-Lintfort  Germany
# 28        Düppelstraße 36         12163 Berlin  Germany
# 29  Oberdürrbacher Str. 6       97080 Würzburg  Germany
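
A related caveat: `apply()` returns a matrix only when every row yields the same number of values. If some row has fewer than 3 non-empty entries, it returns a list instead and the `t()` call fails. The following is a minimal sketch of one way around that, assuming left-padding with `NA` is acceptable. The helper name `last_n_nonempty` and the `toy` data are hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answers): return the last `n`
# non-empty values of every row, left-padding with NA when a row has fewer.
last_n_nonempty <- function(df, n = 3) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    x <- unlist(df[i, , drop = FALSE], use.names = FALSE)
    x <- x[!is.na(x) & x != ""]               # drop NA and empty strings
    x <- tail(x, n)                           # keep at most the last n values
    c(rep(NA_character_, n - length(x)), x)   # pad on the left to length n
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- paste0("X", seq_len(n))
  rownames(out) <- rownames(df)
  out
}

# Toy data: the second row has only two non-empty values.
toy <- data.frame(
  V1 = c("a", "x"),
  V2 = c("b", "y"),
  V3 = c("c", ""),
  V4 = c("d", ""),
  stringsAsFactors = FALSE
)

res <- last_n_nonempty(toy, n = 3)
print(res)
```

Padding on the left keeps the final entry of every row in the last column, which matches the question's premise that the key information sits at the end of each row.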

klr1opcd #2

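library(tidyverse)

df |> 
  mutate(rn = row_number()) |>            # remember the original row
  pivot_longer(cols = V1:V7) |> 
  filter(value != "") |>                  # keep non-empty cells only
  group_by(rn) |> 
  slice_tail(n = 3) |>                    # last 3 values per row
  mutate(pos = row_number()) |>           # position 1..3 within the row
  ungroup() |> 
  select(-name) |> 
  # rn stays as the id column, so no list-columns and no unnest() needed
  pivot_wider(names_from = pos, values_from = value) |> 
  select(-rn)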

#> # A tibble: 6 × 3
#>   `1`                      `2`                    `3`       
#>   <chr>                    <chr>                  <chr>     
#> 1 " Augustuspl. 4"         " 55131 Mainz"         " Germany"
#> 2 "Eldagsener Str. 38"     " 31832 Springe"       " Germany"
#> 3 "Linneper Weg 1"         " 40885 Ratingen"      " Germany"
#> 4 "Gohrstraße 74"          " 47475 Kamp-Lintfort" " Germany"
#> 5 "Düppelstraße 36"        " 12163 Berlin"        " Germany"
#> 6 " Oberdürrbacher Str. 6" " 97080 Würzburg"      " Germany"

wh6knrhe #3

Using tidyr and dplyr:

library(dplyr)
library(tidyr)

df %>% 
  mutate(row = row_number()) %>%
  pivot_longer(!row) %>%
  filter(value != "") %>%
  group_by(row) %>%
  slice_tail(n = 3) %>%
  mutate(name = paste0("V", 1:3)) %>%
  ungroup() %>%
  pivot_wider()
# A tibble: 6 × 4
    row V1                       V2                     V3        
  <int> <chr>                    <chr>                  <chr>     
1     1 " Augustuspl. 4"         " 55131 Mainz"         " Germany"
2     2 "Eldagsener Str. 38"     " 31832 Springe"       " Germany"
3     3 "Linneper Weg 1"         " 40885 Ratingen"      " Germany"
4     4 "Gohrstraße 74"          " 47475 Kamp-Lintfort" " Germany"
5     5 "Düppelstraße 36"        " 12163 Berlin"        " Germany"
6     6 " Oberdürrbacher Str. 6" " 97080 Würzburg"      " Germany"
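
The values above still carry the leading spaces from the input, and `paste0("V", 1:3)` assumes every row has at least 3 non-empty values. The following sketch is a variation on the same pipeline that trims whitespace first and numbers the kept values with `row_number()`, so shorter rows simply get `NA` in the missing columns. The `toy` data and the `n_keep` variable are hypothetical stand-ins and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Toy data with padded strings, standing in for the question's `df`.
toy <- tibble(
  V1 = c("intro", "Street 1"),
  V2 = c(" Street 9", " 12345 Town"),
  V3 = c(" 54321 City", " Germany"),
  V4 = c(" Germany", "")
)

n_keep <- 3  # hypothetical parameter: how many trailing values to keep

res <- toy %>%
  mutate(across(everything(), trimws)) %>%       # strip surrounding spaces
  mutate(row = row_number()) %>%
  pivot_longer(!row) %>%
  filter(value != "") %>%
  group_by(row) %>%
  slice_tail(n = n_keep) %>%
  mutate(name = paste0("V", row_number())) %>%   # position within the row
  ungroup() %>%
  pivot_wider()

print(res)
```

Trimming before the `filter()` step also means cells that contain only spaces are treated as empty.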

lsmepo6l #4

Assuming your last three columns are named "V5", "V6" and "V7", as in the example, you can use filter in a single command:

library(dplyr)  # filter() comes from dplyr
filter(df, V5 != "", V6 != "", V7 != "")

If you only need those columns, you can do:

df |> 
  select(V5:V7) |> 
  filter(V5 != "", V6 != "", V7 != "")
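
Note that `filter()` selects rows, not values: the call above keeps only the rows in which V5, V6 and V7 are all non-empty and drops every other row. A small sketch on hypothetical toy data (the `toy` and `kept` names are illustrative only) shows the effect:

```r
library(dplyr)

# Toy data: the first row ends one column earlier than the second.
toy <- data.frame(
  V4 = c("Street 1", "intro"),
  V5 = c("12345 Town", "Street 9"),
  V6 = c("Germany", "54321 City"),
  V7 = c("", "Germany"),
  stringsAsFactors = FALSE
)

# Only rows where V5, V6 and V7 are all non-empty survive.
kept <- filter(toy, V5 != "", V6 != "", V7 != "")
print(kept)
print(nrow(kept))
```

Here the first row is dropped because its V7 is empty, even though it does contain three non-empty values. This approach therefore only fits data in which every row of interest fills the last three columns.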
