Remove part of the row names in a list of data frames

dluptydi, posted on 2022-12-25 in Other

I have two lists of data frames. One of them has the following structure:

data1 

                         Label   Pred   n
1 Mito-0001_Series007_blue.tif   Pear  10
2 Mito-0001_Series007_blue.tif Orange 223
3 Mito-0001_Series007_blue.tif  Apple 890
4 Mito-0001_Series007_blue.tif  Peach  34

and it repeats with different numbers, for example

                         Label   Pred   n
1 Mito-0002_Series007_blue.tif   Pear  90
2 Mito-0002_Series007_blue.tif Orange 127
3 Mito-0002_Series007_blue.tif  Apple  76
4 Mito-0002_Series007_blue.tif  Peach 344

The second list of data frames has the following structure:

data2

Slice                                       Area
Mask of Mask-0001Series007_blue-1.tif.      789.21

and so on

Question

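I would like to
1. Make the row names match by:
a) removing "Mito-" from data1
b) removing "Mask of Mask-" from data2
c) removing the "-1" at the end in data2
Keep in mind that this is a list of data frames.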

So far:

I have used information from the post titled "How can I remove certain part of row names in data frame". It suggests using

data2$Slice <- sub("Mask of Mask-", "", data2$Slice)

This obviously does not work on a list of data frames; it returns an empty character vector:

character(0)
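
The `character(0)` result can be reproduced with a small sketch. It assumes `data2` is a plain list holding data frames; the object name `data2_list` and the element name `img1` are made up for illustration. On the list itself, `$Slice` finds no element with that name and returns `NULL`, and `sub()` applied to `NULL` yields an empty character vector.

```r
# Hypothetical stand-in for the asker's data2: a list with one data frame.
data2_list <- list(
  img1 = data.frame(Slice = "Mask of Mask-0001Series007_blue-1.tif.",
                    Area = 789.21)
)

# The list has no element called "Slice", so `$` returns NULL ...
is.null(data2_list$Slice)

# ... and sub() on NULL gives an empty character vector.
res_list <- sub("Mask of Mask-", "", data2_list$Slice)
length(res_list)

# Applied to one data frame inside the list, the same call works.
res_df <- sub("Mask of Mask-", "", data2_list[[1]]$Slice)
res_df
```

So the substitution has to be applied to each data frame in the list (for example with `lapply()`), not to the list as a whole.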

Thanks in advance. I am always amazed at how great people are at answering questions on this site :)

8ehkhllq1#

First, we can define a function f that applies gsub with a single regular expression that fits all cases.

f <- \(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)
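Explanation:
  • .* any single character, repeated (zero or more times)
  • \\d{4} four digits
  • _? an underscore, if present
  • Series literally
  • (...) capture groups (numbered internally)
  • \\. a period (needs to be escaped, otherwise it means "any character")
  • \\1 capture group 1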
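
One consequence of this pattern is worth checking: the `.*` after the first capture group is greedy and the `(\\.tif)?` group is optional, so the second group ends up matching nothing and the `.tif` extension is dropped. A quick sketch using the same pattern (the variable names are only for illustration):

```r
pattern <- '.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?'
x <- "Mito-0001_Series007_blue.tif"

# Replacing with group 1 alone and with groups 1 + 2 gives the same
# string, i.e. group 2 contributed nothing.
g1  <- gsub(pattern, '\\1', x)
g12 <- gsub(pattern, '\\1\\2', x)
identical(g1, g12)
g12
```

This is consistent with the outputs shown in this answer, where none of the cleaned names keep the `.tif` suffix.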
Usage:
## test it
(x <- c(names(data1), data1[[1]]$Label, data2$Slice))
# [1] "Mito-0001_Series007_blue"               "Mito-0002_Series007_blue"              
# [3] "Mito-0001_Series007_blue.tif"           "Mito-0001_Series007_blue.tif"          
# [5] "Mito-0001_Series007_blue.tif"           "Mito-0001_Series007_blue.tif"          
# [7] "Mask of Mask-0001Series007_blue-1.tif."

f(x)
# [1] "0001_Series007_blue" "0002_Series007_blue" "0001_Series007_blue" "0001_Series007_blue"
# [5] "0001_Series007_blue" "0001_Series007_blue" "0001Series007_blue"

That seems to work, so we can apply it.

names(data1) <- f(names(data1))
data1 <- lapply(data1, \(x) {x$Label <- f(x$Label); x})
data2$Slice <- f(data2$Slice)

data1
# $`0001_Series007_blue`
#                 Label   Pred   n
# 1 0001_Series007_blue   Pear  10
# 2 0001_Series007_blue Orange 223
# 3 0001_Series007_blue  Apple 890
# 4 0001_Series007_blue  Peach  34
# 
# $`0002_Series007_blue`
#                 Label   Pred   n
# 1 0002_Series007_blue   Pear  90
# 2 0002_Series007_blue Orange 127
# 3 0002_Series007_blue  Apple  76
# 4 0002_Series007_blue  Peach 344

data2
#                Slice   Area
# 1 0001Series007_blue 789.21
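
In the output above, the data1 labels read `0001_Series007_blue` while the data2 slice reads `0001Series007_blue`, because the source strings differ in the underscore after the four digits. If the goal is identical keys in both, one option is to capture the digits and the rest separately and always rebuild the key with an underscore. This is a sketch under that assumption about the desired format, not part of the original answer; the helper name `f_key` is made up here.

```r
# Hypothetical variant of f: capture the four digits and the "Series..."
# part separately, then rebuild the key with an underscore in between.
f_key <- function(x) {
  sub('.*(\\d{4})_?(Series\\d{3}_blue).*', '\\1_\\2', x)
}

labels <- c("Mito-0001_Series007_blue.tif",
            "Mask of Mask-0001Series007_blue-1.tif.")
keys <- f_key(labels)
keys

# Both sources now map to the same key.
keys[1] == keys[2]
```

It can be applied in the same way as `f`, for example `lapply(data1, \(x) {x$Label <- f_key(x$Label); x})`.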
Data:
data1 <- list(`Mito-0001_Series007_blue` = structure(list(Label = c("Mito-0001_Series007_blue.tif", 
"Mito-0001_Series007_blue.tif", "Mito-0001_Series007_blue.tif", 
"Mito-0001_Series007_blue.tif"), Pred = c("Pear", "Orange", "Apple", 
"Peach"), n = c(10L, 223L, 890L, 34L)), class = "data.frame", row.names = c("1", 
"2", "3", "4")), `Mito-0002_Series007_blue` = structure(list(
    Label = c("Mito-0002_Series007_blue.tif", "Mito-0002_Series007_blue.tif", 
    "Mito-0002_Series007_blue.tif", "Mito-0002_Series007_blue.tif"
    ), Pred = c("Pear", "Orange", "Apple", "Peach"), n = c(90L, 
    127L, 76L, 344L)), class = "data.frame", row.names = c("1", 
"2", "3", "4")))

data2 <- structure(list(Slice = "Mask of Mask-0001Series007_blue-1.tif.", 
    Area = 789.21), class = "data.frame", row.names = c(NA, -1L
))

erhoui1w2#

Using the information given

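The answer given by @jay.sf was very useful, but it only worked for data1, not for data2. To make sure it also works for data2, which is a list of data frames as well, I added a few lines of code: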

#Old code
f <-function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

#I added the [[1]] after data2 as well
(x <- c(names(data1), data1[[1]]$Label, data2[[1]]$Slice))
f(x)

names(data1) <- f(names(data1))
data1 <- lapply(data1, function(x) {x$Label <- f(x$Label); x})

# This line of code was causing problems, so I removed it
# data2$Slice <- f(data2$Slice)

#And added the following to apply it to data 2

names(data2) <- f(names(data2))
data2 <- lapply(data2, function(x) {x$Slice <- f(x$Slice); x})
