Regex: extract a string from file names and add it as a new column in a merged data frame in R

wwtsj6pe  posted on 2022-11-18  in Other

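I want to import all the .txt files in a directory (ROOT.DIR) and merge them, while adding a new column that contains the date/time stamp taken from each file name. This will let me uniquely identify each observation.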

MWE sample data: tab-delimited .txt files, using macOS Monterey 12.6 and RStudio 2022.07.1 Build 554.

First attempt
This returns an error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/Users/Rob/Desktop/folder2022-10-17_07h52m13/_AA1111_ABCD_LIBRARY_[0-9]{3}.txt': No such file or directory
Called from: file(file, "rt")
Browse[1]>

Creating the files and dates objects works fine.
I think the error is in the syntax of data = read.delim(paste0(ROOT.DIR, d, "/_AA1111_ABCD_LIBRARY_", "[0-9]{3}", ".txt")) inside do.call. I have tried changing it to data = read.delim(paste0(ROOT.DIR, "/", d, "/_AA1111_ABCD_LIBRARY_", "[0-9]{3}", ".txt")), but I still get an error:

Error during wrapup: unused argument (pattern = "/Users/Rob/Desktop/folder/2022-10-17_07h50m10_AA1111_ABCD_LIBRARY_001.txt")
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

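I have tried using a regex to locate and import every file in the directory whose name follows the format outlined at the start, but I am new to regex and would appreciate any suggestions.
Alternatively, perhaps I need to use gsub here?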

Answer #1 (wrrgggsh)

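In the example above I was trying to pass a regex string to read.delim as the file name, but read.delim does not accept a pattern. It needs the path of a file that actually exists.
I was able to get the desired output with the code below.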

library(dplyr)   # provides %>% and mutate()
library(stringr) # provides str_extract()

setwd("path/to/directory/containing/files") # Define the folder where the raw data files are read from
root.dir <- getwd() # Assign that folder as the `root.dir` object

import_and_merge <- function(root.dir){

  # List all the .txt files in the `root.dir` folder (the dot is escaped so it matches a literal ".")
  files <- fs::dir_ls(root.dir, regexp = "\\.txt$")

  data <- files %>% # Create a data frame called `data`
    # Import every tab-delimited file listed above and row-bind them; `.id` adds a
    # `Sample` column holding the full path of the file each row came from
    purrr::map_dfr(readr::read_delim, delim = "\t", .id = "Sample") %>%
    # Replace the full path in `Sample` with just the date/time stamp, e.g. 2022-10-17_07h52m13
    mutate(Sample = str_extract(Sample, "([0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}[a-z]{1}[0-9]{2}[a-z]{1}[0-9]{2})"))

  # Create an object called `now` holding the current date and time, accurate to 1 second, in ISO 8601 format.
  # Note: ISO 8601 time stamps contain colons, which are not allowed in Windows file names.
  now <- parsedate::format_iso_8601(Sys.time())

  filename <- paste0("merged-data_", now, ".csv") # Define the name of the merged file to be written out

  out.dir <- "/path/to/directory/to/contain/outputs" # Define the folder the merged file is exported to. This should not be the same folder as the raw data

  # Export the merged data as a .csv file; file.path() inserts the "/" between folder and file name
  write.csv(data, file.path(out.dir, filename), row.names = FALSE)

}

### Now the function is loaded, I can run it as below ###

import_and_merge(root.dir) # Run this function with `root.dir`.
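
For comparison, here is a minimal base-R sketch of the same idea without the tidyverse packages. It is an illustration under stated assumptions, not the method used above. The demo folder, the two sample files and their id/value columns are made up, and the file names are assumed to follow the 2022-10-17_07h50m10_AA1111_ABCD_LIBRARY_001.txt pattern seen in the error message. The regex goes into list.files(), which returns real paths, and read.delim() is then called once per path. The last line shows that sub() with a capture group, close to the gsub idea in the question, gives the same stamp.

```r
# Hypothetical demo data: two tab-delimited files named like those in the question.
demo.dir <- file.path(tempdir(), "timestamp_demo")
dir.create(demo.dir, showWarnings = FALSE)
stamps <- c("2022-10-17_07h50m10", "2022-10-17_07h52m13")
for (i in seq_along(stamps)) {
  write.table(
    data.frame(id = 1:2, value = c(i, i * 10)),
    file = file.path(demo.dir, sprintf("%s_AA1111_ABCD_LIBRARY_%03d.txt", stamps[i], i)),
    sep = "\t", row.names = FALSE, quote = FALSE
  )
}

# The regex belongs in list.files(): it returns the paths of the files that match.
files <- list.files(demo.dir, pattern = "_LIBRARY_[0-9]{3}\\.txt$", full.names = TRUE)

# Pattern for the date/time stamp, e.g. 2022-10-17_07h52m13.
stamp.pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}h[0-9]{2}m[0-9]{2}"

# Read one file and tag every row with the stamp taken from its file name.
read_one <- function(path) {
  df <- read.delim(path) # read.delim() needs one concrete path, not a pattern
  df$Sample <- regmatches(basename(path), regexpr(stamp.pattern, basename(path)))
  df
}

# Read every file and stack the results into one data frame.
merged <- do.call(rbind, lapply(files, read_one))
print(merged)

# Alternative extraction with sub() and a capture group.
stamps.via.sub <- sub(paste0("^(", stamp.pattern, ").*$"), "\\1", basename(files))
```

A file whose name contains no stamp would make regmatches() return an empty vector and the assignment in read_one() would fail, so real code may want to check for that case first.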
