R: Line breaks disappear when reading a docx file

Posted by pod7payv on 2023-04-27 in Other

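When I read a docx file with read_docx, the line breaks inside paragraphs (soft returns) are not read; they simply disappear. Is it possible to read a docx file and keep these line breaks?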

Answer #1 (2jcobegt)

I was able to read the contents of a Word file with its line breaks preserved:

library(RDCOMClient)

# Drive Word through COM (Windows only; Microsoft Word must be installed)
wordApp <- COMCreate("Word.Application")
wordApp[["Visible"]] <- TRUE
wordApp[["DisplayAlerts"]] <- FALSE
path_To_Word_File <- "D:\\text.docx"
doc <- wordApp[["Documents"]]$Open(normalizePath(path_To_Word_File), ConfirmConversions = FALSE)
doc_Selection <- wordApp$Selection()

list_Text <- list()

# Walk the rectangles (regions of text) on page 1; 40 is an arbitrary upper bound
for (i in 1:40) {
  # Select() yields NULL on success; the handler returns NA once i is past the last rectangle
  error_Term <- tryCatch(wordApp[["ActiveDocument"]]$ActiveWindow()$Panes(1)$Pages(1)$Rectangles(i)$Range()$Select(),
                         error = function(e) NA)

  # Stop before reading the text, otherwise the previous selection is stored a second time
  if (!is.null(error_Term)) {
    break
  }

  list_Text[[i]] <- tryCatch(doc_Selection$Range()$Text(), error = function(e) NA)
}

list_Text

[[1]]
[1] "hi\r"

[[2]]
[1] "\r"

[[3]]
[1] "this is a good text\r"

[[4]]
[1] "\r"

[[5]]
[1] "\r"

[[6]]
[1] "\r"

[[7]]
[1] "here is a word document\r"
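
The strings that Word returns use control characters rather than "\n": each paragraph mark comes back as "\r", and a manual line break (soft return) is normally reported as a vertical tab, "\v" (character code 11). The following is a minimal sketch, assuming that convention holds for your document, of how the collected strings could be tidied up. The helper name normalize_word_text is hypothetical and is not part of RDCOMClient; the example input is made up for illustration.

```r
# Hypothetical helper (not part of RDCOMClient): tidy strings collected from Word.
# Assumption: soft returns arrive as "\v" (vertical tab), paragraph marks as "\r".
normalize_word_text <- function(x) {
  x <- unlist(x, use.names = FALSE)        # list of strings -> character vector
  x <- sub("\r$", "", x)                   # drop the trailing paragraph mark
  x <- gsub("\v", "\n", x, fixed = TRUE)   # turn each soft return into a newline
  x
}

# Made-up input shaped like list_Text, with one soft return in the last entry
example <- list("hi\r", "\r", "first line\vsecond line\r")
cleaned <- normalize_word_text(example)
print(cleaned)
```

Empty paragraphs become empty strings, so the paragraph structure of the document is kept while soft returns remain visible as "\n" inside a single element.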
