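When a docx file is read with read_docx, line breaks inside a paragraph (soft returns) are not read; they simply disappear. Is it possible to read a docx file and keep those line breaks?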
2jcobegt1#
I have been able to read the contents of a Word file with its line breaks:
library(RDCOMClient)

# Drive Word through COM (needs Windows with Word installed)
wordApp <- COMCreate("Word.Application")
wordApp[["Visible"]] <- TRUE
wordApp[["DisplayAlerts"]] <- FALSE

path_To_Word_File <- "D:\\text.docx"
doc <- wordApp[["Documents"]]$Open(normalizePath(path_To_Word_File), ConfirmConversions = FALSE)
doc_Selection <- wordApp$Selection()

list_Text <- list()

# Walk the rectangles of page 1 in the first pane, at most 40 of them
for(i in 1 : 40)
{
  print(i)
  # tryCatch yields NA if rectangle i cannot be selected
  error_Term <- tryCatch(wordApp[["ActiveDocument"]]$ActiveWindow()$Panes(1)$Pages(1)$Rectangles(i)$Range()$Select(),
                         error = function(e) NA)
  list_Text[[i]] <- tryCatch(doc_Selection$Range()$Text(), error = function(e) NA)

  # Stop as soon as error_Term is non-NULL (the NA from a failed selection)
  if(!is.null(error_Term))
  {
    break
  }
}

list_Text

[[1]]
[1] "hi\r"

[[2]]
[1] "\r"

[[3]]
[1] "this is a good text\r"

[[4]]
[1] "\r"

[[5]]
[1] "\r"

[[6]]
[1] "\r"

[[7]]
[1] "here is a word document\r"

[[8]]
[1] "here is a word document\r"