Extracting tables from ppt(x) files in R

nc1teljy  posted on 2023-04-27  in  Other

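I have PPT or pptx files that contain tables, and I would like to extract those tables into data frames in R. Is there a solution? Thanks.
Alternative: convert the ppt(x) to PDF in R and extract the tables with other packages. Is there a package that can convert ppt to PDF?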

odopli94  #1

I hope this works for you. Note that the code is written in Python, but it should be easy to adapt to R.

```python
from pptx import Presentation  # provided by the python-pptx package

# Placeholder: set this to the location of your .pptx file
path_to_presentation = "path/to/presentation.pptx"
prs = Presentation(path_to_presentation)

# text_runs will be populated with a list of strings,
# one for each text run found in a table cell
text_runs = []
for slide in prs.slides:
    for shape in slide.shapes:
        # Skip shapes that do not contain a table
        if not shape.has_table:
            continue
        tbl = shape.table
        row_count = len(tbl.rows)
        col_count = len(tbl.columns)
        for r in range(row_count):
            for c in range(col_count):
                cell = tbl.cell(r, c)
                for paragraph in cell.text_frame.paragraphs:
                    for run in paragraph.runs:
                        text_runs.append(run.text)

print(text_runs)
```
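
The loop above returns one flat list of strings, so the row and column layout is lost. As a rough sketch of how the grid could be kept, the snippet below reads the slide XML directly with the Python standard library: a .pptx file is a zip archive, and table cells are stored as DrawingML `a:tbl` / `a:tr` / `a:tc` elements. The function name `extract_tables` is hypothetical and not part of any package, and this does not work for the legacy binary .ppt format.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace, used for tables and text inside slide XML.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A_NS}


def extract_tables(pptx_file):
    """Return a list of tables; each table is a list of rows of cell strings.

    pptx_file may be a path or a binary file-like object.
    """
    tables = []
    with zipfile.ZipFile(pptx_file) as archive:
        # Sorted by file name, which is not necessarily presentation order.
        slide_names = sorted(
            name for name in archive.namelist()
            if name.startswith("ppt/slides/slide") and name.endswith(".xml")
        )
        for name in slide_names:
            root = ET.fromstring(archive.read(name))
            for tbl in root.iter(f"{{{A_NS}}}tbl"):
                rows = []
                for tr in tbl.findall("a:tr", NS):
                    row = []
                    for tc in tr.findall("a:tc", NS):
                        # Join the runs of each paragraph, one line per paragraph.
                        paragraphs = [
                            "".join(t.text or "" for t in p.iter(f"{{{A_NS}}}t"))
                            for p in tc.iter(f"{{{A_NS}}}p")
                        ]
                        row.append("\n".join(paragraphs))
                    rows.append(row)
                tables.append(rows)
    return tables
```

Each returned table is a list of rows, so the first row can be treated as the header when building a data frame. All values come back as strings; any type conversion is left to the caller.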
uqxowvwt  #2

Try the eoffice package, which is published on CRAN, and use its inpptx function:

```r
library(eoffice)
# Write a sample table to a pptx file so there is something to read back
totable(t.test(wt ~ am, mtcars), filename = file.path(tempdir(), "mtcars.pptx"))
## inpptx and indocx read the tables in a pptx or docx file
tabs <- inpptx(filename = file.path(tempdir(), "mtcars.pptx"), header = TRUE)
```
kq4fsx7k  #3

To convert a PowerPoint file to PDF in R, you can consider the following approach:

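```r
# Requires Windows with PowerPoint installed (RDCOMClient drives it through COM)
library(RDCOMClient)
pptapp <- COMCreate("PowerPoint.Application")
pptapp[["Visible"]] <- TRUE
pptpres <- pptapp$Presentations()$Open("D:\\ppt_With_Table.pptx")
# FileFormat = 32 corresponds to ppSaveAsPDF
pptpres$SaveAs("D:\\ppt_With_Table.pdf", FileFormat = 32)
```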
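
The COM approach above needs Windows and a PowerPoint installation. Where that is not available, one possible alternative (not part of the original answer) is LibreOffice in headless mode. The sketch below assumes the `soffice` executable is installed and on the PATH; the helper names `build_convert_command` and `convert_to_pdf` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(pptx_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to PDF."""
    return [
        soffice,            # assumption: LibreOffice launcher is on the PATH
        "--headless",       # run without a user interface
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(pptx_path),
    ]


def convert_to_pdf(pptx_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the PDF is expected."""
    subprocess.run(build_convert_command(pptx_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(pptx_path).stem + ".pdf")
```

From R, the same command line could be issued with `system2()`. Tables in the resulting PDF still have to be extracted with a separate tool.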

To extract a table from PowerPoint, you can consider the following approach:

```r
library(RDCOMClient)
pptapp <- COMCreate("PowerPoint.Application")
pptapp[["Visible"]] <- TRUE
pptpres <- pptapp$Presentations()$Open("D:\\Dropbox\\Reponses_Stackoverflow\\stackoverflow_401\\ppt_With_Table.pptx")

# The table in this example has 3 rows and 3 columns
mat_Table1 <- matrix(NA, nrow = 3, ncol = 3)

for(i in 1 : 3)
{
  for(j in 1 : 3)
  {
    # Read cell (i, j) of the table in the first shape on slide 1
    mat_Table1[i,j] <- pptapp[["ActivePresentation"]]$Slides(1)$Shapes(1)$Table()$Cell(i,j)$Shape()$TextFrame()$TextRange()$Text()
  }
}
```
