Using tabula-py, why do I get a list and not a DataFrame?

Posted by 2ic8powd on 2023-10-13 in Other

I want to work with PDF files, specifically the tables in them. I wrote this code:

import pandas as pd
import numpy as np
import tabula
from tabula import read_pdf
tab = tabula.read_pdf(r'..\PDFs\Ala.pdf', encoding='latin-1', pages='all')
tab

But I get a list of values, like this:

[    Nombres  Edad Ciudad
0    Noelia    20   Lima
1  Michelie    45   Lima
2    Ximena    18   Lima
3    Miguel    43   Lima]

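I can't analyze it because it isn't a DataFrame. This is just an example; the real PDF file contains text and tables spread across several pages.
Can anyone help me solve this?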


Source: https://stackoverflow.com/questions/66037880/using-tabula-py-why-i-get-a-list-and-not-a-dataframe

2 Answers


#1 lh80um4z

tabula should return a list of pandas DataFrames, one for each table found in the PDF. You can display them (and work with them) as follows:

import tabula

# read_pdf returns a list of DataFrames, one per detected table;
# the raw string keeps the backslashes in the Windows path literal
dfs = tabula.read_pdf(r'..\PDFs\Ala.pdf', encoding='latin-1', pages='all')
print(f"Found {len(dfs)} tables")
# display each of the dataframes
for df in dfs:
    print(df.size)  # total number of cells (rows * columns)
    print(df)
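
If the tables on the different pages share the same columns, the list can also be merged into a single DataFrame with pandas.concat. The following is only a minimal sketch under that assumption: the two hand-built frames stand in for whatever tabula.read_pdf returns, so it runs without a PDF, and combine_tables is a hypothetical helper name, not part of tabula-py.

```python
import pandas as pd


def combine_tables(dfs):
    """Stack a list of DataFrames with matching columns into one DataFrame."""
    if not dfs:
        # No tables were detected: return an empty DataFrame,
        # because pd.concat raises ValueError on an empty list.
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all tables.
    return pd.concat(dfs, ignore_index=True)


# Stand-ins (assumed data) for the list that read_pdf(..., pages='all') returns.
page1 = pd.DataFrame(
    {"Nombres": ["Noelia", "Michelie"], "Edad": [20, 45], "Ciudad": ["Lima", "Lima"]}
)
page2 = pd.DataFrame(
    {"Nombres": ["Ximena", "Miguel"], "Edad": [18, 43], "Ciudad": ["Lima", "Lima"]}
)

combined = combine_tables([page1, page2])
print(type(combined).__name__)
print(combined.shape)
```

When the tables have different layouts, concatenating them would mix unrelated columns, so in that case it is probably better to keep the list and handle each DataFrame separately.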


#2 wqnecbli

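tabula returns a list of pandas DataFrames. To get a single DataFrame, select an element of the list by index, as in the statement below, which takes the first table.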

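import tabula
# read_pdf returns a list of DataFrames; [0] selects the first table,
# which is already a DataFrame, so no extra conversion is needed
tab = tabula.read_pdf(r'..\PDFs\Ala.pdf', pages='all')[0]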
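
Indexing with [0] fails with an IndexError when no table was detected, and it silently ignores every table after the first. The sketch below shows one way to make the selection explicit; pick_table is a hypothetical helper, and the hand-built list stands in for the result of tabula.read_pdf so that the example runs without a PDF.

```python
import pandas as pd


def pick_table(dfs, index=0):
    """Return the table at `index` from a list of DataFrames."""
    if not 0 <= index < len(dfs):
        raise IndexError(f"requested table {index}, but only {len(dfs)} found")
    return dfs[index]


# Stand-in (assumed data) for the list returned by read_pdf.
tables = [
    pd.DataFrame(
        {
            "Nombres": ["Noelia", "Michelie", "Ximena", "Miguel"],
            "Edad": [20, 45, 18, 43],
            "Ciudad": ["Lima"] * 4,
        }
    )
]

tab = pick_table(tables)
print(isinstance(tables, list))       # the return value is a list
print(isinstance(tab, pd.DataFrame))  # each element is a DataFrame
print(tab["Edad"].mean())             # normal DataFrame operations now work
```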

