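I am trying to insert a table from pandas into a BeautifulSoup document and write it to an HTML file. However, the table in the HTML file contains escaped characters (&lt; and &gt;) instead of < and >.
How can I write my pandas DataFrame back to the HTML file?
In the script "003readWriteHtml.py" I read a table from "html-in.html" into a pandas DataFrame, add a row to the DataFrame, and then want to save the table back to the same HTML file. The output I get is shown in "html-out.html".
Can anyone help me get the correct output?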
***003readWriteHtml.py
# read a specific table from an html file into an array.
# write the array, with changes, back to the html file
from bs4 import BeautifulSoup
import pandas as pd
import html
infile = "html-in.html"
outfile = "html-out.html"
table_class = "bpm-data"
print("==========================================================")
fp = open(infile)
soup = BeautifulSoup(fp, 'html.parser')
fp.close()
spDiv = soup.find("div", class_ = table_class)
spTbl = spDiv.table
pdTbl = pd.read_html(str(spTbl))[0]
pdnFr = pd.DataFrame({'naam':['Bas'], 'geboortedatum':['18760522'],'mobiel':['+31 6 2345 3456'],'email':['bas@gmail.com']})
pdTbl = pd.concat([pdTbl, pdnFr])
print("\n--- pdTbl ---\n",pdTbl)
# print("--- write table back to html ------------------------------\n\n")
tag = pdTbl.to_html(index=False, justify='left')
print("\n--- tag ---\n",tag)
tbl = soup.find("div", class_=table_class)
tbl.replaceWith(tag)
print("\n--- soup ---\n",soup)
# print("--- write html back to file ------------------------------\n\n")
fp = open(outfile, "w")
fp.write(soup.prettify())
fp.close()
***html-in.html
<!DOCTYPE html>
<html lang="en-us">
<head>
<title>html-db</title>
<meta name="author" content="Bas Mooijman">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="../../../../wiki/wiki.css">
</head>
<body>
<h1>html-db</h1>
<hr>
<div class="bpm-data">
<table class="bpm-data">
<tr class="header">
<th>naam</th>
<th>geboortedatum</th>
<th>mobiel</th>
<th>email</th>
</tr>
<tr>
<td>Marianne</td>
<td>18220818</td>
<td>+31 6 1234 2345</td>
<td>marianne@gmail.com</td>
</tr>
<tr>
<td>Tim</td>
<td>20290307</td>
<td>+31 6 1234 3567</td>
<td>tim@gmail.com</td>
</tr>
<tr>
<td>Huub</td>
<td>20890909</td>
<td>+31 6 1234 4456</td>
<td>huub@gmail.com</td>
</tr>
</table>
</div>
<hr>
</body>
</html>
***html-out.html
<!DOCTYPE html>
<html lang="en-us">
<head>
<title>
html-db
</title>
<meta content="Bas Mooijman" name="author"/>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<link href="../../../../wiki/wiki.css" rel="stylesheet"/>
</head>
<body>
<h1>
html-db
</h1>
<hr/>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: left;">
<th>naam</th>
<th>geboortedatum</th>
<th>mobiel</th>
<th>email</th>
</tr>
</thead>
<tbody>
<tr>
<td>Marianne</td>
<td>18220818</td>
<td>+31 6 1234 2345</td>
<td>marianne@gmail.com</td>
</tr>
<tr>
<td>Tim</td>
<td>20290307</td>
<td>+31 6 1234 3567</td>
<td>tim@gmail.com</td>
</tr>
<tr>
<td>Huub</td>
<td>20890909</td>
<td>+31 6 1234 4456</td>
<td>huub@gmail.com</td>
</tr>
<tr>
<td>Bas</td>
<td>18760522</td>
<td>+31 6 2345 3456</td>
<td>bas@gmail.com</td>
</tr>
</tbody>
</table>
<hr/>
</body>
</html>
2 Answers
wrrgggsh #1
Create another BeautifulSoup instance from the HTML string and replace the table with it:
wvmv3b1j #2
Change:
To:
The result becomes:
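A minimal sketch of the fix the first answer describes, assuming the escaping comes from handing BeautifulSoup a plain string: the string returned by `to_html()` is parsed into its own BeautifulSoup object first, and the resulting tag is swapped in. The shortened sample page, the two-column DataFrame, and the names `page`, `table_html`, `new_table`, and `result` are illustrative and not taken from the answers. The sketch also replaces only the inner table so that the surrounding div is kept, which is a choice made here rather than something the answers state.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, shortened stand-in for html-in.html.
page = """<html><body>
<div class="bpm-data"><table class="bpm-data">
<tr><th>naam</th><th>email</th></tr>
<tr><td>Tim</td><td>tim@gmail.com</td></tr>
</table></div>
</body></html>"""

soup = BeautifulSoup(page, "html.parser")

# Stand-in for the DataFrame built in the question (existing row plus the new one).
df = pd.DataFrame({"naam": ["Tim", "Bas"],
                   "email": ["tim@gmail.com", "bas@gmail.com"]})

# classes="bpm-data" adds that class next to pandas' default "dataframe" class.
table_html = df.to_html(index=False, justify="left", classes="bpm-data")

# A plain string passed to replace_with() is inserted as text, so "<" and ">"
# are escaped on output. Parsing it first yields a real <table> tag.
new_table = BeautifulSoup(table_html, "html.parser").table

# Swap the old table for the new one, leaving the surrounding div in place.
soup.find("div", class_="bpm-data").table.replace_with(new_table)

result = str(soup)
print(result)
```

In 003readWriteHtml.py the corresponding change would presumably be to pass the parsed tag, rather than the string `tag`, to `replaceWith`.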