How can I keep a table from being escaped when inserting it from pandas into Beautiful Soup?

wqlqzqxt  posted on 2023-03-28 in Other

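I'm trying to insert a table from pandas into Beautiful Soup and write it to an HTML file. But the table in the HTML file contains escaped characters (&lt; and &gt;) instead of < and >.
How do I write my pandas DataFrame back into the HTML file?
In the script "003readWriteHtml.py", I read a table from "html-in.html" into a pandas DataFrame, add a row to the DataFrame, and then want to save the table back to the same HTML file. The output I get is shown in "html-out.html".
Can anyone help me get the correct output?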

***003readWriteHtml.py

# Read a specific table from an HTML file into a DataFrame.
# Write the table, with changes, back to the HTML file.

from bs4 import BeautifulSoup
import pandas as pd
import html
infile = "html-in.html"
outfile = "html-out.html"
table_class = "bpm-data"

print("==========================================================")

fp = open(infile)
soup = BeautifulSoup(fp, 'html.parser')
fp.close()

spDiv = soup.find("div", class_ = table_class)
spTbl = spDiv.table
pdTbl = pd.read_html(str(spTbl))[0]

pdnFr = pd.DataFrame({'naam':['Bas'], 'geboortedatum':['18760522'],'mobiel':['+31 6 2345 3456'],'email':['bas@gmail.com']})

pdTbl = pd.concat([pdTbl, pdnFr])
print("\n--- pdTbl ---\n",pdTbl)

# print("--- write table back to html ------------------------------\n\n")
tag = pdTbl.to_html(index=False, justify='left')
print("\n--- tag ---\n",tag)

tbl = soup.find("div", class_=table_class)
tbl.replaceWith(tag)
print("\n--- soup ---\n",soup)

# print("--- write html back to file ------------------------------\n\n")
fp = open(outfile, "w")
fp.write(soup.prettify())
fp.close()

***html-in.html

<!DOCTYPE html>
<html lang="en-us">
<head>
  <title>html-db</title>
  <meta name="author" content="Bas Mooijman">
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <link rel="stylesheet" href="../../../../wiki/wiki.css">
</head>
<body>
  <h1>html-db</h1>
  <hr>

  <div  class="bpm-data">
    <table  class="bpm-data">
      <tr class="header">
        <th>naam</th>
        <th>geboortedatum</th>
        <th>mobiel</th>
        <th>email</th>
      </tr>
      <tr>
        <td>Marianne</td>
        <td>18220818</td>
        <td>+31 6 1234 2345</td>
        <td>marianne@gmail.com</td>
      </tr>
      <tr>
        <td>Tim</td>
        <td>20290307</td>
        <td>+31 6 1234 3567</td>
        <td>tim@gmail.com</td>
      </tr>
      <tr>
        <td>Huub</td>
        <td>20890909</td>
        <td>+31 6 1234 4456</td>
        <td>huub@gmail.com</td>
      </tr>
    </table>
  </div>
  
  <hr>
</body>
</html>

***html-out.html

<!DOCTYPE html>
<html lang="en-us">
 <head>
  <title>
   html-db
  </title>
  <meta content="Bas Mooijman" name="author"/>
  <meta content="text/html; charset=utf-8" http-equiv="content-type"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <link href="../../../../wiki/wiki.css" rel="stylesheet"/>
 </head>
 <body>
  <h1>
   html-db
  </h1>
  <hr/>
  &lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: left;"&gt;
      &lt;th&gt;naam&lt;/th&gt;
      &lt;th&gt;geboortedatum&lt;/th&gt;
      &lt;th&gt;mobiel&lt;/th&gt;
      &lt;th&gt;email&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Marianne&lt;/td&gt;
      &lt;td&gt;18220818&lt;/td&gt;
      &lt;td&gt;+31 6 1234 2345&lt;/td&gt;
      &lt;td&gt;marianne@gmail.com&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Tim&lt;/td&gt;
      &lt;td&gt;20290307&lt;/td&gt;
      &lt;td&gt;+31 6 1234 3567&lt;/td&gt;
      &lt;td&gt;tim@gmail.com&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Huub&lt;/td&gt;
      &lt;td&gt;20890909&lt;/td&gt;
      &lt;td&gt;+31 6 1234 4456&lt;/td&gt;
      &lt;td&gt;huub@gmail.com&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Bas&lt;/td&gt;
      &lt;td&gt;18760522&lt;/td&gt;
      &lt;td&gt;+31 6 2345 3456&lt;/td&gt;
      &lt;td&gt;bas@gmail.com&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
  <hr/>
 </body>
</html>

wrrgggsh  #1

Create another BeautifulSoup instance from the HTML string and replace the table with that instance:

tbl.replaceWith(BeautifulSoup(tag, 'html.parser'))
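This works because a plain `str` passed to `replaceWith` is inserted as a text node, and text nodes have `<` and `>` escaped on output, while a parsed fragment is inserted as real tags. A minimal sketch of the difference, using a made-up one-cell table as a stand-in for the question's `to_html()` output (`replace_with` is the current spelling of `replaceWith` in Beautiful Soup 4):

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page and replacement table, not the question's files.
page = '<body><div class="bpm-data"><table><tr><td>old</td></tr></table></div></body>'
new_table = "<table><tr><td>new</td></tr></table>"  # stand-in for DataFrame.to_html()

# Case 1: passing a str inserts a text node, so < and > are escaped on output.
soup_text = BeautifulSoup(page, "html.parser")
soup_text.find("div", class_="bpm-data").replace_with(new_table)
escaped = str(soup_text)

# Case 2: passing a parsed fragment inserts real tags.
soup_tags = BeautifulSoup(page, "html.parser")
fragment = BeautifulSoup(new_table, "html.parser")
soup_tags.find("div", class_="bpm-data").replace_with(fragment)
parsed = str(soup_tags)

print(escaped)
print(parsed)
```

The first printed line should contain `&lt;table&gt;`, the second a real `<table>` element.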

wvmv3b1j  #2

Change:

...
tbl.replaceWith(tag)
...

to:

...
tbl.replaceWith(BeautifulSoup(tag, 'html.parser'))
...

The result becomes:

<!DOCTYPE html>
<html lang="en-us">
 <head>
  <title>
   html-db
  </title>
  <meta content="Bas Mooijman" name="author"/>
  <meta content="text/html; charset=utf-8" http-equiv="content-type"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <link href="../../../../wiki/wiki.css" rel="stylesheet"/>
 </head>
 <body>
  <h1>
   html-db
  </h1>
  <hr/>
  <table border="1" class="dataframe">
   <thead>
    <tr style="text-align: left;">
     <th>
      naam
     </th>
     <th>
      geboortedatum
     </th>
     <th>
      mobiel
     </th>
     <th>
      email
     </th>
    </tr>
   </thead>
   <tbody>
    <tr>
     <td>
      Marianne
     </td>
     <td>
      18220818
     </td>
     <td>
      +31 6 1234 2345
     </td>
     <td>
      marianne@gmail.com
     </td>
    </tr>
    <tr>
     <td>
      Tim
     </td>
     <td>
      20290307
     </td>
     <td>
      +31 6 1234 3567
     </td>
     <td>
      tim@gmail.com
     </td>
    </tr>
    <tr>
     <td>
      Huub
     </td>
     <td>
      20890909
     </td>
     <td>
      +31 6 1234 4456
     </td>
     <td>
      huub@gmail.com
     </td>
    </tr>
    <tr>
     <td>
      Bas
     </td>
     <td>
      18760522
     </td>
     <td>
      +31 6 2345 3456
     </td>
     <td>
      bas@gmail.com
     </td>
    </tr>
   </tbody>
  </table>
  <hr/>
 </body>
</html>
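
Note that in this result the `<div class="bpm-data">` wrapper and the table's `bpm-data` class are gone, because the script replaces the whole div. If those should be kept, one possible variation is to replace only the table inside the div and reapply the class. The sketch below is an assumption about what is wanted, not part of either answer; it uses an inline, trimmed stand-in for "html-in.html" and a hand-built DataFrame:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Inline stand-in for html-in.html, trimmed to two columns and one data row.
page = """<body>
<div class="bpm-data">
<table class="bpm-data">
<tr class="header"><th>naam</th><th>email</th></tr>
<tr><td>Tim</td><td>tim@gmail.com</td></tr>
</table>
</div>
</body>"""

soup = BeautifulSoup(page, "html.parser")
div = soup.find("div", class_="bpm-data")

# Stand-in for the DataFrame built in the question (existing row plus the new one).
df = pd.DataFrame({"naam": ["Tim", "Bas"],
                   "email": ["tim@gmail.com", "bas@gmail.com"]})

# Parse the generated markup and pull out the <table> tag itself.
new_table = BeautifulSoup(df.to_html(index=False, justify="left"), "html.parser").table
new_table["class"] = ["bpm-data"]  # restore the class from the input file

# Replace only the table, leaving the surrounding div in place.
div.table.replace_with(new_table)

result = soup.prettify()
print(result)
```

Writing `result` inside a `with open(outfile, "w") as fp:` block would also close the file automatically.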
