Python: unable to remove HTML tags from the output

Posted by icomxhvb on 2023-03-06 in Python

I want the output without HTML tags. Here is my Python code:

import bs4
from bs4 import BeautifulSoup
import requests
import re
from lxml.html import fromstring
import xml.etree.ElementTree as ET
sauce = requests.get('https://www.cisco.com/c/en/us/support/optical-networking/network-convergence-system-1000-series/products-installation-and-configuration-guides-list.html').text
soup = BeautifulSoup(sauce, "html.parser")
i = 0

definition = soup.select('div.heading')

while i < len(definition):

  print ( definition[i])
  i += 1

Current output:

<div class="heading">Cisco Network Convergence System 1004</div>
<div class="heading">Cisco Network Convergence System 1001</div>
<div class="heading">Cisco Network Convergence System 1002</div>
<div class="heading">Cisco Network Convergence System 1010</div>

8dtrkrch #1

You can remove the HTML tags from the output by accessing the text attribute of each div element inside the loop, like this:

import bs4
from bs4 import BeautifulSoup
import requests
import re
from lxml.html import fromstring
import xml.etree.ElementTree as ET
sauce = requests.get('https://www.cisco.com/c/en/us/support/optical-networking/network-convergence-system-1000-series/products-installation-and-configuration-guides-list.html').text
soup = BeautifulSoup(sauce, "html.parser")
i = 0

definition = soup.select('div.heading')

while i < len(definition):

  print ( definition[i].text)
  i += 1

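You can simplify this with a for loop that iterates over the selected elements directly, so no index counter is needed: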

# `definition` is the list of div.heading elements returned by soup.select above
for heading in definition:
    print(heading.text)
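
To try the approach without fetching the Cisco page, here is a minimal sketch that runs on an inline HTML string. The sample markup and the helper name heading_texts are assumptions made for illustration; they are not part of the original question or answer. It uses get_text(strip=True), which returns the same text as the text attribute but with surrounding whitespace trimmed.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for the downloaded page (assumption: it mirrors
# the div.heading elements shown in the question's output).
SAMPLE_HTML = """
<div class="heading">Cisco Network Convergence System 1004</div>
<div class="heading"> Cisco Network Convergence System 1001 </div>
<div class="other">Not a heading</div>
"""


def heading_texts(html):
    """Return the tag-free text of every div.heading element (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(strip=True) drops the tags and trims whitespace around the text
    return [div.get_text(strip=True) for div in soup.select("div.heading")]


for text in heading_texts(SAMPLE_HTML):
    print(text)
```

Returning a list instead of printing inside the loop keeps the extracted strings available for later use, for example writing them to a file.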
