I want the output without the HTML tags. Here is the Python code:
import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it with the built-in HTML parser
sauce = requests.get('https://www.cisco.com/c/en/us/support/optical-networking/network-convergence-system-1000-series/products-installation-and-configuration-guides-list.html').text
soup = BeautifulSoup(sauce, "html.parser")

# Select every <div class="heading"> element
definition = soup.select('div.heading')

# Print each matched element (this prints the whole tag, markup included)
i = 0
while i < len(definition):
    print(definition[i])
    i += 1
Current output:
<div class="heading">Cisco Network Convergence System 1004</div>
<div class="heading">Cisco Network Convergence System 1001</div>
<div class="heading">Cisco Network Convergence System 1002</div>
<div class="heading">Cisco Network Convergence System 1010</div>
1 Answer
You can remove the HTML markup from the output by accessing the text attribute of each div element inside the loop.
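A minimal sketch of that change is below. It assumes a short static HTML string (copied from the output shown in the question) in place of the live request, so it runs offline; the `html` string and the `texts` list are illustrative and not part of the original code.

```python
from bs4 import BeautifulSoup

# Assumption: static HTML taken from the question's output, used instead of
# requests.get(...) so the example does not depend on the network.
html = """
<div class="heading">Cisco Network Convergence System 1004</div>
<div class="heading">Cisco Network Convergence System 1001</div>
"""

soup = BeautifulSoup(html, "html.parser")
definition = soup.select("div.heading")

texts = []  # hypothetical helper list, kept only so the result can be inspected
i = 0
while i < len(definition):
    # .text returns the element's text content without the surrounding tag
    texts.append(definition[i].text)
    print(definition[i].text)
    i += 1
```

With the real page, only the `print` line inside the loop needs to change from `definition[i]` to `definition[i].text`.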
You can also use a for loop to simplify it.
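One way the for-loop version might look, again assuming static sample HTML rather than the live page. The `headings` list is a hypothetical name added for illustration, and `get_text(strip=True)` is used here as an optional variant of `.text` that also trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Assumption: sample HTML standing in for the fetched page.
html = """
<div class="heading">Cisco Network Convergence System 1002</div>
<div class="heading"> Cisco Network Convergence System 1010 </div>
"""

soup = BeautifulSoup(html, "html.parser")

headings = []  # hypothetical list collecting the cleaned text
for div in soup.select("div.heading"):
    # Iterate over the matched elements directly; no index counter is needed.
    title = div.get_text(strip=True)
    headings.append(title)
    print(title)
```

Iterating directly over the result of `select` avoids the manual counter, which also removes the chance of mistyping the index variable.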