As part of an internship, I am trying to scrape data from a website with Python, without using lxml. My code looks like this:
from urllib.request import urlopen
import requests
# Start the session
session = requests.Session()
# Create the payload
payload = {'email':'*********',
'password':'********'
}
# Post the payload to the site to log in
s = session.post("********************", data=payload)
url_to_scrape = "*******************"
page = urlopen(url_to_scrape)
print("printing page")
print(page)
html_bytes = page.read()
html = html_bytes.decode("utf-8")
print("printing html")
print(html)
This returns some of the site's markup, but crucially nothing from inside the main section. What I get for the main section is:
<!-- Add your site or application content here -->
<main id="wrap" ui-view=""></main>
But I was expecting:
<main id="wrap" ui-view="" class="ng-scope"><nav class="navbar navbar-inverse ng-scope" ng-class="navColorClass" ng-controller="MainMenuController" scroll-to-top="">
<div class="container-fluid">
<!-- Brand and toggle get grouped for better mobile display -->
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#servicedesk-navbar" aria-expanded="false" ng-init="navCollapsed = true" ng-click="navCollapsed = !navCollapsed">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
...
...
...
...
How can I get the expected output?
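A note on what the output above suggests. The `ui-view` and `ng-scope` attributes look like an AngularJS application, which would mean the server sends an empty `<main>` and JavaScript fills it in inside the browser. If so, no plain HTTP fetch will contain that markup. The usual options are then to request the JSON endpoints the page itself calls (visible in the browser's network tab) or to drive a real browser, for example with Selenium. Separately, `urlopen` does not share cookies with `session`, so the logged-in request would normally be `session.get(url_to_scrape)`. The sketch below only helps confirm the diagnosis: it uses the standard library's `html.parser` (no lxml) to check whether `<main id="wrap">` has any child elements in the HTML you fetched. The names `MainContentProbe` and `main_is_rendered` are made up for illustration.

```python
from html.parser import HTMLParser


class MainContentProbe(HTMLParser):
    """Counts the elements nested inside <main id="wrap"> (hypothetical helper)."""

    # Void elements have no closing tag, so they must not change the depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth inside the target <main>, 0 = outside
        self.child_count = 0  # elements seen inside the target <main>

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.child_count += 1
            if tag not in self.VOID:
                self.depth += 1
        elif tag == "main" and dict(attrs).get("id") == "wrap":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in self.VOID:
            self.depth -= 1


def main_is_rendered(html):
    """Return True if <main id="wrap"> contains at least one element."""
    probe = MainContentProbe()
    probe.feed(html)
    return probe.child_count > 0
```

Calling `main_is_rendered(html)` on the string printed by the script in the question should return `False` if the content really is injected client-side. Getting `False` even after switching to `session.get` would point to JavaScript rendering rather than a login problem.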