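On my university's website, we look up our semester results by entering our name and student ID. I'm currently learning web scraping: do Scrapy or BeautifulSoup offer a way to retrieve, for example, 100 results at once? You can view the page source at view-source:http://app1.helwan.edu.eg/Commerce/HasasnUpMlist.asp; it uses code like this: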
<html>
<head>
<meta http-equiv="Content-Language" content="ar-eg">
<title></title>
<link href="natiga.css" rel="stylesheet" type="text/css" />
<meta http-equiv="Content-Type" content="text/html; charset=windows-1256" />
<meta name="generator" content="Hassan_kandeell@yahoo.com" />
</head>
<body>
<script type="text/javascript">
<!--
var EW_DATE_SEPARATOR; // Default date separator
EW_DATE_SEPARATOR = "/";
if (EW_DATE_SEPARATOR == '') EW_DATE_SEPARATOR = '/';
EW_UPLOAD_ALLOWED_FILE_EXT = "gif,jpg,jpeg,bmp,png,doc,xls,pdf,zip"; // Allowed upload file extension
var EW_FIELD_SEP = ', '; // Default field separator
// Ajax settings
EW_LOOKUP_FILE_NAME = "ewlookup61.asp"; // lookup file name
EW_ADD_OPTION_FILE_NAME = "ewaddopt61.asp"; // add option file name
// Auto suggest settings
var EW_AST_SELECT_LIST_ITEM = 0;
var EW_AST_TEXT_BOX_ID;
var EW_AST_CANCEL_SUBMIT;
var EW_AST_OLD_TEXT_BOX_VALUE = "";
var EW_AST_MAX_NEW_VALUE_LENGTH = 5; // Only get data if value length <= this setting
// Multipage settings
var ew_PageIndex = 0;
var ew_MaxPageIndex = 0;
var ew_MinPageIndex = 0;
var EW_TABLE_CLASSNAME = "ewTable"; // Note: changed the class name as needed
var ew_MultiPageElements = new Array();
//-->
</script>
<script type="text/javascript" src="ew61.js"></script>
<script type="text/javascript" src="userfn61.js"></script>
<script language="JavaScript" type="text/javascript">
<!--
// Write your client script here, no need to add script tags.
// To include another .js script, use:
// ew_ClientScriptInclude("my_javascript.js");
//-->
</script>
<div align="center">
<table border="0" width="1001" dir="rtl">
<tr>
<td width="995" colspan="2">
<p align="center">
<img border="0" src="Start.JPG" width="995" height="198"></td>
</tr>
<tr>
<td bgcolor="#AC8601" width="737">
<p align="center"> </td>
<td bgcolor="#800000" width="254">
<p align="center"><font size="5" color="#FFFFFF"><b>نتائج كلية
التجارة وإدارة الأعمال</b></font></td>
</tr>
</table>
</div>
<script type="text/javascript">
<!--
var EW_PAGE_ID = "list"; // Page id
//-->
</script>
<script type="text/javascript">
<!--
function ew_ValidateForm2(fobj) {
var infix = "";
for (var i=0;i<fobj.elements.length;i++) {
var elem = fobj.elements[i];
if (elem.name.substring(0,2) == "s_" || elem.name.substring(0,3) == "sv_")
elem.value = "";
}
return true;
}
//-->
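I'm doing this purely for educational purposes. I want to build a project for my fellow students, because the site gets so much traffic that it can take hours just to get a single result. Thanks.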
1 answer
Yes. If all the records are shown on a single page, you can scrape them all in one pass with JavaScript, Scrapy, BeautifulSoup, or a similar tool.
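A minimal sketch of the single-page case, using only Python's standard-library `html.parser` so it runs without installing anything (BeautifulSoup's `find_all("tr")` would do the same job). It collects the text of every `<td>`/`<th>` cell, row by row, and tolerates omitted `</td>`/`</tr>` tags, which older ASP pages often have; nested tables are not handled. `TableRowParser` and `extract_rows` are hypothetical names, and which rows on the real results page actually hold the grades is an assumption you would have to check against its output.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive as text
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _finish_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace (including newlines) into one space.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _finish_row(self):
        self._finish_cell()
        # Keep only rows that contain at least one non-empty cell.
        if self._row is not None and any(self._row):
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()  # tolerates an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerates an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return every non-empty table row in `html` as a list of cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    parser._finish_row()  # flush a row left open at the end of the input
    return parser.rows
```

Fed the header table from the page above, this skips the image-only row and returns the title cell as one string, with the line break inside it collapsed to a space.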
If the site splits the results across several pages (pagination), you need to visit each page in turn and scrape it.
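A sketch of the paginated case, again standard library only: request page after page, parse each one, and stop when a page comes back empty or identical to the previous one (some pagers keep returning the last page). The query parameter name `start`, its first value and its step are hypothetical placeholders, not confirmed for HasasnUpMlist.asp; read them off the real pager links. `fetch` and `parse_rows` are passed in, so any parser (BeautifulSoup, or a table-row extractor) can be plugged in, and the default decoding follows the page's declared `charset=windows-1256`.

```python
import time
import urllib.parse
import urllib.request


def page_url(base_url, page_param, value):
    """Append ?page_param=value (or &page_param=value) to base_url."""
    query = urllib.parse.urlencode({page_param: value})
    separator = "&" if urllib.parse.urlsplit(base_url).query else "?"
    return f"{base_url}{separator}{query}"


def fetch_html(url, encoding="cp1256", timeout=30):
    """Download one page and decode it.

    cp1256 is Python's codec name for windows-1256, the charset the page declares.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode(encoding, errors="replace")


def scrape_all_pages(base_url, parse_rows, fetch=fetch_html,
                     page_param="start", first=1, step=20,
                     max_pages=100, delay=1.0):
    """Walk a paginated list and return the rows from every page.

    ASSUMPTION: pages are selected by a numeric query parameter that grows
    by `step` per page; `page_param`, `first` and `step` are placeholders.
    Stops at the first empty page, at a page identical to the previous one,
    or after `max_pages` requests.
    """
    all_rows = []
    previous = None
    value = first
    for _ in range(max_pages):
        rows = parse_rows(fetch(page_url(base_url, page_param, value)))
        if not rows or rows == previous:
            break
        all_rows.extend(rows)
        previous = rows
        value += step
        if delay:
            time.sleep(delay)  # be gentle with an already overloaded server
    return all_rows
```

The `delay` between requests is a deliberate choice: on a server that already takes hours per result, firing 100 requests in parallel would only make things worse for everyone. If each result is reachable only by submitting the name/ID form, the same loop shape applies with one request per student instead of one per page.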
Hope this helps.