This project uses jsoup and fastjson, so first add them to pom.xml:
<dependencies>
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.11.3</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.49</version>
</dependency>
</dependencies>
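Project requirement: scrape each student's personal teaching execution plan and display it in an app, as shown below:
First, log in to the campus intranet:
After a successful login:
This finally gives us the intranet cookies: cookies_innet.
Next we simulate logging in to the new academic affairs system, starting from its login page:
After a successful login we obtain the cookies: cookies. For the detailed process, see: JSoup模拟登录新版正方教务系统(内网-教务系统)爬取信息过程详解
Here is what the query page looks like:
Press F12, select 计算机科学与技术 (Computer Science and Technology), and click 修读要求 (study requirements):
Open the first request in the Network tab:
The request takes four parameters. The last one, su, is the student number; gnmkdm is a fixed value; _ is the current time. I initially assumed the first one, jxzxjhxx_id, was fixed too, but it is not: when other people logged in with their own student number and password, the query still returned the plan for my major. So jxzxjhxx_id has to be determined first. How do we find it?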
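The four parameters described above can be assembled as in the following minimal sketch. The path and the gnmkdm value are copied from the request code later in this article; the class name RequirementUrl, the host, and the sample values are hypothetical and only serve as an illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class RequirementUrl {
    // Fixed module code, as it appears in the URLs used in this article.
    static final String GNMKDM = "N153540";

    /** Builds the study-requirements URL from its four query parameters. */
    static String build(String baseUrl, String planId, String stuNum, long nowMillis) {
        return baseUrl + "/jwglxt/jxzxjhgl/jxzxjhck_cxJxzxjhxdyqIndex.html"
                + "?jxzxjhxx_id=" + encode(planId)   // differs per major
                + "&_=" + nowMillis                  // current time in milliseconds
                + "&gnmkdm=" + GNMKDM                // fixed
                + "&su=" + encode(stuNum);           // student number
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // Hypothetical host, plan id and student number.
        System.out.println(build("http://jwxt.example.edu", "ABC123", "20180001",
                System.currentTimeMillis()));
    }
}
```

Keeping jxzxjhxx_id as an explicit argument makes it obvious that it must be looked up per student rather than hard-coded.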
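Following this article: Exception in thread “main“ org.jsoup.HttpStatusException: HTTP error fetching URL. Status=422, URL= I guessed that the id might be present in the original page, so I opened Elements and searched for jxzxjhxx_id:
Sure enough, it is there. So first fetch the original page: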
String suburl = url + "/jwglxt/jxzxjhgl/jxzxjhck_cxJxzxjhckIndex.html?gnmkdm=N153540&layout=default&su=" + stuNum;
connection = Jsoup.connect(suburl);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
connection.header("Connection", "keep-alive");
response = connection.cookies(cookies_innet).cookies(cookies).method(Connection.Method.GET).execute();
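However, printing response.body() shows that the id I want is not there, so I pressed Ctrl+U to view the page source:
The id data I want lives inside the table tag, which is currently empty. Seeing the 查询 (query) button above it made things clear: the query button probably has to be clicked first:
Open the first request in the Network tab to see which form fields have to be submitted:
The last five fields control how the results are displayed, such as the maximum number of rows per page, the current page number, and the sort order, so they are straightforward. The first field, jg_id, is clearly the college code, and the second, njdm_id, is the grade (enrollment year). Because the app will be used by students from different grades and colleges, I do not know the grade and college code in advance and again have to find them in the original page:
These values are present in the page; each option's value is a college code. Next, look at the selected year:
So we find the code of the selected college and the selected year: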
String suburl = url + "/jwglxt/jxzxjhgl/jxzxjhck_cxJxzxjhckIndex.html?gnmkdm=N153540&layout=default&su=" + stuNum;
connection = Jsoup.connect(suburl);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
connection.header("Connection", "keep-alive");
response = connection.cookies(cookies_innet).cookies(cookies).method(Connection.Method.GET).execute();
String jg_id = "";
String njdm_id = "";
Document doc11 = Jsoup.parse(response.body());
//System.out.println(doc11);
// Find the selected college (jg_id) and the selected year (nj_cx)
Elements lis = doc11.getElementsByAttributeValue("id", "jg_id").select("option");
for (Element element : lis) {
    // hasAttr also matches a bare "selected" attribute, not only selected="selected"
    if (element.hasAttr("selected")) {
        jg_id = element.attr("value");
        System.out.println(jg_id);
    }
}
Elements lis1 = doc11.getElementsByAttributeValue("id", "nj_cx").select("option");
for (Element element : lis1) {
    if (element.hasAttr("selected")) {
        njdm_id = element.attr("value");
        System.out.println(njdm_id);
    }
}
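We first get the collection of option elements under the select tag:
Elements lis = doc11.getElementsByAttributeValue("id", "jg_id").select("option");
Then we iterate over them to see which option is selected. That gives us the jg_id and njdm_id parameters, so we can now simulate the query request: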
suburl = url + "/jwglxt/jxzxjhgl/jxzxjhck_cxJxzxjhckIndex.html?doType=query&gnmkdm=N153540&su=" + stuNum;
connection = Jsoup.connect(suburl);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
connection.header("Content-Type","application/x-www-form-urlencoded;charset=utf-8");
connection.header("Connection", "keep-alive");
connection.data("jg_id", jg_id);
connection.data("njdm_id", njdm_id);
connection.data("dlbs", "");
connection.data("zyh_id", "");
connection.data("_search", "false");
connection.data("nd", String.valueOf(new Date().getTime()));
connection.data("queryModel.showCount", "15");
connection.data("queryModel.currentPage", "1");
connection.data("queryModel.sortName", "");
connection.data("queryModel.sortOrder", "asc");
connection.data("time", "1");
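// Note: with Method.GET, jsoup sends the data() pairs as URL query parameters rather than as a request body.
response = connection.cookies(cookies_innet).cookies(cookies).ignoreContentType(true).method(Connection.Method.GET).execute();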
System.out.println(response.body());
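The form fields used in this query can also be collected in one place. The sketch below only builds the field map with the same names and values as the request code above; the class name PlanQueryForm and its parameters are hypothetical, and the sample college code and year are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PlanQueryForm {
    /** Builds the query form; LinkedHashMap keeps the fields in insertion order. */
    static Map<String, String> build(String jgId, String njdmId,
                                     int pageSize, int page, long nowMillis) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("jg_id", jgId);                                    // college code
        form.put("njdm_id", njdmId);                                // grade (year)
        form.put("dlbs", "");
        form.put("zyh_id", "");
        form.put("_search", "false");
        form.put("nd", String.valueOf(nowMillis));                  // current time
        form.put("queryModel.showCount", String.valueOf(pageSize)); // rows per page
        form.put("queryModel.currentPage", String.valueOf(page));   // page number
        form.put("queryModel.sortName", "");
        form.put("queryModel.sortOrder", "asc");
        form.put("time", "1");
        return form;
    }

    public static void main(String[] args) {
        // Hypothetical college code and year.
        build("05", "2018", 15, 1, System.currentTimeMillis())
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

jsoup's Connection.data(Map<String, String>) accepts such a map directly. Exposing the page size and page number as arguments may help if a college lists more plans than fit on one page of 15 rows.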
Print the response, then convert it to JSON:
System.out.println(response.body());
JSONObject jsonObject = JSON.parseObject(response.body());
JSONArray table = jsonObject.getJSONArray("items");
Print table:
The jxzxjhxx_id we want is indeed in there. Next, pick the id that matches the student's major:
for (Object item : table) {
    JSONObject lesson = (JSONObject) item;
    // zymc is compared against the student's major; keep the plan id of the matching row
    if (major.equals(lesson.getString("zymc"))) {
        final_id = lesson.getString("jxzxjhxx_id");
    }
    System.out.println(lesson.getString("zymc") + " " +
            lesson.getString("jxzxjhxx_id"));
}
final_id is the jxzxjhxx_id we were looking for.
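The loop above leaves final_id unset when no row matches the major. A minimal sketch of the same lookup with an explicit "not found" result follows; plain maps stand in for the parsed JSON rows, and the class name PlanIdLookup and the sample rows are hypothetical. Unlike the loop above, it returns the first matching row rather than the last.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class PlanIdLookup {
    /** Returns the jxzxjhxx_id of the first row whose zymc equals the given major. */
    static Optional<String> findPlanId(List<Map<String, String>> items, String major) {
        for (Map<String, String> item : items) {
            if (major.equals(item.get("zymc"))) {
                return Optional.ofNullable(item.get("jxzxjhxx_id"));
            }
        }
        return Optional.empty(); // no plan listed for this major
    }

    private static Map<String, String> row(String major, String id) {
        Map<String, String> m = new HashMap<>();
        m.put("zymc", major);
        m.put("jxzxjhxx_id", id);
        return m;
    }

    public static void main(String[] args) {
        // Hypothetical rows; real ones come from the "items" array of the response.
        List<Map<String, String>> items = Arrays.asList(
                row("Computer Science", "id-1"),
                row("Software Engineering", "id-2"));
        System.out.println(findPlanId(items, "Software Engineering").orElse("not found"));
        System.out.println(findPlanId(items, "Physics").orElse("not found"));
    }
}
```

Returning an Optional forces the caller to decide what to do when the major is missing, instead of sending a request with an empty id.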
With the id in hand, go back to the beginning. We want to open this page:
String time = String.valueOf(new Date().getTime());
connection = Jsoup.connect(url + "/jwglxt/jxzxjhgl/jxzxjhck_cxJxzxjhxdyqIndex.html?jxzxjhxx_id=" + final_id + "&_=" + time + "&gnmkdm=N153540&su=" + stuNum);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0");
response = connection.cookies(cookies_innet).cookies(cookies).method(Connection.Method.GET).ignoreContentType(true).execute();
String doc = response.body();
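Printing doc shows that the page only contains information such as the minimum required credits and the credits already earned:
So first extract the minimum required credits and the total course credits by parsing doc; that step is not covered here.
Clicking 课程详情 (course details) triggers a new request:
Open it:
The Query String Parameters are fixed and already in the URL, so I will not go into them. What matters is the xfyqjd_id value in the Form Data: required, major elective, and practical courses each have a different id, and, as before, these have to be found in the page: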
System.out.println(doc);
// Locate the label, then search the 100-character window that ends 100 characters before it.
int index11 = doc.indexOf("必修课 最低要求学分");
String sub1 = doc.substring(index11 - 200, index11 - 100);
int index12 = sub1.indexOf("xfyqjd_id");
// Skip 11 characters from the start of "xfyqjd_id" and take the next 32 as the id.
String zhu = sub1.substring(index12 + 11, index12 + 43);
System.out.println(zhu);
int index21 = doc.indexOf("专选课 最低要求学分");
String sub2 = doc.substring(index21 - 200, index21 - 100);
int index22 = sub2.indexOf("xfyqjd_id");
String zhuan = sub2.substring(index22 + 11, index22 + 43);
System.out.println(sub2.substring(index22 + 11, index22 + 43));
int index31 = doc.indexOf("实践课 最低要求学分");
String sub3 = doc.substring(index31 - 200, index31 - 100);
int index32 = sub3.indexOf("xfyqjd_id");
String shi = sub3.substring(index32 + 11, index32 + 43);
System.out.println(sub3.substring(index32 + 11, index32 + 43));
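I did not parse the HTML here because this part is loaded dynamically: the id values could not be found through the parsed document, but they do appear in the raw response.body(), so I simply searched the string.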
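The fixed offsets above break as soon as the markup shifts by a few characters. As a hedged alternative, a regular expression can pick the last xfyqjd_id that occurs before a given label. This sketch assumes the id appears as xfyqjd_id="..." followed by exactly 32 letters or digits, which is what the offsets 11 and 43 suggest but which is not confirmed by the page source. The class name CreditNodeIds, the sample HTML, and the ASCII labels are hypothetical; on the real page the label would be the Chinese text used above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CreditNodeIds {
    // Assumption: the attribute looks like xfyqjd_id="<32 letters or digits>".
    private static final Pattern ID =
            Pattern.compile("xfyqjd_id=\"([0-9A-Za-z]{32})\"");

    /** Returns the last xfyqjd_id that occurs before the label, if any. */
    static Optional<String> idBefore(String html, String label) {
        int labelAt = html.indexOf(label);
        if (labelAt < 0) {
            return Optional.empty(); // label not present in the page
        }
        // Restrict matching to the text in front of the label.
        Matcher m = ID.matcher(html).region(0, labelAt);
        String last = null;
        while (m.find()) {
            last = m.group(1); // keep the closest match before the label
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) {
        // Hypothetical markup and labels, for illustration only.
        String html = "<a xfyqjd_id=\"0123456789ABCDEF0123456789ABCDEF\">x</a> REQUIRED "
                + "<a xfyqjd_id=\"FEDCBA9876543210FEDCBA9876543210\">y</a> ELECTIVE";
        System.out.println(idBefore(html, "REQUIRED").orElse("not found"));
        System.out.println(idBefore(html, "ELECTIVE").orElse("not found"));
    }
}
```

A missing label or id yields an empty Optional here, whereas the substring version would throw a StringIndexOutOfBoundsException.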
The next step is to fetch all the courses:
List<Plan> data = new ArrayList<>();
connection = Jsoup.connect(url + "/jwglxt/jxzxjhgl/jxzxjhxfyq_cxJxzxjhxfyqKcxx.html?gnmkdm=N153540&su=" + stuNum);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0");
connection.data("xfyqjd_id", zhu);
connection.data("jdkcsx", "1");
response = connection.cookies(cookies_innet).cookies(cookies).method(Connection.Method.POST).ignoreContentType(true).execute();
JSONArray major1 = JSON.parseArray(response.body());
Plan plan1 = new Plan();
plan1.setTag("必修课");
// credits comes from parsing doc (omitted above): indexes 0-2 are used as minimum credits, 3-5 as current credits
plan1.setMinCredit(credits.get(0));
plan1.setCurrentCredit(credits.get(3));
List<SubPlan> subPlans1 = new ArrayList<>();
for (Object item : major1) {
    JSONObject lesson = (JSONObject) item;
    SubPlan subPlan = new SubPlan();
    subPlan.setCourse_num(lesson.getString("KCH"));
    subPlan.setCourse_name(lesson.getString("KCMC"));
    subPlan.setCourse_nature(lesson.getString("KCXZMC"));
    subPlan.setCredit(lesson.getString("XF"));
    subPlan.setYear(lesson.getString("JYXDXNM"));
    subPlan.setSemester(lesson.getString("JYXDXQM"));
    subPlans1.add(subPlan);
    System.out.println(lesson.getString("KCH") + " " +
            lesson.getString("KCMC") + " " +
            lesson.getString("KCXZMC") + " " +
            lesson.getString("XF") + " " +
            lesson.getString("JYXDXNM") + " " +
            lesson.getString("JYXDXQM"));
}
plan1.setPlans(subPlans1);
data.add(plan1);
connection = Jsoup.connect(url + "/jwglxt/jxzxjhgl/jxzxjhxfyq_cxJxzxjhxfyqKcxx.html?gnmkdm=N153540&su=" + stuNum);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0");
connection.data("xfyqjd_id", zhuan);
connection.data("jdkcsx", "1");
response = connection.cookies(cookies_innet).cookies(cookies).method(Connection.Method.POST).ignoreContentType(true).execute();
//Document document = response.parse();
JSONArray major2 = JSON.parseArray(response.body());
Plan plan2 = new Plan();
plan2.setTag("专选课");
plan2.setMinCredit(credits.get(1));
plan2.setCurrentCredit(credits.get(4));
List<SubPlan> subPlans2 = new ArrayList<>();
for (Object item : major2) {
    JSONObject lesson = (JSONObject) item;
    SubPlan subPlan = new SubPlan();
    subPlan.setCourse_num(lesson.getString("KCH"));
    subPlan.setCourse_name(lesson.getString("KCMC"));
    subPlan.setCourse_nature(lesson.getString("KCXZMC"));
    subPlan.setCredit(lesson.getString("XF"));
    subPlan.setYear(lesson.getString("JYXDXNM"));
    subPlan.setSemester(lesson.getString("JYXDXQM"));
    subPlans2.add(subPlan);
}
plan2.setPlans(subPlans2);
data.add(plan2);
connection = Jsoup.connect(url + "/jwglxt/jxzxjhgl/jxzxjhxfyq_cxJxzxjhxfyqKcxx.html?gnmkdm=N153540&su=" + stuNum);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0");
connection.data("xfyqjd_id", shi);
connection.data("jdkcsx", "1");
response = connection.cookies(cookies_innet).cookies(cookies).method(Connection.Method.POST).ignoreContentType(true).execute();
//Document document = response.parse();
JSONArray major3 = JSON.parseArray(response.body());
Plan plan3 = new Plan();
plan3.setTag("实践课");
plan3.setMinCredit(credits.get(2));
plan3.setCurrentCredit(credits.get(5));
List<SubPlan> subPlans3 = new ArrayList<>();
for (Object item : major3) {
    JSONObject lesson = (JSONObject) item;
    SubPlan subPlan = new SubPlan();
    subPlan.setCourse_num(lesson.getString("KCH"));
    subPlan.setCourse_name(lesson.getString("KCMC"));
    subPlan.setCourse_nature(lesson.getString("KCXZMC"));
    subPlan.setCredit(lesson.getString("XF"));
    subPlan.setYear(lesson.getString("JYXDXNM"));
    subPlan.setSemester(lesson.getString("JYXDXQM"));
    subPlans3.add(subPlan);
}
plan3.setPlans(subPlans3);
// Add the third category to the result as well, like plan1 and plan2
data.add(plan3);
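The three request blocks above differ only in the tag, the xfyqjd_id, and the two credit indexes, so they could be folded into one loop. The sketch below shows that shape under stated assumptions: Category is a minimal stand-in for the article's Plan class, course rows are reduced to strings, and the fetchCourses function is a hypothetical placeholder for the POST request and JSON parsing. The credits list is assumed to hold the minimum credits first and the current credits after them, as the indexes 0-2 and 3-5 above suggest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class PlanAssembler {
    /** Minimal stand-in for the article's Plan class. */
    static final class Category {
        final String tag;
        final String minCredit;
        final String currentCredit;
        final List<String> courses;

        Category(String tag, String minCredit, String currentCredit, List<String> courses) {
            this.tag = tag;
            this.minCredit = minCredit;
            this.currentCredit = currentCredit;
            this.courses = courses;
        }
    }

    /** Builds one Category per tag; fetchCourses maps an xfyqjd_id to its course list. */
    static List<Category> assemble(List<String> tags, List<String> nodeIds,
                                   List<String> credits,
                                   Function<String, List<String>> fetchCourses) {
        int n = tags.size();
        if (nodeIds.size() != n || credits.size() != 2 * n) {
            throw new IllegalArgumentException("tags, ids and credits do not line up");
        }
        List<Category> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // credits[i] is the minimum credit, credits[i + n] the current credit.
            result.add(new Category(tags.get(i), credits.get(i), credits.get(i + n),
                    fetchCourses.apply(nodeIds.get(i))));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ids and credits; the lambda stands in for the real request.
        List<Category> data = assemble(
                Arrays.asList("required", "elective", "practice"),
                Arrays.asList("id-a", "id-b", "id-c"),
                Arrays.asList("100", "20", "30", "60", "10", "5"),
                id -> Collections.singletonList("course for " + id));
        for (Category c : data) {
            System.out.println(c.tag + " " + c.minCredit + " " + c.currentCredit + " " + c.courses);
        }
    }
}
```

With a single loop, every category is added to the result by construction, so none of them can be forgotten at the end.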
Final result:
Copyright notice: this is a reposted article; copyright belongs to the original author.
Original link: https://blog.csdn.net/Cyril_KI/article/details/113678340