Java Crawler Knowledge Gaps: A Roundup

Reposted by x33g5p2x on 2021-12-30 in Java

HttpClient redirect handling

[HttpClient 4.5 Chinese Tutorial] Part 8: Aborting requests and redirect handling

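First, the difference between HttpClient and a browser

When you send a request from a browser, the browser takes care of redirects, caching, and similar concerns for you. That is why, after submitting a form via POST in a browser, you receive the data the server finally returns no matter how the server redirects.

With HttpClient it is different: the request comes back with a 302, because by default HttpClient does not follow redirects for a request submitted via POST. So what can you do?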

Method 1: handle the redirect manually

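import org.apache.http.Header;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

CloseableHttpClient httpClient = HttpClients.createDefault();

        HttpPost httpPost = new HttpPost("http://ip:port/xxx");

        CloseableHttpResponse response = httpClient.execute(httpPost);

        int statusCode = response.getStatusLine().getStatusCode();
        System.out.println("statusCode==" + statusCode); // status code

        // Location is only sent with a redirect response; header is null otherwise
        Header header = response.getFirstHeader("Location");

        // redirect target
        String location = header.getValue();
        System.out.println(location);
        response.close(); // release the connection before sending the next request

        // then simply send a new request to the new location

        HttpGet httpGet = new HttpGet(location);
        CloseableHttpResponse response2 = httpClient.execute(httpGet);
        System.out.println("Response body: " + EntityUtils.toString(response2.getEntity(), "UTF-8"));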
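One detail the manual approach has to deal with: the Location header may contain a relative reference rather than a full URL, and in that case it has to be resolved against the URL of the original request before it can be requested. A minimal sketch using the JDK's java.net.URI follows; the class name RedirectLocation, the method name resolve, and the example.com URLs are made up for illustration and do not come from the original article.

```java
import java.net.URI;

public class RedirectLocation {

    // Resolve a Location header value against the URL of the request that
    // produced it. Absolute values come back unchanged.
    public static String resolve(String requestUrl, String location) {
        URI base = URI.create(requestUrl);
        return base.resolve(location.trim()).toString();
    }

    public static void main(String[] args) {
        // hypothetical URLs, for illustration only
        String requestUrl = "http://example.com/app/login";
        System.out.println(resolve(requestUrl, "/home"));
        System.out.println(resolve(requestUrl, "welcome?from=login"));
        System.out.println(resolve(requestUrl, "https://sso.example.com/auth"));
    }
}
```

The resolved string can then be passed to new HttpGet(...) in place of the raw header value.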

Method 2: use the existing utility class

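import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.client.LaxRedirectStrategy;
import org.apache.http.util.EntityUtils;

HttpClientBuilder builder = HttpClients.custom()
            .disableAutomaticRetries() // turns off automatic request retries; redirect handling stays enabled
            .setRedirectStrategy(new LaxRedirectStrategy()); // LaxRedirectStrategy also follows redirects for POST

        CloseableHttpClient client = builder.build();

        HttpPost httpPost = new HttpPost("http://ip:port/xxx");

        CloseableHttpResponse response = client.execute(httpPost);

        int statusCode = response.getStatusLine().getStatusCode();
        System.out.println("statusCode==" + statusCode); // status code of the final response

        System.out.println("Response body: " + EntityUtils.toString(response.getEntity(), "UTF-8"));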
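To make the difference between the two strategies concrete, here is a simplified model of the decision "does the client follow this redirect by itself?". It is not the library's code. It is a sketch that assumes the behavior described in the HttpClient 4.5 Javadoc: the default strategy redirects only GET and HEAD, while LaxRedirectStrategy also redirects POST and DELETE. The class RedirectPolicyModel and its method are hypothetical, and the real library additionally checks that a Location header is present.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RedirectPolicyModel {

    // Assumption: methods each strategy redirects automatically (per the 4.5 Javadoc).
    static final Set<String> DEFAULT_METHODS = new HashSet<>(Arrays.asList("GET", "HEAD"));
    static final Set<String> LAX_METHODS = new HashSet<>(Arrays.asList("GET", "HEAD", "POST", "DELETE"));

    // Returns true if, under this simplified model, the client follows the redirect itself.
    public static boolean followsRedirect(int status, String method, boolean lax) {
        Set<String> allowed = lax ? LAX_METHODS : DEFAULT_METHODS;
        switch (status) {
            case 301:
            case 302:
            case 307:
                return allowed.contains(method.toUpperCase());
            case 303:
                return true;  // assumption: "See Other" is followed for any method
            default:
                return false; // not treated as a redirect in this model
        }
    }

    public static void main(String[] args) {
        System.out.println(followsRedirect(302, "POST", false)); // default strategy
        System.out.println(followsRedirect(302, "POST", true));  // lax strategy
        System.out.println(followsRedirect(302, "GET", false));
    }
}
```

Under this model a POST answered with 302 is surfaced to the caller by the default strategy, which is the situation method 1 handles by hand, and followed automatically once the lax strategy is installed.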

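Two ways to get cookies with HttpClient

1. Getting cookies with older versions of HttpClient

P.S. This approach is no longer officially recommended.

Instantiate the httpClient object with the DefaultHttpClient class: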

public static String dooPost_deprecated(String url, Map<String, String> map, String charset) {
        DefaultHttpClient httpClient = null;
        HttpPost httpPost = null;
        String result = null;
        try {
            httpClient = new DefaultHttpClient();
            httpPost = new HttpPost(url);
            // set the form parameters
            List<NameValuePair> list = new ArrayList<NameValuePair>();
            Iterator<Entry<String, String>> iterator = map.entrySet().iterator();
            while (iterator.hasNext()) {
                Entry<String, String> elem = (Entry<String, String>) iterator.next();
                list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
            }
            if (list.size() > 0) {
                UrlEncodedFormEntity entity = new UrlEncodedFormEntity(list, charset);
                httpPost.setEntity(entity);
            }
            HttpResponse response = httpClient.execute(httpPost);
            System.out.println(response.getStatusLine().getStatusCode());
            String JSESSIONID = null;
            String cookie_user = null;
            // get the cookies
            CookieStore cookieStore = httpClient.getCookieStore();
            List<Cookie> cookies = cookieStore.getCookies();
            for (int i = 0; i < cookies.size(); i++) {
                // iterate over the cookies
                System.out.println(cookies.get(i));
                System.out.println("cookiename=="+cookies.get(i).getName());
                System.out.println("cookieValue=="+cookies.get(i).getValue());
                System.out.println("Domain=="+cookies.get(i).getDomain());
                System.out.println("Path=="+cookies.get(i).getPath());
                System.out.println("Version=="+cookies.get(i).getVersion());

                if (cookies.get(i).getName().equals("JSESSIONID")) {
                    JSESSIONID = cookies.get(i).getValue();
                }
                if (cookies.get(i).getName().equals("cookie_user")) {
                    cookie_user = cookies.get(i).getValue();
                }
            }
            if (cookie_user != null) {
                result = JSESSIONID;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return result;
    }

2. Getting cookies with newer versions of HttpClient

Instantiate the httpClient object with the CloseableHttpClient class:

public static String doPost(Map<String, String> map, String charset) {
        CloseableHttpClient httpClient = null;
        HttpPost httpPost = null;
        String result = null;
        try {
            CookieStore cookieStore = new BasicCookieStore();
            httpClient = HttpClients.custom().setDefaultCookieStore(cookieStore).build();
            httpPost = new HttpPost("http://localhost:8080/testtoolmanagement/LoginServlet");
            List<NameValuePair> list = new ArrayList<NameValuePair>();
            Iterator<Map.Entry<String, String>> iterator = map.entrySet().iterator();
            while (iterator.hasNext()) {
                Entry<String, String> elem = (Entry<String, String>) iterator.next();
                list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
            }
            if (list.size() > 0) {
                UrlEncodedFormEntity entity = new UrlEncodedFormEntity(list, charset);
                httpPost.setEntity(entity);
            }
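            // the cookies are stored in cookieStore during execute(); close the response to release the connection
            httpClient.execute(httpPost).close();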
            String JSESSIONID = null;
            String cookie_user = null;
            List<Cookie> cookies = cookieStore.getCookies();
            for (int i = 0; i < cookies.size(); i++) {
                if (cookies.get(i).getName().equals("JSESSIONID")) {
                    JSESSIONID = cookies.get(i).getValue();
                }
                if (cookies.get(i).getName().equals("cookie_user")) {
                    cookie_user = cookies.get(i).getValue();
                }
            }
            if (cookie_user != null) {
                result = JSESSIONID;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return result;
    }
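Both variants read cookies that the client collected from the server's Set-Cookie response headers. To show what that extraction amounts to without needing a server, here is a small sketch that parses one header value with the JDK's own java.net.HttpCookie, which is a different class from the Apache Cookie type used above. The class SetCookieDemo, the helper cookieValue, and the sample header value are invented for illustration.

```java
import java.net.HttpCookie;
import java.util.List;

public class SetCookieDemo {

    // Return the value of the named cookie found in one Set-Cookie header
    // value, or null if no cookie with that name is present.
    public static String cookieValue(String setCookieValue, String name) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieValue);
        for (HttpCookie cookie : cookies) {
            if (cookie.getName().equals(name)) {
                return cookie.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // hypothetical header value, as a login servlet might send it
        String header = "JSESSIONID=ABC123; Path=/testtoolmanagement; HttpOnly";
        System.out.println(cookieValue(header, "JSESSIONID"));
        System.out.println(cookieValue(header, "cookie_user"));
    }
}
```

The lookup mirrors the loop in doPost: compare each cookie's name and keep the value of the one you need.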

Fetching Baidu images by keyword with Java

Java Web Crawler (2): Setting request headers and simulated-login strategies with HttpClient
