Jsoup reports a malformed URL

Posted by yiytaume on 2021-06-26 in Java

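I've run into a problem with Jsoup: it throws a malformed URL error. If I hard-code the URL in the program it works fine, but if I read a CSV file into a List<String[]> and then loop over each value in the list, it fails. For example, hard-coding http://www.clubmark.org.uk/ in the program works, but the same URL read from the CSV into the List<String[]> fails.
The stack trace is: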

Exception in thread "restartedMain" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49)
Caused by: java.lang.IllegalArgumentException: Malformed URL: http://www.clubmark.org.uk/
    at org.jsoup.helper.HttpConnection.url(HttpConnection.java:131)
    at org.jsoup.helper.HttpConnection.connect(HttpConnection.java:70)
    at org.jsoup.Jsoup.connect(Jsoup.java:73)
    at com.domainModel.DownloadImages.findImages(DownloadImages.java:43)
    at com.workingprojects.WebScraperApplication.main(WebScraperApplication.java:40)
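
The message prints a URL that looks valid, which is what one would see if the string carried an invisible character. The sketch below is only an illustration under that assumption (the question does not confirm it): it checks whether `java.net.URL`, which jsoup appears to use when parsing the argument to `connect`, accepts a string with a leading U+FEFF byte order mark. `MalformedUrlDemo` and `parses` are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URL;

class MalformedUrlDemo {

    // Returns true if java.net.URL accepts the string, false if it rejects it.
    static boolean parses(String candidate) {
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "http://www.clubmark.org.uk/";
        // Simulated CSV value: an invisible U+FEFF in front of the same URL.
        String withBom = "\uFEFF" + plain;

        // Both strings look identical when printed to a console.
        System.out.println(plain + " parses: " + parses(plain));
        System.out.println(withBom + " parses: " + parses(withBom));
    }
}
```

If the second line reports a failure while printing the same visible text, an invisible prefix would be consistent with the stack trace above.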

My main class is:

@SpringBootApplication
@EntityScan({"com.bootstrap","com.domainModel"})
@ComponentScan({"com.bootstrap","com.domainModel"})
public class WebScraperApplication {

    public static void main(String[] args) throws IOException, CsvException {
        SpringApplication.run(WebScraperApplication.class, args);

        DownloadImages downloadImages = new DownloadImages();

        ReadCSV readCSV = new ReadCSV();
        ArrayList<String[]> urls = (ArrayList<String[]>) readCSV.csvReader("C:\\link1.csv");

        for (int i = 0; i < 1; i++) {
            String[] thisURLObject = urls.get(0);
            String thisURL = thisURLObject[0];
            String status = downloadImages.findImages(thisURL, "C:\\Users\\xxx\\images");
            System.out.println(thisURL + status);
        }

        System.out.println("finished");

    }

}

The class that fetches the images, and where the problem shows up, is:

package com.domainModel;

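import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import lombok.Getter;
import lombok.Setter;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;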

public class DownloadImages {

    // The URL of the website.
    @Getter @Setter
    private String webSiteURL;

    // The path of the folder that you want to save the images to.
    @Getter @Setter
    private String folderPath;

public String findImages(String webSiteURL, String folderPath ) {

    try {

        //Connect to the website and get the html
        Document doc = Jsoup.connect(webSiteURL).get();

        //Get all elements with img tag ,
        Elements img = doc.getElementsByTag("img");
       System.out.println("Images is" + img.size());

       String folderNameWk2 = webSiteURL.replace(".html", "");
       String folderNameWk3 = folderNameWk2.replace("http://", "");

       Path path = Paths.get(folderPath + folderNameWk3);
       Files.createDirectories(path);
       String path1 = path.toString();
       System.out.println("The path is " + path1);

       int counter = 0;

        for (Element el : img) {

            String docName = String.valueOf(counter)+".jpeg";

            // For each element, get the absolute src URL
            String src = el.absUrl("src");

            System.out.println("Image Found!");
            System.out.println("src attribute is : "+src);
            getImages(src, path1, docName);

            counter = counter+1;

        }

    } catch (IOException ex) {

        System.err.println("There was an error");
        System.out.println(ex);
    //    Logger.getLogger(DownloadImages.class.getName()).log(Level.SEVERE, null, ex);
    }

    return "complete";
}

    private void getImages(String src, String folderPath, String docName) throws IOException {

     //   String folder = null;

        // Extract the name of the image from the src attribute,
        // dropping a trailing slash first so the last path segment is found
        if (src.endsWith("/")) {
            src = src.substring(0, src.length() - 1);
        }

        int indexname = src.lastIndexOf("/");
        String name = src.substring(indexname + 1);

        System.out.println(name);

        //Open a URL Stream
        URL url = new URL(src);
        InputStream in = url.openStream();

        OutputStream out = new BufferedOutputStream(new FileOutputStream(folderPath+"/" + docName));

        for (int b; (b = in.read()) != -1;) {
            out.write(b);
        }
        out.close();
        in.close();

    }

    /**
     * @param webSiteURL
     * @param folderPath
     */
    public DownloadImages(String webSiteURL, String folderPath) {
        super();
        this.webSiteURL = webSiteURL;
        this.folderPath = folderPath;
    }

    /**
     * 
     */
    public DownloadImages() {
        super();
    }

}

And the class which gets the CSV file is 

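package com.domainModel;

import java.io.FileReader;
import java.io.IOException;
import java.util.List;

import com.opencsv.CSVReader;
import com.opencsv.exceptions.CsvException;

public class ReadCSV {

    // Read every row of the CSV file into a list of String arrays.
    public List<String[]> csvReader(String fileName) throws IOException, CsvException {
        try (CSVReader reader = new CSVReader(new FileReader(fileName))) {
            return reader.readAll();
        }
    }
}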
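
Because `FileReader` decodes with the platform default charset and Java's UTF-8 decoder does not strip a byte order mark, one thing worth ruling out is a BOM at the start of the file. The following is a minimal sketch under that assumption, not a confirmed fix: it opens the file as UTF-8 and consumes a leading U+FEFF if one is present. `BomSkippingReaders`, `skipBom`, and `openUtf8` are hypothetical names, and the file encoding is a guess.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BomSkippingReaders {

    // Consume a leading U+FEFF if present; otherwise rewind to the start.
    static BufferedReader skipBom(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        reader.mark(1);
        if (reader.read() != 0xFEFF) {
            reader.reset();
        }
        return reader;
    }

    // Open a file as UTF-8 (an assumption about its encoding) and skip any BOM.
    static BufferedReader openUtf8(Path file) throws IOException {
        return skipBom(Files.newBufferedReader(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Simulated file content: a BOM followed by the first CSV row.
        BufferedReader r = skipBom(new StringReader("\uFEFFhttp://www.clubmark.org.uk/,"));
        System.out.println(r.readLine());
    }
}
```

The reader returned by `openUtf8` could be passed to `new CSVReader(...)` in place of the `FileReader`, since `CSVReader` accepts any `Reader`.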

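I'm reasonably sure the problem lies in the format of what I pass in from the list, but when I inspect the values they certainly look like strings.
The first two rows of the CSV file: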
http://www.clubmark.org.uk/, http://www.designit-uk.com/,
(Image: the first two rows of data in Notepad)
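
Printing a string shows only its visible characters, so a value can look correct and still differ from the hard-coded literal. One way to check is to print the code point of every character instead. The sketch below is a diagnostic under stated assumptions, not a definitive fix: it assumes the value may carry a leading byte order mark (which Notepad can write) or stray whitespace, neither of which the question confirms. `UrlDebug`, `describe`, and `clean` are hypothetical names.

```java
class UrlDebug {

    // Render each character as U+XXXX so invisible characters become visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Remove a leading byte order mark (U+FEFF), then surrounding whitespace.
    static String clean(String s) {
        if (s.startsWith("\uFEFF")) {
            s = s.substring(1);
        }
        return s.trim();
    }

    public static void main(String[] args) {
        // Simulated CSV value with an invisible prefix and a trailing space.
        String fromCsv = "\uFEFFhttp://www.clubmark.org.uk/ ";
        System.out.println(describe(fromCsv));
        System.out.println(clean(fromCsv).equals("http://www.clubmark.org.uk/"));
    }
}
```

In the main class, calling `describe(thisURL)` before `findImages` would show whether the first character is really `h` (U+0068), and passing `clean(thisURL)` instead would test whether the extra characters are the cause.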
