Java: reading page source from a URL returns unknown characters

x8diyxa7  posted on 2021-07-12 in Java

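I am using the code below in NetBeans to read the source of a page from a URL (https://www.amazon.com) with the "UTF-8" charset, but it returns unknown characters (image attached). I don't know what the problem is. I would appreciate it if you could help me modify the code so that it works correctly. Thanks.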

    public static String getURLSource(String url) throws IOException
    {
        URL urlObject = new URL(url);
        URLConnection urlConnection = urlObject.openConnection();
        urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        return toString(urlConnection.getInputStream());
    }

    private static String toString(InputStream inputStream) throws IOException
    {
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8")))
        {
            String inputLine;
            StringBuilder stringBuilder = new StringBuilder();
            while ((inputLine = bufferedReader.readLine()) != null)
            {
                stringBuilder.append(inputLine);
            }
            return stringBuilder.toString();
        }
    }
Answer #1 (jvlzgdj9)

Use HttpsURLConnection instead of URLConnection. See a similar question.
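For illustration, here is a minimal sketch of what that suggestion might look like. The class name PageSource and the helper readBody are hypothetical and not from the original post. The gzip check is an added assumption: one common cause of unreadable characters is a compressed response body being decoded as text, but the thread does not confirm that this is the cause here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import javax.net.ssl.HttpsURLConnection;

// Hypothetical helper class; the name is an assumption for this sketch.
final class PageSource {

    // Fetches the page at an https:// URL and returns its body as text.
    public static String getURLSource(String url) throws IOException {
        // The cast only succeeds for https:// URLs.
        HttpsURLConnection connection =
                (HttpsURLConnection) new URL(url).openConnection();
        connection.setRequestProperty("User-Agent", "Mozilla/5.0");
        try (InputStream body = connection.getInputStream()) {
            // getContentEncoding() returns the Content-Encoding header, or null.
            return readBody(body, connection.getContentEncoding());
        } finally {
            connection.disconnect();
        }
    }

    // Decodes a response body as UTF-8, unwrapping gzip first if the
    // server declared it (assumption: compression may explain the garbage).
    static String readBody(InputStream body, String contentEncoding)
            throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(body)
                : body;
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Keep line breaks, which readLine() strips.
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

Calling PageSource.getURLSource("https://www.amazon.com") would then return the page text, provided the server responds with UTF-8 content that is either uncompressed or gzip-compressed. Other encodings, such as br or deflate, are not handled in this sketch.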
