How do I decode an HTML page with Node.js?

Posted by vlju58qv on 2023-04-29 in Node.js

I am making a GET request:

function get() {
    var headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Cookie': cookie,
        'Host': 'lms.hse.ru',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    var options = {
        url: 'https://lms.hse.ru/?sl&tab=9548',
        method: 'GET',
        headers: headers
    }

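    return new Promise(function (resolve, reject) {
        request(options, function (err, response, body) {
            if (err) {
                return reject(err); // settle the promise on network errors
            }
            console.log(body);
            resolve(body); // otherwise the promise would stay pending forever
        });
    })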
}

The console output looks like this:

Y�w��v�l�W[y���Ěn���H�����`�EU�@2�̌�lT��yk˱�V��,�� �=n�4��{5�a�Z��;���~Ȯ:F��������7K`G��ض��1�T�$*`a�bZ~���bhn�hٷEW�Sݞ��x�c�p�DX��cG2��r��W���M�1p%  腅�鉄�f�Iɜ������
���zj�mvP���w����eùi�n�c��i��lt�歉�����0)\Lι�@��#��n�Fж��#
                                                          ����2z.OB�3����O�=����bdb`�s���a
          ��gkÈ&.ӿXzYñmv@��f��-q�g�&�Ŧy���5XZ
�=������Y2���Ti����i��x\
                        �6~�'8��k��ט�:���GT4�fþ�U���1J���&
                                                          � �����&��O��:v0L�)��A�^
      O׶��I�J�LH=��Z�g8^hʂCO�r��N���8���bmUEߵ|�$��D(��@�1��
                                                           r�:x}��糚QJ���|j3KUL����
   �`��Zʍ���PmR��0���]����5�Eį0ǫ,o�����W����^�Y ՜U
Q�;�ľ�-�]뷢[}��,��??1E�ݹ*K*��U�m��ڻk����-�3
e3*���X��_x#�1�mߎJ���m8�h�.����)��m�b���M٦�zf���G������T�pEa�   �
ɓ�l��1��V�Dt�'�9]VJ�Yo���K����Rd%�u�=�N4�Z��i,�;��m�%`����k!����w�u�~�  7
�D��l���c�>�u2p���E��~�$V^���Q�_&��2S�zV�G�kܑk�mZ��1Ӳ��]d��%�[^a!���|\�I��"Sd�ʫ`��p�vv��~�u?�J����7�h�F�~�{9=���a*H�`x��������22;)���31�N

The same response should look like this:

<!DOCTYPE HTML>

<html>
<head>
    <title>LMS HSE</title>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <meta name="description" content="" />
    <meta name="keywords" content="" />
    <!--[if lte IE 8]><script src="css/ie/html5shiv.js"></script><![endif]-->
    <script src="js/jquery.min.js"></script>
    <script src="js/jquery.scrolly.min.js"></script>
    <script src="js/skel.min.js"></script>
    <script src="js/init.js"></script>
    <script type="text/javascript" src="js/jquery.fileDownload.js"></script>
    <noscript>
        <link rel="stylesheet" href="css/skel.css" />
        <link rel="stylesheet" href="css/style.css?v=1" />
        <link rel="stylesheet" href="css/style-desktop.css" />
    </noscript>
    <link rel="icon" type="image/x-icon" href="favicon.ico" />
    <!--[if lte IE 8]><link rel="stylesheet" href="css/ie/v8.css" /><![endif]-->
    <!--[if lte IE 9]><link rel="stylesheet" href="css/ie/v9.css" /><![endif]-->

<style type="text/css">
/*
table {
width: 80%;
margin: auto;
}
*/
td {
text-align:left;
border-bottom: dotted 1px;
padding: 6px;
} ...

How can I decode it in Node.js?

j0pj023g  #1

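Your request headers say you accept gzip, deflate, br, but you did not pass gzip: true to request, so it does not decompress the response. What you see in the console is the compressed body printed as text.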
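To illustrate what "decompressing the response" means, here is a minimal sketch that uses Node's built-in zlib module to decode a body according to its Content-Encoding header. The helper name `decodeBody` is hypothetical, and the sketch assumes the body is available as a raw Buffer (with request, that would mean passing `encoding: null`) and that the page is UTF-8, as its meta tag states. It is not how request is implemented internally, only a rough equivalent.

```javascript
const zlib = require('zlib');

// Hypothetical helper: decompress a raw response body based on the
// Content-Encoding header value, then decode it as UTF-8 text.
function decodeBody(buffer, contentEncoding) {
  const encoding = (contentEncoding || 'identity').trim().toLowerCase();
  switch (encoding) {
    case 'gzip':
      return zlib.gunzipSync(buffer).toString('utf8');
    case 'deflate':
      return zlib.inflateSync(buffer).toString('utf8');
    case 'br':
      // Brotli support in zlib needs a reasonably recent Node.js version.
      return zlib.brotliDecompressSync(buffer).toString('utf8');
    default:
      // No compression (or an unknown value): treat the bytes as plain text.
      return buffer.toString('utf8');
  }
}

// Round trip with locally compressed data, so no network access is needed.
const html = '<!DOCTYPE HTML><html><head><title>LMS HSE</title></head></html>';
const compressed = zlib.gzipSync(Buffer.from(html, 'utf8'));
console.log(decodeBody(compressed, 'gzip') === html);
```

Note that a body already converted to a string, as in the question, generally cannot be recovered this way, because the compressed bytes are damaged when they are interpreted as text.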

Edit, to be more precise:

var options = {
    url: 'https://lms.hse.ru/?sl&tab=9548',
    method: 'GET',
    headers: headers,
    gzip: true 
}

This assumes you are using https://www.npmjs.com/package/request, which is most likely the case.
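Putting the pieces together, a complete version of the function might look like the sketch below. It rests on a few assumptions: the header list is trimmed for brevity, `br` is dropped from Accept-Encoding because, as far as I know, request only decodes gzip and deflate (worth verifying against the version you use), and the `requestFn` parameter is a hypothetical addition so the function can be tried without network access. The promise is also resolved and rejected explicitly so callers can await the result.

```javascript
// Options from the question plus gzip: true (headers trimmed for brevity).
function buildOptions(cookie) {
  return {
    url: 'https://lms.hse.ru/?sl&tab=9548',
    method: 'GET',
    headers: {
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      // Assumption: 'br' is left out so the server is only offered
      // encodings that request is expected to decode.
      'Accept-Encoding': 'gzip, deflate',
      'Cookie': cookie
    },
    gzip: true // ask request to decompress the response body
  };
}

// `requestFn` is a hypothetical parameter for this sketch; when omitted,
// the function falls back to the request package.
function get(cookie, requestFn) {
  const doRequest = requestFn || require('request');
  return new Promise(function (resolve, reject) {
    doRequest(buildOptions(cookie), function (err, response, body) {
      if (err) {
        return reject(err);
      }
      resolve(body); // with gzip: true, body should be the decoded HTML
    });
  });
}
```

It could then be called as `get(cookie).then(function (html) { console.log(html); })`, where `cookie` is your session cookie string.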
