egg agent worker error: Cannot create a string longer than 0x1fffffe8 characters

gab6jxml posted 6 months ago in Other

Your detail info about the Bug:

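Our production API intermittently returns 502, a few times a day. We found that the agent process's memory usage was abnormally high.

Looking at the points in time when memory rose, we found the following error: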

2023-01-17 04:56:01,271 ERROR 88 nodejs.unhandledExceptionError: Cannot create a string longer than 0x1fffffe8 characters
    at Buffer.utf8Slice (<anonymous>)
    at Buffer.toString (buffer.js:778:17)
    at JSON.parse (<anonymous>)
    at Function.decode (/home/web_server/nodejs/project/node_modules/cluster-client/lib/protocol/packet.js:86:26)
    at Socket.onReadable (/home/web_server/nodejs/project/node_modules/cluster-client/lib/server.js:138:29)
    at Socket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at onEofChunk (internal/streams/readable.js:547:5)
    at readableAddChunk (internal/streams/readable.js:264:5)
    at Socket.Readable.push (internal/streams/readable.js:223:10)
code: "ERR_STRING_TOO_LONG"
name: "unhandledExceptionError"
pid: 88

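The pid here is 88, which is the agent process.
Our analysis: the agent received a very large amount of data, so the buffer that JSON.parse converts to a string exceeded the maximum string length (0x1fffffe8 characters, roughly 512 MB).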
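
To make the limit concrete, here is a minimal sketch of checking a byte length against the string-length ceiling before decoding. `fitsInString` is a hypothetical helper written for illustration, not part of egg or cluster-client. The check is conservative: a UTF-8 buffer never decodes to more characters than it has bytes.

```javascript
const { constants } = require('buffer');

// The limit quoted in the error message above: 536870888 characters.
const LIMIT_FROM_ERROR = 0x1fffffe8;

// Hypothetical helper: returns true when a buffer of `byteLength` bytes
// is guaranteed to fit in a single string on this runtime.
function fitsInString(byteLength, limit = constants.MAX_STRING_LENGTH) {
  return byteLength <= limit;
}

console.log(LIMIT_FROM_ERROR);                            // 536870888
console.log(constants.MAX_STRING_LENGTH);                 // depends on the Node.js / V8 version
console.log(fitsInString(30 * 1024 * 1024));              // a 30 MB payload fits
console.log(fitsInString(600 * 1024 * 1024, LIMIT_FROM_ERROR)); // a 600 MB payload does not
```

`buffer.constants.MAX_STRING_LENGTH` is the runtime's own value, so comparing against it avoids hard-coding a number that may differ between Node.js versions.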

We limit request bodies to 30 MB:

bodyParser: {
    jsonLimit: '30mb',
},

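The log data may well exceed 500 MB, but a larger log file by itself should not be a problem.

Debugging locally, the agent does not appear to listen for any particular events. It only watches for file changes (apparently to trigger restarts) and handles heartbeats.

Our service code is fairly simple and uses only basic features.
What data could the agent be receiving that causes this problem?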

Reproduction Repo

The problem occurs intermittently; we cannot reproduce it for now.

Node Version

14

Eggjs Version

2.36.0

Plugin Name and its version

"egg-graphql": "^2.3.0",
"egg-multipart": "2.4.0",
"egg-scripts": "^2.11.0",
"egg-socket.io": "^4.1.5",
"egg-static": "^2.2.0",
"egg-validate": "^2.0.2",

Platform and its version

linux

v6ylcynt #1

I added logging at the point where the error is thrown.

The logged output is as follows:

huyu.server 1213486160 
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; de; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Host: 127.0.0.1:41941
Accept: */*

^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
...

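Analysis of the error log

  • connLength is 1213486160; this oversized value is the direct cause of the error.
  • I looked up ^@: it represents the null character (NUL), which indicates that the actual content is nowhere near that large.
  • connBuf should be a JSON string.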

At the point of failure, however, its content looks like an HTTP request header or something similar.
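
One detail is consistent with the header-like content: written out as four bytes, the logged connLength spells an ASCII word. The sketch below assumes the length field is read as a big-endian unsigned 32-bit integer, which is an assumption about the decoder and is not confirmed in this thread.

```javascript
// Value taken from the log output above.
const connLength = 1213486160;

// Write the number back out as 4 big-endian bytes and view them as ASCII.
const bytes = Buffer.alloc(4);
bytes.writeUInt32BE(connLength, 0);

const asHex = connLength.toString(16);
const asText = bytes.toString('ascii');

console.log(asHex);  // 48545450
console.log(asText); // HTTP
```

If that assumption holds, the "length" is not a real length at all, only the characters "HTTP" from a plain-text request being interpreted as a binary field.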

  • 41941 here is the clusterPort:
ps -ef | grep 41941
root         101      81  2 Jan13 ?        07:16:42 /usr/sbin/node --require /home/web_server/nodejs/project/node_modules/source-map-support/register.js /home/web_server/nodejs/project/node_modules/egg-cluster/lib/app_worker.js {"framework":"/home/web_server/nodejs/project/node_modules/egg","baseDir":"/home/web_server/nodejs/project","workers":2,"plugins":null,"https":false,"title":"ks-puzzle-server","clusterPort":41941}
root         106      81  2 Jan13 ?        07:17:38 /usr/sbin/node --require /home/web_server/nodejs/project/node_modules/source-map-support/register.js /home/web_server/nodejs/project/node_modules/egg-cluster/lib/app_worker.js {"framework":"/home/web_server/nodejs/project/node_modules/egg","baseDir":"/home/web_server/nodejs/project","workers":2,"plugins":null,"https":false,"title":"ks-puzzle-server","clusterPort":41941}
root      488149      81  0 Jan24 ?        00:00:39 /usr/sbin/node --require /home/web_server/nodejs/project/node_modules/source-map-support/register.js /home/web_server/nodejs/project/node_modules/egg-cluster/lib/agent_worker.js {"framework":"/home/web_server/nodejs/project/node_modules/egg","baseDir":"/home/web_server/nodejs/project","workers":2,"plugins":null,"https":false,"title":"ks-puzzle-server","clusterPort":41941}
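bgibtngc #2

After adding more logging, we found that the TCP socket was receiving an HTTP request whose URL is /app/kibana.

The requester's IP is 127.0.0.1, and the source port changes every time.
Because the service runs in a docker container, we suspect the requests come from the project itself rather than from a third party such as a security-group scan.

Searching all the code in the project turned up no kibana-related strings; the source of the requests is still to be determined.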
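
The sketch below shows how a plain HTTP request line sent to a binary-protocol port could produce exactly the connLength seen earlier. Several things here are assumptions for illustration only: that the method is GET (the thread only gives the URL), that the packet has a fixed header with a big-endian 32-bit length at byte offset 16, and the `looksLikeValidLength` guard with its 64 MB cap, which is hypothetical and not part of cluster-client.

```javascript
// Assumed layout (not confirmed in this thread): a 32-bit big-endian
// length field located at byte offset 16 of the packet header.
const ASSUMED_CONN_LENGTH_OFFSET = 16;

// An HTTP request line for the URL reported above; the GET method is assumed.
const requestLine = Buffer.from('GET /app/kibana HTTP/1.1\r\n', 'ascii');

// Bytes 16..19 of this line are "HTTP", so a binary decoder would read
// them as a huge length value.
const misreadLength = requestLine.readUInt32BE(ASSUMED_CONN_LENGTH_OFFSET);
console.log(misreadLength); // 1213486160

// Hypothetical guard: reject implausible lengths before buffering or decoding.
const MAX_REASONABLE_LENGTH = 64 * 1024 * 1024; // illustrative cap, 64 MB
function looksLikeValidLength(n) {
  return Number.isInteger(n) && n >= 0 && n <= MAX_REASONABLE_LENGTH;
}

console.log(looksLikeValidLength(misreadLength)); // false
console.log(looksLikeValidLength(2048));          // true
```

A check of this kind would let a server close the connection early on non-protocol traffic instead of waiting for hundreds of megabytes that never arrive in usable form.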
