Problems attaching Arthas from a k8s ephemeral container

Posted by cbeh67ev on 2021-11-28 in Java
  • https://kubernetes.io/docs/tasks/debug-application-cluster/debug-running-pod/#ephemeral-container

Test environment versions:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

Start k8s with the EphemeralContainers feature gate enabled:

minikube start --feature-gates=EphemeralContainers=true

Start a simple Java pod:

kubectl run arthas-demo --image=hengyunabc/atest:0.0.3

Dockerfile:

FROM openjdk:8-jdk
RUN wget https://arthas.aliyun.com/math-game.jar

ENTRYPOINT ["/bin/sh", "-c", "java -jar math-game.jar"]

Check the pod status:

$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
arthas-demo   1/1     Running   0          5m58s

Debug with an ephemeral container:

kubectl debug -it arthas-demo --image=openjdk:8-jdk --target=arthas-demo

ps shows the Java process, but jps does not list it, and jstack -l fails:

root@arthas-demo:/# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   2392   744 ?        Ss   10:00   0:00 /bin/sh -c java -jar math-game.jar
root           8  0.1  0.7 4054728 62724 ?       Sl   10:00   0:10 java -jar math-game.jar
root          18  0.0  0.0   5756  3600 pts/0    Ss+  10:00   0:00 bash
root         115  1.0  0.0   5756  3552 pts/0    Ss   11:50   0:00 bash
root         121  0.0  0.0   9396  3012 pts/0    R+   11:50   0:00 ps aux
root@arthas-demo:/# jps
122 Jps
root@arthas-demo:/# jstack -l 8
8: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

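Attempt

Copying the application container's /tmp/hsperfdata_root/ directory into the ephemeral container makes jps list the process, but jstack still fails: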

root@arthas-demo:/# cp -r /proc/8/root/tmp/hsperfdata_root/ /tmp
root@arthas-demo:/# jps
148 Jps
8 jar
root@arthas-demo:/# jstack -l 8
8: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

#1 z0qdvdin

Testing with JDK 17

Start the application container:

kubectl run arthas-demo --image=hengyunabc/atest:0.0.4

Dockerfile:

FROM openjdk:17-jdk
RUN curl https://arthas.aliyun.com/math-game.jar -o math-game.jar
ENTRYPOINT ["/bin/sh", "-c", "java -jar math-game.jar"]

Debug with an ephemeral container

Dockerfile for the ephemeral container:

FROM openjdk:17-jdk
CMD ["sh"]

Run the command:

kubectl debug -it arthas-demo --image=hengyunabc/atest:0.0.4-debug --target=arthas-demo

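Running jps directly in the container returns nothing. Only after copying process 1's /proc/1/root/tmp/hsperfdata_root/ into the ephemeral container's /tmp directory do jps and jstack succeed.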

sh-4.4# /usr/java/openjdk-17/bin/jps
82 Jps
sh-4.4# cp -r /proc/1/root/tmp/hsperfdata_root/ /tmp
sh-4.4# /usr/java/openjdk-17/bin/jps
97 Jps
1 math-game.jar
sh-4.4# /usr/java/openjdk-17/bin/jstack -l 1
2021-07-28 16:42:04
Full thread dump OpenJDK 64-Bit Server VM (17-ea+32-2679 mixed mode, sharing):
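
The copy step above can be wrapped in a small helper. This is only a sketch of what the thread did by hand: the function name stage_hsperfdata and the PROC_DIR / DEST_TMP overrides are hypothetical and exist mainly so the function can be exercised outside a pod. It assumes the ephemeral container shares the process namespace with the target (as with --target) and has permission to read /proc/<pid>/root. Note that the copy is a snapshot taken at that moment.

```shell
# Copy the target JVM's hsperfdata directory into this container's /tmp,
# which is what made jps list the process in the thread.
# PROC_DIR and DEST_TMP are hypothetical overrides (assumptions of this sketch).
stage_hsperfdata() {
    pid=$1
    user=${2:-root}                      # owner of the target JVM
    proc_dir=${PROC_DIR:-/proc}
    dest_tmp=${DEST_TMP:-/tmp}
    src="$proc_dir/$pid/root/tmp/hsperfdata_$user"

    if [ ! -d "$src" ]; then
        echo "no hsperfdata directory at $src" >&2
        return 1
    fi
    # cp -r creates (or merges into) $dest_tmp/hsperfdata_<user>
    mkdir -p "$dest_tmp" && cp -r "$src" "$dest_tmp/"
}

# Usage inside the ephemeral container, mirroring the thread:
#   stage_hsperfdata 1 && jps
```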

Next, test arthas in the ephemeral container:

curl -O https://arthas.aliyun.com/arthas-boot.jar
java -jar arthas-boot.jar

The attach fails with "Agent JAR not found or no Agent-Class attribute":

sh-4.4# java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.5.3
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.

* [1]: 1 math-game.jar

[INFO] Start download arthas from remote server: https://arthas.aliyun.com/download/3.5.3?mirror=center
[INFO] File size: 12.72 MB, downloaded size: 1.41 MB, downloading ...
[INFO] File size: 12.72 MB, downloaded size: 4.69 MB, downloading ...
[INFO] File size: 12.72 MB, downloaded size: 9.04 MB, downloading ...
[INFO] Download arthas success.
[INFO] arthas home: /root/.arthas/lib/3.5.3/arthas
[INFO] Try to attach process 1
[ERROR] Start arthas failed, exception stack trace:
com.sun.tools.attach.AgentLoadException: Agent JAR not found or no Agent-Class attribute
	at jdk.attach/sun.tools.attach.HotSpotVirtualMachine.loadAgent(HotSpotVirtualMachine.java:160)
	at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:120)
	at com.taobao.arthas.core.Arthas.<init>(Arthas.java:26)
	at com.taobao.arthas.core.Arthas.main(Arthas.java:139)
[INFO] Attach process 1 success.
[INFO] arthas-client connect 127.0.0.1 3658
Connect to telnet server error: 127.0.0.1 3658
java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at org.apache.commons.net.SocketClient.connect(SocketClient.java:188)
	at org.apache.commons.net.SocketClient.connect(SocketClient.java:209)
	at com.taobao.arthas.client.TelnetConsole.process(TelnetConsole.java:306)
	at com.taobao.arthas.client.TelnetConsole.main(TelnetConsole.java:166)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.taobao.arthas.boot.Bootstrap.main(Bootstrap.java:615)
Usage: arthas-client [--help] [-c <value>] [-f <value>] [-w <value>] [-t
       <value>] [-h <value>] [target-ip] [port]

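The cause is that arthas-boot downloads the arthas files into the ephemeral container's ~/.arthas directory, which is not visible from the application container.

To fix this, the arthas files have to be placed in a directory the application container can read, for example under /proc/1/root.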

sh-4.4# cp -r /root/.arthas/ /proc/1/root/root/

After that, the attach succeeds.

sh-4.4# java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.5.3
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.

* [1]: 1 math-game.jar

[INFO] arthas home: /root/.arthas/lib/3.5.3/arthas
[INFO] Try to attach process 1
[INFO] Attach process 1 success.
[INFO] arthas-client connect 127.0.0.1 3658
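
As a rough generalization of the cp command above, the sketch below mirrors a directory into the same absolute path inside the target's root filesystem, so the path arthas-boot reports as "arthas home" also resolves for the application process. The function name mirror_into_target and the PROC_DIR override are hypothetical. It assumes write permission through /proc/<pid>/root (both containers ran as root in this thread).

```shell
# Mirror a directory from this container into the same absolute path
# inside the target process's root filesystem (/proc/<pid>/root).
# PROC_DIR is a hypothetical override used only for testing.
mirror_into_target() {
    pid=$1
    src=$2                               # absolute path, e.g. /root/.arthas
    proc_dir=${PROC_DIR:-/proc}
    target_root="$proc_dir/$pid/root"

    if [ ! -d "$src" ]; then
        echo "missing source directory $src" >&2
        return 1
    fi
    if [ ! -d "$target_root" ]; then
        echo "cannot see $target_root (is the process namespace shared?)" >&2
        return 1
    fi
    dest_parent="$target_root$(dirname "$src")"
    mkdir -p "$dest_parent" && cp -r "$src" "$dest_parent/"
}

# Usage, equivalent to the manual step in the thread:
#   mirror_into_target 1 /root/.arthas
```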

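To summarize, there are essentially two problems to solve:

  • the JVM's own attach mechanism
  • the application container must be able to read the files that arthas downloads

The simplest fix would be for the ephemeral container to share the /tmp directory with the application container. However, kubectl does not seem to offer a way to specify this, so it may have to be done through the API.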
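
The two conditions can be checked before trying to attach. The following is a minimal sketch based only on what was observed in this thread, not a description of how the JVM or arthas work internally. The function name check_attach_prereqs and the PROC_DIR / LOCAL_TMP overrides are hypothetical.

```shell
# Report which of the two preconditions from the summary are unmet.
# PROC_DIR and LOCAL_TMP are hypothetical overrides used only for testing.
check_attach_prereqs() {
    pid=$1
    agent_path=$2                        # path the target JVM must be able to open
    proc_dir=${PROC_DIR:-/proc}
    local_tmp=${LOCAL_TMP:-/tmp}
    status=0

    # 1. Discovery: in this thread jps listed the target only once its
    #    perf-data file existed under the local /tmp/hsperfdata_<user>/.
    found=0
    for f in "$local_tmp"/hsperfdata_*/"$pid"; do
        if [ -f "$f" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "hsperfdata for pid $pid not visible in $local_tmp"
        status=1
    fi

    # 2. Agent files: the path must exist in the target's own filesystem.
    if [ ! -e "$proc_dir/$pid/root$agent_path" ]; then
        echo "$agent_path not visible to pid $pid"
        status=1
    fi
    return "$status"
}

# Usage:
#   check_attach_prereqs 1 /root/.arthas/lib/3.5.3/arthas
```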


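#2 z8dt9xmd

Hello. Following your approach, I tried this in k8s. The idea is a sidecar pattern: the main container and the sidecar share the /tmp directory, the Java program runs in the main container, and arthas runs in the sidecar to monitor the application in the main container.
The YAML file is below; it pulls the image from your earlier post: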

apiVersion: v1
kind: Pod
metadata:
  name: arthas4
spec:
  shareProcessNamespace: true
  containers:
  - name: glassfish2
    image: hengyunabc/atest:0.0.4
    volumeMounts:
    - name: html
      mountPath: /tmp/
  - name: glassfish
    image: hengyunabc/atest:0.0.4
    volumeMounts:
    - name: html
      mountPath: /tmp/
  volumes:
  - name: html
    emptyDir: {}

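After the pod started successfully, I entered the sidecar's /tmp directory, downloaded and ran arthas-boot.jar, and then ran cp -r /root/.arthas/ /tmp/. This produced the output below, which shows the Java process in the main container (PID 6):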

sh-4.4# java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.5.3
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.

* [1]: 6 math-game.jar

  [2]: 24 math-game.jar

But after entering 1, the same error as in your post above still appeared:

sh-4.4# java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.5.3
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.

* [1]: 6 math-game.jar

  [2]: 24 math-game.jar
1
[INFO] arthas home: /root/.arthas/lib/3.5.3/arthas
[INFO] Try to attach process 6
[ERROR] Start arthas failed, exception stack trace: 
com.sun.tools.attach.AgentLoadException: Agent JAR not found or no Agent-Class attribute
        at jdk.attach/sun.tools.attach.HotSpotVirtualMachine.loadAgent(HotSpotVirtualMachine.java:160)
        at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:120)
        at com.taobao.arthas.core.Arthas.<init>(Arthas.java:26)
        at com.taobao.arthas.core.Arthas.main(Arthas.java:139)
[INFO] Attach process 6 success.
[INFO] arthas-client connect 127.0.0.1 3658
Connect to telnet server error: 127.0.0.1 3658
java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.Net.pollConnect(Native Method)
        at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
        at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
        at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
        at java.base/java.net.Socket.connect(Socket.java:633)
        at org.apache.commons.net.SocketClient.connect(SocketClient.java:188)
        at org.apache.commons.net.SocketClient.connect(SocketClient.java:209)
        at com.taobao.arthas.client.TelnetConsole.process(TelnetConsole.java:306)
        at com.taobao.arthas.client.TelnetConsole.main(TelnetConsole.java:166)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at com.taobao.arthas.boot.Bootstrap.main(Bootstrap.java:615)
Usage: arthas-client [--help] [-c <value>] [-f <value>] [-w <value>] [-t
       <value>] [-h <value>] [target-ip] [port]

Arthas Telnet Client

EXAMPLES:
  java -jar arthas-client.jar 127.0.0.1 3658
  java -jar arthas-client.jar -c 'dashboard -n 1'
  java -jar arthas-client.jar -f batch.as 127.0.0.1

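However, the two containers already share the /tmp directory, and I moved the /root/.arthas/ files from the sidecar where arthas was installed into /tmp.
What else could be causing this? Thanks.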


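#3 sy5wg1nm

@yjustdo This is related to how arthas locates its jar directory: you need to cd into the arthas directory and start it from there. Judging from the error, the application process still cannot load the files in the sidecar container. Try adding export ARTHAS_LIB_DIR=/tmp before starting, so that the arthas lib download directory is placed under /tmp.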
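
A minimal sketch of that suggestion follows. ARTHAS_LIB_DIR is the variable named in the reply above. The helper name use_shared_arthas_lib_dir is hypothetical, and the sketch assumes the chosen directory is mounted at the same path in both containers (as /tmp is in the pod spec earlier in the thread).

```shell
# Point arthas-boot's lib download directory at a directory that both
# containers mount at the same path, so the target JVM can read the jars.
use_shared_arthas_lib_dir() {
    shared_dir=${1:-/tmp}
    if [ ! -d "$shared_dir" ] || [ ! -w "$shared_dir" ]; then
        echo "$shared_dir is not a writable directory" >&2
        return 1
    fi
    ARTHAS_LIB_DIR=$shared_dir
    export ARTHAS_LIB_DIR
}

# Usage in the sidecar, before starting arthas-boot:
#   use_shared_arthas_lib_dir /tmp && java -jar arthas-boot.jar
```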


#4 jk9hmnmh

Thanks. After running cd arthas and then java -jar arthas-boot.jar, the attach to the process succeeds:

sh-4.4# cd arthas
sh-4.4# java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.5.3
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.

* [1]: 6 math-game.jar

  [2]: 24 math-game.jar
1
[INFO] arthas home: /tmp/arthas
[INFO] Try to attach process 6
[INFO] Attach process 6 success.
[INFO] arthas-client connect 127.0.0.1 3658
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.                           
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'                          
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.                          
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |                         
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'                          

wiki       https://arthas.aliyun.com/doc                                        
tutorials  https://arthas.aliyun.com/doc/arthas-tutorials.html                  
version    3.5.3                                                                
main_class                                                                      
pid        6                                                                    
time       2021-08-11 01:16:15                                                  

[arthas@6]$

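#5 bvuwiixz

The latest version of jattach can load the agent, but its binary differs for every platform, which adds complexity and increases file size.
https://github.com/apangin/jattach/releases/tag/v2.0

Also, even with jattach, the files still have to be copied under /proc/$pid/root/.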
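
For illustration, a sketch of loading an agent jar with jattach once the jar is in place. The "load instrument false <jar>=<args>" form follows the jattach README. The function name load_agent_with_jattach, the JATTACH and PROC_DIR overrides, and the paths in the usage comment are hypothetical. The agent arguments that arthas itself expects are not covered in this thread, so they are left to the caller.

```shell
# Load a Java agent into a running JVM via jattach, after checking that the
# jar is visible inside the target's root filesystem.
# JATTACH and PROC_DIR are hypothetical overrides used only for testing.
load_agent_with_jattach() {
    pid=$1
    agent_jar=$2                         # path as seen by the target process
    agent_args=${3:-}                    # optional agent arguments
    proc_dir=${PROC_DIR:-/proc}
    jattach_bin=${JATTACH:-jattach}

    if [ ! -f "$proc_dir/$pid/root$agent_jar" ]; then
        echo "agent jar not visible to pid $pid: $agent_jar" >&2
        return 1
    fi
    option=$agent_jar
    if [ -n "$agent_args" ]; then
        option="$agent_jar=$agent_args"
    fi
    "$jattach_bin" "$pid" load instrument false "$option"
}

# Usage (hypothetical paths): copy the jar under /proc/1/root/tmp/ first, then
#   load_agent_with_jattach 1 /tmp/my-agent.jar
```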
