Linux subprocess (ssh started in the background): communicate() hangs if stderr is enabled

disbfnqx  posted on 2023-10-16 in Linux

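I have the following code, which does this:

  1. ssh -f -M starts ssh in the background as a child process that owns a shared socket.
  2. Because that process stays in the background, a second ssh connection can reuse the socket /tmp/control-channel to connect to the ssh server without a password.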

test.py:

import subprocess
import os
import sys
import stat

ssh_user = "my_user"       # change to your account
ssh_passwd = "my_password" # change to your password

try:
    os.remove("/tmp/control-channel")
except:
    pass

# prepare passwd file
file = open("./passwd","w")
passwd_content = f"#!/bin/sh\necho {ssh_passwd}"
file.write(passwd_content)
file.close()
os.chmod("./passwd", stat.S_IRWXU)

# setup shared ssh socket, put it in background
env = {'SSH_ASKPASS': "./passwd", 'DISPLAY':'', 'SSH_ASKPASS_REQUIRE':'force'}
args = ['ssh', '-f', '-o', 'LogLevel=ERROR', '-x', '-o', 'ConnectTimeout=30', '-o', 'ControlPersist=300', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'StrictHostKeyChecking=no', '-o', 'ServerAliveInterval=15', '-MN', '-S', '/tmp/control-channel', '-p', '22', '-l', ssh_user, 'localhost']
process = subprocess.Popen(args, env=env,
        stdout=subprocess.PIPE,
#        stderr=subprocess.STDOUT,   # uncommenting this line makes communicate() hang
        stdin=subprocess.DEVNULL,
        start_new_session=True)
sout, serr = process.communicate()
print(sout)
print(serr)

# use shared socket
args2 = ['ssh', '-o', 'LogLevel=ERROR', '-o', 'ControlPath=/tmp/control-channel', '-p', '22', '-l', ssh_user, 'localhost', 'uname -a']
process2 = subprocess.Popen(args2,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        stdin=subprocess.DEVNULL)
content, _ = process2.communicate()
print(content)

Run:

$ python3 test.py
b''
None
b'Linux shmachine 4.19.0-21-amd64 #1 SMP Debian 4.19.249-2 (2022-06-30) x86_64 GNU/Linux\n'

So far so good. But if I uncomment stderr=subprocess.STDOUT in the first subprocess, it hangs:

$ python3 test.py
^CTraceback (most recent call last):
  File "test.py", line 29, in <module>
    sout, serr = process.communicate()
  File "/usr/lib/python3.7/subprocess.py", line 926, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt

I would like to know what is going wrong here.
My environment:

$ python3 --version
Python 3.7.3
$ ssh -V
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1n  15 Mar 2022
$ cat /etc/issue
Debian GNU/Linux 10 \n \l

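Update: I found this post, which is similar to my question, but it has no answer.
Question 2: changing communicate to wait makes it work, but the pipe buffer that wait relies on is certainly smaller than the memory that communicate can use, so I still want to know why I cannot make it work with communicate.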

#1 qmelpv7a

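The answer is actually hidden in the Python documentation for subprocess.Popen. The documentation for Popen.communicate says:
Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
So one (potentially) workable solution (I do not actually have an SSH setup to test with) is to add a timeout to the communicate call.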
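The timeout idea can be sketched roughly as follows. This is an illustration only: the helper name communicate_with_timeout is hypothetical, and a timeout merely bounds the wait; it does not explain or remove whatever keeps the pipe open.

```python
import subprocess

def communicate_with_timeout(args, timeout):
    """Run args with stderr merged into stdout and stop reading after
    `timeout` seconds. Returns (output, timed_out); output is None when
    the read timed out. Hypothetical helper, not part of subprocess."""
    proc = subprocess.Popen(args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return out, False
    except subprocess.TimeoutExpired:
        # Some process still holds the write end of the pipe, so EOF never
        # arrived. Stop reading; the background process is left running.
        proc.stdout.close()
        proc.poll()  # reap the launcher if it has already exited
        return None, True
```

For example, communicate_with_timeout(["sh", "-c", "sleep 3 &"], 0.5) should report a timeout, because the backgrounded sleep inherits the pipe and keeps it open after sh itself has exited.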

#2 62lalag4

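I removed the stdout pipe setting and kept only stderr as a minimal check, then used strace to confirm that this is a bug in older ssh that was fixed in 2020, in ssh 8.4. So Python is in fact behaving correctly.
1. I saw it get stuck like this: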

$ strace python3 test.py
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
read(3, 0x1744f00, 8192)                = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=26779, si_uid=1001, si_status=0, si_utime=1, si_stime=0} ---
read(3,
$ sudo lsof -a -c python -d 3
COMMAND   PID     USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
python3 26778 nxa13855    3r  FIFO   0,12      0t0 9186593 pipe
$ ls -l "/proc/26778/fd/3"
lr-x------ 1 nxa13855 atg 64 Oct  5 20:22 /proc/26778/fd/3 -> 'pipe:[9186593]'
$ lsof | grep 9186593
python3   26778                   nxa13855    3r     FIFO               0,12      0t0    9186593 pipe
ssh       26783                   nxa13855    2w     FIFO               0,12      0t0    9186593 pipe

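2. This means that ssh -f did not close stderr when it put ssh into the background. The background ssh (PID 26783 above) still holds the write end of the pipe, so the read in communicate never gets a byte or an end-of-file from stderr, and it hangs here: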

File "/usr/lib/python3.7/subprocess.py", line 929, in communicate
stderr = self.stderr.read()

I confirmed this against the OpenSSH commit.
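The mechanism can be reproduced without ssh. The sketch below is an assumption-laden stand-in: a shell command plays the role of the old ssh -f (the launcher exits at once, while the backgrounded process closes stdout but keeps stderr), and the names HOLD_STDERR_CMD and seconds_blocked_in_communicate are hypothetical. It also shows one possible workaround for affected ssh versions: send stderr to a file instead of a pipe.

```python
import subprocess
import tempfile
import time

# Stand-in for old `ssh -f` (an assumption for illustration): sh exits
# immediately, the backgrounded sleep redirects stdout but inherits stderr.
HOLD_STDERR_CMD = ["sh", "-c", "sleep 2 >/dev/null &"]

def seconds_blocked_in_communicate(stderr_target):
    """Return how long communicate() takes for the given stderr target."""
    start = time.monotonic()
    proc = subprocess.Popen(HOLD_STDERR_CMD,
            stdout=subprocess.PIPE,
            stderr=stderr_target,
            stdin=subprocess.DEVNULL)
    proc.communicate()
    return time.monotonic() - start

if __name__ == "__main__":
    # stderr shares the stdout pipe: EOF arrives only when sleep exits.
    print(seconds_blocked_in_communicate(subprocess.STDOUT))
    # stderr goes to a file: the pipe closes as soon as sh exits.
    with tempfile.TemporaryFile() as err_file:
        print(seconds_blocked_in_communicate(err_file))
```

With stderr merged into the pipe, communicate should return only after the background process ends (about two seconds here, or never for a persistent ssh master); with stderr in a file it should return almost immediately.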

#3 jxct1oxe

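The problem is that the ssh -f command keeps running in the background after the process you launched has exited. communicate() reads the standard output and standard error streams until end-of-file and only then waits for the process, and end-of-file arrives only once every process holding the pipe has closed it.
When you redirect the standard error stream of the ssh -f command to its standard output stream, the communicate() method never returns. This is because the background ssh keeps that stream open for as long as it runs.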
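To solve this, you need to do one of the following:
1. Call process.terminate() to terminate the ssh -f command explicitly.
2. Use the timeout keyword argument of communicate() to specify a time limit after which communicate() gives up even if the process is still running.
Here is an example of how to solve the problem using process.terminate():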

import subprocess
import os
import stat

ssh_user = "my_user"       # change to your account
ssh_passwd = "my_password" # change to your password

try:
    os.remove("/tmp/control-channel")
except:
    pass

# prepare passwd file
file = open("./passwd","w")
passwd_content = f"#!/bin/sh\necho {ssh_passwd}"
file.write(passwd_content)
file.close()
os.chmod("./passwd", stat.S_IRWXU)

# setup shared ssh socket, put it in background
env = {'SSH_ASKPASS': "./passwd", 'DISPLAY':'', 'SSH_ASKPASS_REQUIRE':'force'}
args = ['ssh', '-f', '-o', 'LogLevel=ERROR', '-x', '-o', 'ConnectTimeout=30', '-o', 'ControlPersist=300', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'StrictHostKeyChecking=no', '-o', 'ServerAliveInterval=15', '-MN', '-S', '/tmp/control-channel', '-p', '22', '-l', ssh_user, 'localhost']
process = subprocess.Popen(args, env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # enable stderr
        stdin=subprocess.DEVNULL,
        start_new_session=True)

# wait for the ssh command to start up
process.wait(timeout=10)

# use shared socket
args2 = ['ssh', '-o', 'LogLevel=ERROR', '-o', 'ControlPath=/tmp/control-channel', '-p', '22', '-l', ssh_user, 'localhost', 'uname -a']
process2 = subprocess.Popen(args2,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        stdin=subprocess.DEVNULL)
content, _ = process2.communicate()

# terminate the ssh command
process.terminate()

print(content)

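Before using the shared socket, this code waits up to 10 seconds for the ssh -f command to start. If the command has not gone into the background within 10 seconds, process.wait() raises subprocess.TimeoutExpired.
Once the shared socket is available, the code uses it to connect to the SSH server and run the uname -a command. The output of uname -a is printed to the console.
Finally, the code calls process.terminate() to terminate the ssh -f command.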
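One caveat, offered as a hedged observation: once process.wait() has returned, the launcher process has already exited, so process.terminate() has nothing left to signal, and the backgrounded master keeps running until ControlPersist expires. A possible alternative is to ask the master to exit through its control socket with ssh -O exit. The sketch below assumes the same /tmp/control-channel socket and localhost target as the question; the helper names master_exit_args and stop_master are hypothetical.

```python
import subprocess

def master_exit_args(control_path, host):
    """Build the argv that asks an ssh control master to exit."""
    return ['ssh', '-S', control_path, '-O', 'exit', host]

def stop_master(control_path, host, timeout=10):
    """Ask the master behind control_path to exit; return ssh's exit status."""
    done = subprocess.run(master_exit_args(control_path, host),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout)
    return done.returncode

if __name__ == "__main__":
    # Assumed values taken from the question's script.
    print(stop_master('/tmp/control-channel', 'localhost'))
```

A non-zero status here most likely means that no master was listening on that socket.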
