Go exec.CommandContext is not being terminated after the context times out

yr9zkbsy  posted on 2023-09-28  in  Go

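In Go, I can usually combine context.WithTimeout() with exec.CommandContext() to get a command that is automatically killed (with SIGKILL) once the timeout expires.
But I ran into a strange issue: if I wrap the command with sh -c *and* buffer the command's output by setting cmd.Stdout = &bytes.Buffer{}, the timeout no longer works and the command runs forever.
Why does this happen?
Here is a minimal reproducible example: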

package main

import (
    "bytes"
    "context"
    "os/exec"
    "time"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()

    cmdArgs := []string{"sh", "-c", "sleep infinity"}
    bufferOutputs := true

    // Uncommenting *either* of the next two lines will make the issue go away:

    // cmdArgs = []string{"sleep", "infinity"}
    // bufferOutputs = false

    cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
    if bufferOutputs {
        cmd.Stdout = &bytes.Buffer{}
    }
    _ = cmd.Run()
}

I tagged this question with Linux because I have only verified that it happens on Ubuntu 20.04, and I am not sure whether it reproduces on other platforms.

6pp0gazn  #1

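The problem was that the child sleep process was not being killed when the context timed out. The parent sh process was killed, but its child sleep was left running.
That would normally still let the cmd.Wait() call return, but cmd.Wait() waits both for the process to exit *and* for its output to be copied. Because cmd.Stdout is assigned to something that is not an *os.File, Go connects the command to a pipe and Wait has to wait for the read end of that pipe to reach EOF. The orphaned sleep inherited the write end, so the pipe never closes while sleep is still running.
To kill the child processes as well, we can start the process as the leader of its own process group by setting the Setpgid bit. That lets us signal the negative PID, which kills the process along with any subprocesses in its group.
Here is the replacement for exec.CommandContext that I came up with, which does exactly that: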
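To see the effect described above in isolation, here is a minimal sketch that only measures how long cmd.Run() blocks. The helper name runMillis and the 2-second sleep are my own choices for illustration, and it assumes a Unix-like system with sh and sleep on the PATH. The trailing "; true" is there so the shell has to fork sleep rather than exec it directly.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"time"
)

// timeout is the context deadline used for every run.
const timeout = 100 * time.Millisecond

// runMillis (hypothetical helper) runs `sh -c script` under the timeout and
// reports how long cmd.Run blocked, in milliseconds. When buffer is true,
// stdout goes through a pipe into a bytes.Buffer, as in the question.
func runMillis(script string, buffer bool) int64 {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	if buffer {
		cmd.Stdout = &bytes.Buffer{}
	}
	start := time.Now()
	_ = cmd.Run() // error ignored: only the elapsed time matters here
	return time.Since(start).Milliseconds()
}

func main() {
	// "; true" keeps sh from exec'ing sleep in place, so sleep runs as a
	// child of sh and survives when only sh is killed.
	const script = "sleep 2; true"
	fmt.Println("buffered, ms:  ", runMillis(script, true))
	fmt.Println("unbuffered, ms:", runMillis(script, false))
}
```

With the buffer in place, Run should only return once the orphaned sleep exits and releases the pipe; without it, Run should return shortly after the timeout.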

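import (
    "context"
    "os/exec"
    "syscall"
)

// Cmd wraps exec.Cmd and kills the whole process group when ctx is done.
type Cmd struct {
    ctx        context.Context
    terminated chan struct{}
    *exec.Cmd
}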

// NewCommand is like exec.CommandContext but ensures that subprocesses
// are killed when the context times out, not just the top level process.
func NewCommand(ctx context.Context, command string, args ...string) *Cmd {
    return &Cmd{
        ctx:        ctx,
        terminated: make(chan struct{}),
        Cmd:        exec.Command(command, args...),
    }
}

func (c *Cmd) Start() error {
    // Force-enable setpgid bit so that we can kill child processes when the
    // context times out or is canceled.
    if c.Cmd.SysProcAttr == nil {
        c.Cmd.SysProcAttr = &syscall.SysProcAttr{}
    }
    c.Cmd.SysProcAttr.Setpgid = true
    err := c.Cmd.Start()
    if err != nil {
        return err
    }
    go func() {
        select {
        case <-c.terminated:
            return
        case <-c.ctx.Done():
        }
        p := c.Cmd.Process
        if p == nil {
            return
        }
        // Kill by negative PID to kill the process group, which includes
        // the top-level process we spawned as well as any subprocesses
        // it spawned.
        _ = syscall.Kill(-p.Pid, syscall.SIGKILL)
    }()
    return nil
}

func (c *Cmd) Run() error {
    if err := c.Start(); err != nil {
        return err
    }
    return c.Wait()
}

func (c *Cmd) Wait() error {
    defer close(c.terminated)
    return c.Cmd.Wait()
}
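For what it's worth, on Go 1.20 or newer the same process-group idea can probably be expressed without a wrapper type, because exec.Cmd gained a Cancel hook that replaces the default "kill only the direct child" behaviour. This is a sketch under that assumption, not a drop-in replacement for the code above. The helper name runGroupMillis is made up for the example, and it is Unix-only because of Setpgid and syscall.Kill.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runGroupMillis (hypothetical helper) runs `sh -c script` in its own
// process group with a 100ms timeout. When the context is done, SIGKILL is
// sent to the whole group. It returns how long cmd.Run blocked, in
// milliseconds.
func runGroupMillis(script string) int64 {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	cmd.Stdout = &bytes.Buffer{}
	// Make the shell the leader of a new process group.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	// Cancel requires Go 1.20+. It is only called after the process has
	// started, so cmd.Process is non-nil here. A negative PID targets
	// the process group.
	cmd.Cancel = func() error {
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}

	start := time.Now()
	_ = cmd.Run() // error ignored: only the elapsed time matters here
	return time.Since(start).Milliseconds()
}

func main() {
	fmt.Println("elapsed, ms:", runGroupMillis("sleep 5; true"))
}
```

Since sleep is killed together with sh, the stdout pipe closes and Run should return shortly after the timeout instead of after 5 seconds.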
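  • Update:
    Since writing this code, I have run into cases where a child process moves itself into its own process group. The setpgid trick stops working then, because it does not kill processes in those new groups. A more robust solution might be to walk the process tree manually, using something like go-ps, and kill each descendant process along the lines of the following pseudocode: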
// KillProcessTree kills an entire process tree using SIGKILL.
func KillProcessTree(pid int) error {
  // Send SIGSTOP to prevent new children from being spawned.
  // (syscall.Signal is a type, so the call that sends it is syscall.Kill.)
  _ = syscall.Kill(pid, syscall.SIGSTOP)
  // TODO: implement ChildProcesses
  for _, c := range ChildProcesses(pid) {
    _ = KillProcessTree(c.Pid)
  }
  // Now that the process is stopped and all descendants
  // are guaranteed to be killed, we can safely SIGKILL
  // this process, without worrying about descendant
  // processes being reparented to pid 1 or anything
  // like that.
  _ = syscall.Kill(pid, syscall.SIGKILL)
  return nil // TODO: better error handling :)
}
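As a rough illustration of how the missing ChildProcesses step could be filled in without a third-party package, here is a Linux-only sketch that finds children by scanning /proc/<pid>/stat for a matching parent PID. The names childPIDs, killTree and demo are hypothetical, the 300ms pause is an arbitrary allowance for the shell to fork, and the walk is inherently racy if processes keep spawning elsewhere, so treat it as a starting point rather than a finished implementation.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// childPIDs returns the PIDs whose parent is pid, found by scanning
// /proc/<n>/stat. Assumes a mounted Linux procfs.
func childPIDs(pid int) []int {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil
	}
	var kids []int
	for _, e := range entries {
		n, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a process directory
		}
		data, err := os.ReadFile("/proc/" + e.Name() + "/stat")
		if err != nil {
			continue // process exited while we were scanning
		}
		// Format: "pid (comm) state ppid ...". comm may contain spaces or
		// parentheses, so only parse what follows the last ')'.
		s := string(data)
		i := strings.LastIndexByte(s, ')')
		if i < 0 {
			continue
		}
		fields := strings.Fields(s[i+1:])
		if len(fields) < 2 {
			continue
		}
		if ppid, err := strconv.Atoi(fields[1]); err == nil && ppid == pid {
			kids = append(kids, n)
		}
	}
	return kids
}

// killTree stops pid, kills its descendants depth-first, then kills pid.
// It returns how many processes were successfully sent SIGKILL.
func killTree(pid int) int {
	_ = syscall.Kill(pid, syscall.SIGSTOP) // keep pid from spawning more children
	killed := 0
	for _, c := range childPIDs(pid) {
		killed += killTree(c)
	}
	if err := syscall.Kill(pid, syscall.SIGKILL); err == nil {
		killed++
	}
	return killed
}

// demo starts a shell with two background children, kills the whole tree,
// and returns the number of processes killed (-1 if the shell did not start).
func demo() int {
	cmd := exec.Command("sh", "-c", "sleep 30 & sleep 30 & wait")
	if err := cmd.Start(); err != nil {
		return -1
	}
	time.Sleep(300 * time.Millisecond) // give sh time to fork both children
	n := killTree(cmd.Process.Pid)
	_ = cmd.Wait() // reap the shell
	return n
}

func main() {
	fmt.Println("processes killed:", demo())
}
```

Unlike the process-group approach, this does not depend on the children staying in the group they were started in, at the cost of being Linux-specific and only as accurate as the snapshot of /proc taken during the walk.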
