Mesos: Marathon app using persistent volumes stuck in staging

jecbmhm3 posted on 2021-06-26 in Mesos

I'm having trouble running an app in Marathon that uses a persistent local volume. Following the documentation, I set up a role and principal and created a simple app with a persistent volume, and it just hangs in staging. The agent appears to make a valid offer, but the app never actually launches. The slave logs nothing related to the task, even though I compiled with debug options enabled and run with GLOG_v=2.
Marathon also seems to keep cycling through task IDs because the launch never succeeds, but I can't find any indication of why.
Oddly, when I run the same app without a persistent volume but with reserved disk, it starts fine.
Marathon's debug logs don't seem to show anything useful either, though I may be missing something. Can anyone suggest what the problem might be, or where to look for additional debugging? Thanks in advance.
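Since the agent logs nothing, one place to look is the master's /state endpoint, which on Mesos of this era reports each agent's reserved resources per role. Below is a minimal sketch of checking whether the dynamic disk reservation actually exists; the embedded JSON is an illustrative excerpt (field names assumed from the Mesos 0.26+ state schema), and the curl URL in the comment is a placeholder:

```python
import json

# Illustrative excerpt of the master's /state.json response; in practice
# you would fetch the real thing with:
#   curl http://<master>:5050/state.json
sample_state = json.loads("""
{
  "slaves": [
    {
      "id": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S17",
      "hostname": "ip-10-0-90-61.eu-west-1.compute.internal",
      "reserved_resources": {
        "persistent": {"cpus": 1.0, "mem": 128.0, "disk": 100.0}
      }
    }
  ]
}
""")

def reserved_disk_by_role(state, role):
    """Return {slave_id: reserved disk MB} for the given role."""
    result = {}
    for slave in state.get("slaves", []):
        reserved = slave.get("reserved_resources", {}).get(role, {})
        if "disk" in reserved:
            result[slave["id"]] = reserved["disk"]
    return result

# If the role shows no reserved disk on any agent, Marathon's RESERVE
# operation never took effect, and the app will wait forever for an
# offer containing its volume.
print(reserved_disk_by_role(sample_state, "persistent"))
```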
Here is some information about my environment and debug output:
slave: Ubuntu 14.04, running 0.28 prebuilt (also tested against 0.29 built from source)
master: Mesos 0.28 running in a Docker Ubuntu 14.04 image on CoreOS
marathon: 1.1.1 running in a Docker Ubuntu 14.04 image on CoreOS

App with persistent storage

App info from v2/apps/test/tasks in Marathon:

{
  "app": {
    "id": "/test",
    "cmd": "while true; do sleep 10; done",
    "args": null,
    "user": null,
    "env": {},
    "instances": 1,
    "cpus": 1,
    "mem": 128,
    "disk": 0,
    "executor": "",
    "constraints": [
      [
        "role",
        "CLUSTER",
        "persistent"
      ]
    ],
    "uris": [],
    "fetch": [],
    "storeUrls": [],
    "ports": [
      10002
    ],
    "portDefinitions": [
      {
        "port": 10002,
        "protocol": "tcp",
        "labels": {}
      }
    ],
    "requirePorts": false,
    "backoffSeconds": 1,
    "backoffFactor": 1.15,
    "maxLaunchDelaySeconds": 3600,
    "container": {
      "type": "MESOS",
      "volumes": [
        {
          "containerPath": "test",
          "mode": "RW",
          "persistent": {
            "size": 100
          }
        }
      ]
    },
    "healthChecks": [],
    "readinessChecks": [],
    "dependencies": [],
    "upgradeStrategy": {
      "minimumHealthCapacity": 0.5,
      "maximumOverCapacity": 0
    },
    "labels": {},
    "acceptedResourceRoles": null,
    "ipAddress": null,
    "version": "2016-05-19T11:31:54.861Z",
    "residency": {
      "relaunchEscalationTimeoutSeconds": 3600,
      "taskLostBehavior": "WAIT_FOREVER"
    },
    "versionInfo": {
      "lastScalingAt": "2016-05-19T11:31:54.861Z",
      "lastConfigChangeAt": "2016-05-18T16:46:59.684Z"
    },
    "tasksStaged": 0,
    "tasksRunning": 0,
    "tasksHealthy": 0,
    "tasksUnhealthy": 0,
    "deployments": [
      {
        "id": "4f3779e5-a805-4b95-9065-f3cf9c90c8fe"
      }
    ],
    "tasks": [
      {
        "id": "test.4b7d4303-1dc2-11e6-a179-a2bd870b1e9c",
        "slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S17",
        "host": "ip-10-0-90-61.eu-west-1.compute.internal",
        "localVolumes": [
          {
            "containerPath": "test",
            "persistenceId": "test#test#4b7d4302-1dc2-11e6-a179-a2bd870b1e9c"
          }
        ],
        "appId": "/test"
      }
    ]
  }
}

App info in Marathon (the deployment seems to be stuck in a loop):
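For debugging, the volume Marathon is trying to create can be reproduced by hand. The sketch below builds the disk resource JSON that Mesos expects for a persistent volume; the shape is assumed from the Mesos persistent-volume documentation of this era, and the principal value is hypothetical:

```python
import json

def make_persistent_volume(role, principal, persistence_id,
                           container_path, size_mb):
    """Build the disk resource JSON for a dynamically reserved
    persistent volume (shape per the Mesos persistent-volume docs;
    verify against your Mesos version)."""
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": size_mb},
        "role": role,
        "reservation": {"principal": principal},
        "disk": {
            "persistence": {"id": persistence_id},
            "volume": {"container_path": container_path, "mode": "RW"},
        },
    }

volume = make_persistent_volume(
    role="persistent",    # role from the app's constraint above
    principal="marathon", # hypothetical framework principal
    persistence_id="test#test#4b7d4302-1dc2-11e6-a179-a2bd870b1e9c",
    container_path="test",
    size_mb=100,
)
print(json.dumps(volume, indent=2))
```

Wrapped in a list and paired with the target slaveId, this JSON can be POSTed to the master's /create-volumes operator endpoint to test volume creation outside of Marathon.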

App without persistent storage

App info from v2/apps/test2/tasks in Marathon:

{
  "app": {
    "id": "/test2",
    "cmd": "while true; do sleep 10; done",
    "args": null,
    "user": null,
    "env": {},
    "instances": 1,
    "cpus": 1,
    "mem": 128,
    "disk": 100,
    "executor": "",
    "constraints": [
      [
        "role",
        "CLUSTER",
        "persistent"
      ]
    ],
    "uris": [],
    "fetch": [],
    "storeUrls": [],
    "ports": [
      10002
    ],
    "portDefinitions": [
      {
        "port": 10002,
        "protocol": "tcp",
        "labels": {}
      }
    ],
    "requirePorts": false,
    "backoffSeconds": 1,
    "backoffFactor": 1.15,
    "maxLaunchDelaySeconds": 3600,
    "container": null,
    "healthChecks": [],
    "readinessChecks": [],
    "dependencies": [],
    "upgradeStrategy": {
      "minimumHealthCapacity": 0.5,
      "maximumOverCapacity": 0
    },
    "labels": {},
    "acceptedResourceRoles": null,
    "ipAddress": null,
    "version": "2016-05-19T13:44:01.831Z",
    "residency": null,
    "versionInfo": {
      "lastScalingAt": "2016-05-19T13:44:01.831Z",
      "lastConfigChangeAt": "2016-05-19T13:09:20.106Z"
    },
    "tasksStaged": 0,
    "tasksRunning": 1,
    "tasksHealthy": 0,
    "tasksUnhealthy": 0,
    "deployments": [],
    "tasks": [
      {
        "id": "test2.bee624f1-1dc7-11e6-b98e-568f3f9dead8",
        "slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S18",
        "host": "ip-10-0-90-61.eu-west-1.compute.internal",
        "startedAt": "2016-05-19T13:44:02.190Z",
        "stagedAt": "2016-05-19T13:44:02.023Z",
        "ports": [
          31926
        ],
        "version": "2016-05-19T13:44:01.831Z",
        "ipAddresses": [
          {
            "ipAddress": "10.0.90.61",
            "protocol": "IPv4"
          }
        ],
        "appId": "/test2"
      }
    ],
    "lastTaskFailure": {
      "appId": "/test2",
      "host": "ip-10-0-90-61.eu-west-1.compute.internal",
      "message": "Slave ip-10-0-90-61.eu-west-1.compute.internal removed: health check timed out",
      "state": "TASK_LOST",
      "taskId": "test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c",
      "timestamp": "2016-05-19T13:15:24.155Z",
      "version": "2016-05-19T13:09:20.106Z",
      "slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S17"
    }
  }
}

Slave log when running the app without the persistent volume:

I0519 13:09:22.471876 12459 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.471906 12459 status_update_manager.cpp:497] Creating StatusUpdate stream for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.472262 12459 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.477686 12459 status_update_manager.cpp:374] Forwarding update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to the agent
I0519 13:09:22.477830 12453 process.cpp:2605] Resuming slave(1)@10.0.90.61:5051 at 2016-05-19 13:09:22.477814016+00:00
I0519 13:09:22.477967 12453 slave.cpp:3638] Forwarding the update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to master@10.0.82.230:5050
I0519 13:09:22.478185 12453 slave.cpp:3532] Status update manager successfully handled status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.478229 12453 slave.cpp:3548] Sending acknowledgement for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to executor(1)@10.0.90.61:34262
I0519 13:09:22.488315 12460 pid.cpp:95] Attempting to parse 'master@10.0.82.230:5050' into a PID
I0519 13:09:22.488370 12460 process.cpp:646] Parsed message name 'mesos.internal.StatusUpdateAcknowledgementMessage' for slave(1)@10.0.90.61:5051 from master@10.0.82.230:5050
I0519 13:09:22.488452 12452 process.cpp:2605] Resuming slave(1)@10.0.90.61:5051 at 2016-05-19 13:09:22.488441856+00:00
I0519 13:09:22.488600 12458 process.cpp:2605] Resuming (14)@10.0.90.61:5051 at 2016-05-19 13:09:22.488590080+00:00
I0519 13:09:22.488632 12458 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.488726 12458 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.492985 12452 process.cpp:2605] Resuming slave(1)@10.0.90.61:5051 at 2016-05-19 13:09:22.492974080+00:00
I0519 13:09:22.493021 12452 slave.cpp:2629] Status update manager successfully handled status update acknowledgement (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000

Answer (tuwxkamq):

This may be due to insufficient disk space or RAM. The minimum free resource configuration is specified in the link below.
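A quick sanity check along these lines is to compare what the app asks for against what an agent offers. A minimal sketch, with required values taken from the app definitions above and the offer values hypothetical:

```python
# Resources the /test app needs: cpus/mem from the app definition,
# disk from the persistent volume size.
REQUIRED = {"cpus": 1.0, "mem": 128.0, "disk": 100.0}

def missing_resources(offered, required):
    """Return the resources (and shortfall) the offer cannot satisfy."""
    return {
        name: need - offered.get(name, 0.0)
        for name, need in required.items()
        if offered.get(name, 0.0) < need
    }

offer = {"cpus": 2.0, "mem": 64.0, "disk": 100.0}  # hypothetical agent offer
print(missing_resources(offer, REQUIRED))          # mem falls 64 MB short
```

If anything comes back non-empty for every agent in the cluster, Marathon will decline all offers and the deployment will sit in staging indefinitely, which matches the behavior described in the question.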
