Securing the Aim remote tracking server with an SSL key and certificate

Posted by 3npbholx 4 months ago in: Other

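Hi, first of all I want to thank you for all the effort you have put into building Aim!

I've run into a problem with securing the connection to the Aim Remote Tracking (RT) server and was wondering whether you could help me with it.

I recently set up a virtual machine on Azure that runs both the Aim RT server and the Aim UI. To do this I used a docker-compose.yml that starts both the server and the UI. This works fine: I can log runs from another machine and see them appear in the UI, which is great.

However, I now want to connect to the remote tracking server securely over SSL, as described here. I created a self-signed key and certificate with openssl, as described here.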
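For reference, one quick way to sanity-check a freshly generated pair before handing it to `aim server` is to load it with Python's standard `ssl` module: `SSLContext.load_cert_chain()` fails if a file cannot be parsed or if the key does not belong to the certificate. This is a minimal sketch; `check_cert_pair` is a hypothetical helper (not part of Aim), it assumes an unencrypted key such as `openssl req ... -nodes` produces, and the `~/secrets/...` paths are the ones used in the `aim server` command in this post.

```python
import os
import ssl
from typing import Tuple


def check_cert_pair(certfile: str, keyfile: str) -> Tuple[bool, str]:
    """Return (ok, message) for a PEM certificate / private-key pair.

    ssl.SSLContext.load_cert_chain() fails if a file cannot be parsed or
    if the private key does not match the certificate.
    """
    for path in (certfile, keyfile):
        if not os.path.isfile(path):
            return False, f"missing file: {path}"
        if not os.access(path, os.R_OK):
            return False, f"file not readable: {path}"

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Assumes an unencrypted key; for a passphrase-protected key OpenSSL
        # would prompt for the password at this point.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except OSError as e:  # ssl.SSLError is a subclass of OSError
        return False, f"certificate/key rejected: {e}"
    return True, "certificate and key load together"


if __name__ == "__main__":
    # Hypothetical location, matching the paths in the aim server command.
    ok, msg = check_cert_pair(
        os.path.expanduser("~/secrets/server.crt"),
        os.path.expanduser("~/secrets/server.key"),
    )
    print(ok, msg)
```

Note that in the Docker setup the same files are mounted under `/opt/secrets`, so the check can be run against either location.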

Whenever I start the server with the following command, everything seems to work and I don't get any errors:

aim server --repo ~/mycontainer/aim/ --ssl-keyfile ~/secrets/server.key --ssl-certfile ~/secrets/server.crt --host 0.0.0.0 --dev --port 53800

But when I try to log a run from another machine, the client fails with the following error:

azureuser@ml-ci-jvranken-prd:~/cloudfiles/code/Users/jvranken/aim-tracking-server$ python aim_test.py 
Failed to connect to Aim Server. Have you forgot to run `aim server` command?
Traceback (most recent call last):
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/utils.py", line 14, in wrapper
    return func(*args, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/client.py", line 138, in connect
    response = requests.get(endpoint, headers=self.request_headers)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/adapters.py", line 682, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/ml-ci-jvranken-prd/code/Users/jvranken/aim-tracking-server/aim_test.py", line 7, in <module>
    run = Run(
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 70, in wrapper
    _SafeModeConfig.exception_callback(e, func)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 47, in reraise_exception
    raise e
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 68, in wrapper
    return func(*args, **kwargs)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/run.py", line 859, in __init__
    super().__init__(run_hash, repo=repo, read_only=read_only, experiment=experiment, force_resume=force_resume)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/run.py", line 272, in __init__
    super().__init__(run_hash, repo=repo, read_only=read_only, force_resume=force_resume)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/base_run.py", line 34, in __init__
    self.repo = get_repo(repo)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo_utils.py", line 26, in get_repo
    repo = Repo.from_path(repo)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo.py", line 210, in from_path
    repo = Repo(path, read_only=read_only, init=init)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo.py", line 121, in __init__
    self._client = Client(remote_path)
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/client.py", line 50, in __init__
    self.connect()
  File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/utils.py", line 18, in wrapper
    raise RuntimeError(error_message)
RuntimeError: Failed to connect to Aim Server. Have you forgot to run `aim server` command?
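One possible reading of this traceback, not confirmed anywhere in this thread: the client's `requests.get(endpoint, ...)` call in `aim/ext/transport/client.py` has its connection closed without any HTTP response, which is what typically happens when a client speaks plain HTTP to a port that expects a TLS handshake. The sketch below uses only the standard `socket` and `ssl` modules to check, from the client machine, whether port 53800 completes a TLS handshake and whether the self-signed certificate would be accepted. `probe_tls` is a hypothetical diagnostic helper, not an Aim API; with `cafile=None` it deliberately skips certificate verification and should not be reused for real traffic.

```python
import socket
import ssl
from typing import Optional


def probe_tls(host: str, port: int, cafile: Optional[str] = None,
              timeout: float = 5.0) -> str:
    """Describe how host:port reacts to a TLS handshake (diagnostic only).

    With cafile=None, certificate checks are switched off so that a
    self-signed certificate cannot hide whether TLS is spoken at all.
    With cafile set (e.g. a copy of server.crt), the certificate and the
    hostname are verified the way a normal TLS client would verify them.
    """
    if cafile is None:
        ctx = ssl.create_default_context()
        ctx.check_hostname = False  # must be disabled before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    else:
        ctx = ssl.create_default_context(cafile=cafile)

    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return f"TLS OK ({tls.version()})"
    except ssl.SSLCertVerificationError as e:
        return f"TLS spoken, certificate rejected: {e.verify_message}"
    except ssl.SSLError as e:
        return f"TLS handshake failed: {e.reason or e}"
    except OSError as e:
        # Refused/timed-out connections, DNS errors, resets during handshake.
        return f"connection failed: {e}"
```

Example usage, with `<vm-address>` standing in for the redacted host and `server.crt` copied to the client machine (both assumptions): call `probe_tls("<vm-address>", 53800)` first, then `probe_tls("<vm-address>", 53800, cafile="server.crt")`. If the first call reports `TLS OK`, the server really is speaking TLS, so a client connecting with plain HTTP would be expected to fail roughly as above; if only the second call fails, the certificate itself (for example the names or IP addresses it lists) is what a verifying client would reject.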

Any idea why this doesn't work? Here are the docker-compose.yaml and the Python file I'm using:

services:
  ui:
    image: aimstack/aim:3.20.1
    container_name: aim_ui
    restart: unless-stopped
    command: up --host 0.0.0.0 --port 43800 --dev
    ports:
      - 80:43800
    volumes:
    - ~/mycontainer/aim:/opt/aim
    networks:
      - aim

  server:
    image: aimstack/aim:3.20.1
    container_name: aim_server
    restart: unless-stopped
    command: server --host 0.0.0.0 --dev --ssl-keyfile /opt/secrets/server.key --ssl-certfile /opt/secrets/server.crt
    ports:
      - 53800:53800
    volumes:
    - ~/mycontainer/aim:/opt/aim
    - ~/secrets:/opt/secrets
    networks:
      - aim

networks:
  aim:
    driver: bridge

from aim import Run

# AIM_REPO='/home/azureuser/mycontainer/aim'
AIM_REPO='aim://REDACTED:53800'
AIM_EXPERIMENT='SSL-server'

run = Run(
    repo=AIM_REPO,
    experiment=AIM_EXPERIMENT
)

hparams_dict = {
    'learning_rate': 0.001,
    'batch_size': 32,
}
run['hparams'] = hparams_dict

# log a dummy metric; every 5th step is scaled by 0.347
for i in range(30):
    value = i * 0.347 if i % 5 == 0 else i
    run.track(float(value), name='numbers')
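Judging from the traceback, the `RuntimeError` about forgetting to run `aim server` is raised by a wrapper around any connection error, so it can appear even while the server is running. A small pre-flight check in the client script can separate "the port is unreachable" (firewall, Azure network rules, wrong address) from "the port is reachable but the connection is dropped". This sketch assumes the repo string always has the form `aim://host:port`, as `AIM_REPO` does above; `parse_aim_repo` and `tcp_reachable` are hypothetical helpers, not Aim APIs.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def parse_aim_repo(repo: str) -> Tuple[str, int]:
    """Split an 'aim://host:port' repo string into (host, port)."""
    parts = urlsplit(repo)
    if parts.scheme != "aim" or not parts.hostname or parts.port is None:
        raise ValueError(f"expected aim://host:port, got {repo!r}")
    return parts.hostname, parts.port


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_reachable(*parse_aim_repo(AIM_REPO))` could be called before `Run(...)`: a `False` result points at networking, while `True` followed by the same traceback points at the protocol or SSL layer.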

Answer #1 (apeeds0o)

Thanks for raising this, JeroenVranken. It might be related to the authorization tokens we added recently. @mihran113 @alberttorosyan, what do you think?

Answer #2 (oaxa6hgo)

Is there any update on this? I think I've run into a similar problem in #3206.

Answer #3 (dw1jzc5e)

This error occurs in version 3.20.1, but when I roll back to Aim 3.17.4, everything works fine.
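Since the behaviour reportedly differs between 3.17.4 and 3.20.1 (and the compose file pins the server image to `aimstack/aim:3.20.1`), it helps to confirm which Aim version the client environment actually has installed. A minimal sketch using the standard `importlib.metadata` module; `installed_version` and `version_tuple` are hypothetical helpers, and the naive tuple comparison only handles plain `X.Y.Z` versions (the `packaging` library's `Version` class is the more robust choice for pre-releases).

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str = "aim") -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '3.20.1' into (3, 20, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


if __name__ == "__main__":
    v = installed_version("aim")
    if v is None:
        print("aim is not installed in this environment")
    else:
        # 3.17.4 is the last version reported to work in this thread.
        print(f"aim {v}; newer than 3.17.4: {version_tuple(v) > (3, 17, 4)}")
```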

Answer #4 (rqqzpn5f)

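> This error occurs in version 3.20.1, but when I roll back to Aim 3.17.4, everything works fine.

Have you tried the latest version, 3.23.0? I seem to be running into the same problem.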
