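Securing the Aim remote tracking server with an SSL key and certificate
Hi, first of all, thank you for all the work you've put into Aim!
I've run into some problems securing the connection to the Aim remote tracking (RT) server and was wondering if you could help me out.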
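I recently set up a virtual machine on Azure that runs both the Aim RT server and the Aim UI. To do this I used a docker-compose.yml that starts the server and the UI. This works fine: I can log runs from another machine and see them appear in the UI, which is great.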
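However, I now want to connect to the remote tracking server securely over SSL, as described here. I created a self-signed key and certificate file with openssl, as also described here.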
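Before debugging the client, it can help to confirm that the self-signed key and certificate actually form a valid pair. The sketch below is a hypothetical helper (not part of Aim) that loads the pair into a server-side context using Python's standard `ssl` module. It assumes the private key is unencrypted (for example, created with openssl's `-nodes` option); an encrypted key may cause OpenSSL to prompt for a passphrase.

```python
import ssl


def check_cert_pair(certfile: str, keyfile: str) -> tuple:
    """Hypothetical helper: try to load a certificate/private-key pair the
    way a TLS server would at startup.

    Returns (True, "ok") if the pair loads, otherwise (False, <error text>).
    A missing path, an unreadable file, or a key that does not match the
    certificate all surface as OSError (ssl.SSLError subclasses OSError).
    Assumes an unencrypted key; an encrypted one may trigger a prompt.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except OSError as exc:
        return False, str(exc)
    return True, "ok"
```

For example, `check_cert_pair(os.path.expanduser("~/secrets/server.crt"), os.path.expanduser("~/secrets/server.key"))` would check the same files passed to `aim server` in the command below. A `True` result only means the files are well-formed and match; it says nothing about whether the client trusts them.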
Whenever I start the server with the following command, everything appears to work and I get no errors:
aim server --repo ~/mycontainer/aim/ --ssl-keyfile ~/secrets/server.key --ssl-certfile ~/secrets/server.crt --host 0.0.0.0 --dev --port 53800
But when I try to log a run from another machine, the client fails with the following error:
azureuser@ml-ci-jvranken-prd:~/cloudfiles/code/Users/jvranken/aim-tracking-server$ python aim_test.py
Failed to connect to Aim Server. Have you forgot to run `aim server` command?
Traceback (most recent call last):
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
httplib_response = self._make_request(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
httplib_response = conn.getresponse()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
retries = retries.increment(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
httplib_response = self._make_request(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
httplib_response = conn.getresponse()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/anaconda/envs/verhuiskans/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/utils.py", line 14, in wrapper
return func(*args, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/client.py", line 138, in connect
response = requests.get(endpoint, headers=self.request_headers)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/requests/adapters.py", line 682, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/ml-ci-jvranken-prd/code/Users/jvranken/aim-tracking-server/aim_test.py", line 7, in <module>
run = Run(
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 70, in wrapper
_SafeModeConfig.exception_callback(e, func)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 47, in reraise_exception
raise e
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/exception_resistant.py", line 68, in wrapper
return func(*args, **kwargs)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/run.py", line 859, in __init__
super().__init__(run_hash, repo=repo, read_only=read_only, experiment=experiment, force_resume=force_resume)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/run.py", line 272, in __init__
super().__init__(run_hash, repo=repo, read_only=read_only, force_resume=force_resume)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/base_run.py", line 34, in __init__
self.repo = get_repo(repo)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo_utils.py", line 26, in get_repo
repo = Repo.from_path(repo)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo.py", line 210, in from_path
repo = Repo(path, read_only=read_only, init=init)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/sdk/repo.py", line 121, in __init__
self._client = Client(remote_path)
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/client.py", line 50, in __init__
self.connect()
File "/anaconda/envs/verhuiskans/lib/python3.10/site-packages/aim/ext/transport/utils.py", line 18, in wrapper
raise RuntimeError(error_message)
RuntimeError: Failed to connect to Aim Server. Have you forgot to run `aim server` command?
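The innermost error is `RemoteDisconnected`: the server accepted the TCP connection and then closed it without sending an HTTP response. One way to narrow this down is to check, from the client machine, whether port 53800 completes a TLS handshake at all. The helper below is a hypothetical diagnostic using only the standard library, not an Aim API. Certificate verification is deliberately disabled because the certificate is self-signed; the context must not be reused for real traffic.

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 5.0) -> bool:
    """Hypothetical diagnostic: return True if a TLS handshake with
    host:port completes, False otherwise.

    Verification is turned off on purpose (self-signed certificate); this
    only answers "does this port speak TLS?", not "is it trustworthy?".
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be cleared before setting CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake immediately by default.
            with context.wrap_socket(sock, server_hostname=host) as tls_sock:
                return tls_sock.version() is not None
    except OSError:
        # Refused connections, timeouts and failed handshakes all land here
        # (ssl.SSLError is a subclass of OSError).
        return False
```

If `speaks_tls("<vm-address>", 53800)` returns `True` while the Aim client still fails as above, that would be consistent with the client sending plain HTTP to a TLS-only port. This is one possible reading, not a diagnosis confirmed in this thread.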
Do you have any idea why this isn't working? Here are the docker-compose.yaml and the Python file I'm using:
services:
  ui:
    image: aimstack/aim:3.20.1
    container_name: aim_ui
    restart: unless-stopped
    command: up --host 0.0.0.0 --port 43800 --dev
    ports:
      - 80:43800
    volumes:
      - ~/mycontainer/aim:/opt/aim
    networks:
      - aim
  server:
    image: aimstack/aim:3.20.1
    container_name: aim_server
    restart: unless-stopped
    command: server --host 0.0.0.0 --dev --ssl-keyfile /opt/secrets/server.key --ssl-certfile /opt/secrets/server.crt
    ports:
      - 53800:53800
    volumes:
      - ~/mycontainer/aim:/opt/aim
      - ~/secrets:/opt/secrets
    networks:
      - aim

networks:
  aim:
    driver: bridge
from aim import Run

# AIM_REPO='/home/azureuser/mycontainer/aim'
AIM_REPO = 'aim://REDACTED:53800'
AIM_EXPERIMENT = 'SSL-server'

run = Run(
    repo=AIM_REPO,
    experiment=AIM_EXPERIMENT
)

hparams_dict = {
    'learning_rate': 0.001,
    'batch_size': 32,
}
run['hparams'] = hparams_dict

# log metric
for i in range(30):
    if i % 5 == 0:
        i = i * 0.347
    run.track(float(i), name='numbers')
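The script points `Run` at a repo string of the form `aim://host:port`. When testing connectivity separately from Aim (for example with a raw socket or TLS check), the host and port can be pulled out of that string with `urllib.parse`. The helper below is hypothetical and for diagnostics only; it is not how Aim itself parses the repo URL.

```python
from urllib.parse import urlsplit


def parse_aim_repo_url(repo: str) -> tuple:
    """Hypothetical helper: split an 'aim://host:port' repo string into
    (host, port). Raises ValueError for local paths or malformed URLs.
    Not Aim's own parser; intended only for ad-hoc connectivity checks.
    """
    parts = urlsplit(repo)
    # urlsplit fills in netloc for any scheme followed by '//'.
    if parts.scheme != "aim" or not parts.hostname or parts.port is None:
        raise ValueError(f"expected aim://host:port, got {repo!r}")
    return parts.hostname, parts.port
```

With the script above, `parse_aim_repo_url(AIM_REPO)` would return the redacted host together with `53800`, while the commented-out local path would raise `ValueError`.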
4 answers
apeeds0o1#
Thanks, JeroenVranken, for raising this. It may be related to the authorization token we added recently. @mihran113 @alberttorosyan, what do you think?
oaxa6hgo2#
Is there any update on this? I think I ran into a similar problem in #3206.
dw1jzc5e3#
This error occurs with version 3.20.1, but when I downgraded Aim to version 3.17.4, everything worked fine.
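Since the behaviour reportedly differs between 3.17.4 and 3.20.1, it is worth confirming which Aim version each side actually runs: the server version is fixed by the Docker image tag (`aimstack/aim:3.20.1`), while the client version is whatever is installed in the Python environment. A minimal sketch using `importlib.metadata`; the helper names are hypothetical, and the tuple comparison only handles plain dotted release numbers (the `packaging` library would be the robust choice for pre-release tags).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str = "aim") -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(v: str) -> Tuple[int, ...]:
    """Turn a plain release string like '3.17.4' into (3, 17, 4).

    Only dotted integers are supported; '3.20.0rc1' would raise ValueError.
    """
    return tuple(int(part) for part in v.split("."))
```

For example, `v = installed_version("aim")` followed by `version_tuple(v) >= version_tuple("3.20.1")` tells you whether the client is on the affected release line reported above.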
rqqzpn5f4#
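> This error occurs with version 3.20.1, but when I downgraded Aim to version 3.17.4, everything worked fine.

Have you tried the latest version, 3.23.0? I seem to be dealing with the same issue.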