Using the azureml Python SDK on my local machine, I am able to configure a Hyperdrive sweep job and submit it to a compute cluster in Azure Machine Learning Studio. Below is my code sample, based on this tutorial from Microsoft Learn:
# Import required libraries
from azure.ai.ml import MLClient, command
from azure.ai.ml.sweep import Choice
from azure.identity import DefaultAzureCredential
# from azure.ai.ml.entities import Environment

# Connect to the workspace
ml_client = MLClient.from_config(DefaultAzureCredential())


def build_sweep_job(experiment_name, jobname):
    # Create your base command job
    env = '...'
    compute = '...'
    params = [1, 2]
    n_jobs = len(params)  # one trial per candidate value
    max_trials = n_jobs
    cmd = 'python train.py --param ${{inputs.params}}'
    inputs = {
        'params': params[0],
    }
    command_job = command(code='codepath', command=cmd, environment=env, inputs=inputs, compute=compute, experiment_name=experiment_name)

    # Override your inputs with parameter expressions
    command_job_for_sweep = command_job(params=Choice(values=params))
    sweep_job = command_job_for_sweep.sweep(
        compute=compute,
        sampling_algorithm='grid',
        primary_metric='Best value',
        goal='Minimize',
    )

    # Specify your experiment details
    sweep_job.display_name = jobname
    sweep_job.experiment_name = experiment_name
    sweep_job.description = 'Run a hyperparameter sweep job.'
    sweep_job.set_limits(max_concurrent_trials=n_jobs, max_total_trials=max_trials)
    sweep_job.early_termination = None
    return sweep_job


sweep_job = build_sweep_job(experiment_name='my_experiment', jobname='todays_job')
returned_sweep_job = ml_client.create_or_update(sweep_job)
print(returned_sweep_job.services["Studio"].endpoint)
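The code correctly creates and runs my job. In the end, the ML Studio web interface shows it as completed:
However, in my Python notebook, running in VS Code, the status still shows "Running" even an hour after the job completed:
How can I fix this? I want to know, without leaving my local VS Code, that my job has finished successfully.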
1 Answer
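Based on the information provided, returned_job.status appears to lag behind the job's actual state. One possible way to get the correct status is to call ml_client.jobs.list() to retrieve the job's updated details. Below is a sample code snippet (modified to suit your requirements):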
With the code snippet above, I was able to get the updated status.
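A minimal sketch of how this advice might be applied from a notebook: rather than reading the status off the object returned at submission time, it asks the service again through `ml_client.jobs.list()` and polls until the job is done. The helper names `fetch_status` and `wait_until_finished`, the polling interval, the timeout, and the set of terminal status strings are assumptions made for illustration, not part of the SDK.

```python
import time

# Assumption: these are the status strings treated as "finished" here.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def fetch_status(ml_client, job_name):
    """Return the current status of the job named job_name, or None if it is not listed."""
    # Each call to jobs.list() queries the service again, so the status is
    # fresh rather than the snapshot held by an earlier returned job object.
    for job in ml_client.jobs.list():
        if job.name == job_name:
            return job.status
    return None


def wait_until_finished(ml_client, job_name, poll_seconds=30, timeout_seconds=3600):
    """Poll until the job reaches a terminal status or the timeout elapses.

    Returns the last status seen, which may be non-terminal after a timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status(ml_client, job_name)
        if status in TERMINAL_STATUSES or time.monotonic() >= deadline:
            return status
        time.sleep(poll_seconds)
```

With the objects from the question, this could be called as `wait_until_finished(ml_client, returned_sweep_job.name)`. Looking the job up directly with `ml_client.jobs.get(returned_sweep_job.name).status` should also return a freshly fetched status and avoids iterating over every job in the workspace.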
