How to set dataset scheduling based on trigger time in Azure ML?

Posted by x33g5p2x on 2023-06-07 in Other


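I am using Azure Machine Learning (Azure ML) to manage my machine learning workflow, and I want to set up dataset scheduling based on the trigger time. My dataset path uses a different format from the trigger time. For example, my dataset path looks like "path_on_datastore/2023/01/01/some_data.tsv", while the trigger time is formatted differently.
I found that schedules support "${{creation_context.trigger_time}}" as a PipelineParameter (link: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-schedule-pipeline-job?view=azureml-api-2&tabs=cliv2#expressions-supported-in-schedule), but the format it provides does not match my dataset format. I tried to do this with a component, but components only support outputting a dataset. Is there a way to customize or adjust the trigger time format so that it matches my dataset format?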

Azure

Source: https://stackoverflow.com/questions/76274491/how-to-set-dataset-scheduling-based-on-trigger-time-in-azure-ml

1 Answer


8cdiaqws1#

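You can use the PythonScriptStep class in Azure Machine Learning to run a Python script that builds the formatted data path each time the pipeline is triggered. **Example:** Python script file (script.py):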

import datetime

# The script runs when the schedule fires, so the current time approximates the trigger time
current_time = datetime.datetime.now()

# Format the current time to match the dataset path format (zero-padded month and day)
dataset_path = current_time.strftime("path_on_datastore/%Y/%m/%d/some_data.tsv")

# Use the dataset path in your further processing or operations
print(dataset_path)
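If the trigger time reaches the script as a string instead of being read from the clock, the formatting step might look like the sketch below. It assumes the value is an ISO 8601 timestamp such as "2023-01-01T08:00:00Z"; the actual format delivered by the schedule should be checked first. The function name `dataset_path_for` is hypothetical and not part of any Azure ML API.

```python
from datetime import datetime


def dataset_path_for(trigger_time: str) -> str:
    """Map a trigger-time string to the dated dataset path.

    Assumption: trigger_time is ISO 8601, e.g. "2023-01-01T08:00:00Z".
    Adjust the parsing if the schedule delivers another format.
    """
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    parsed = datetime.fromisoformat(trigger_time.replace("Z", "+00:00"))
    # %m and %d are zero-padded, which matches ".../2023/01/01/..."
    return parsed.strftime("path_on_datastore/%Y/%m/%d/some_data.tsv")


print(dataset_path_for("2023-01-01T08:00:00Z"))
# prints path_on_datastore/2023/01/01/some_data.tsv
```

Keeping the formatting in a small function makes it easy to test locally before the script is wired into a pipeline step.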

With this script, you can define the pipeline step:

from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence
from azureml.pipeline.steps import PythonScriptStep

# Load the workspace from the local config.json
workspace = Workspace.from_config()

# Step that runs script.py on the named compute target
script_step = PythonScriptStep(
    name="Get Dataset Path",
    script_name="script.py",
    compute_target="targetCompute",
    inputs=[],
    outputs=[],
    source_directory="./",
    allow_reuse=False,  # always re-run so the date is recomputed
)

Then you can schedule the pipeline:

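from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

# Daily execution at 8:00 AM, starting 2023-06-01
daily_schedule = ScheduleRecurrence(frequency="Day", interval=1, start_time="2023-06-01T08:00:00", hours=[8], minutes=[0])

# Schedules attach to a published pipeline, so publish it first (names are placeholders)
pipeline = Pipeline(workspace=workspace, steps=[script_step])
published_pipeline = pipeline.publish(name="dataset_scheduling_pipeline", description="Daily dataset pipeline")

# Create the schedule; each triggered run is submitted to the named experiment
pipeline_schedule = Schedule.create(
    workspace,
    name="daily_dataset_schedule",
    description="Daily pipeline schedule",
    pipeline_id=published_pipeline.id,
    experiment_name="dataset_scheduling_experiment",
    recurrence=daily_schedule,
)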
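To sanity-check which dataset paths a daily schedule like the one above would read, the dates can be enumerated locally without touching Azure ML. This is only a sketch: `daily_dataset_paths` is a hypothetical helper, and it assumes one run per day at the same time of day.

```python
from datetime import datetime, timedelta


def daily_dataset_paths(start: datetime, days: int) -> list:
    """Return the dated dataset path for each of `days` daily runs from `start`."""
    paths = []
    for offset in range(days):
        run_time = start + timedelta(days=offset)
        # Same zero-padded year/month/day layout as the dataset path
        paths.append(run_time.strftime("path_on_datastore/%Y/%m/%d/some_data.tsv"))
    return paths


# First three runs of a daily 08:00 schedule starting on 2023-06-01
for path in daily_dataset_paths(datetime(2023, 6, 1, 8, 0), 3):
    print(path)
```

Comparing this list with the folders that actually exist on the datastore is a quick way to spot missing days before the schedule goes live.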

To disable or update the schedule:

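from azureml.pipeline.core.schedule import Schedule

# Schedule.get looks a schedule up by its ID, not by its display name
schedule_id = 'your_schedule_id'
schedule = Schedule.get(workspace, schedule_id)

# Disable the schedule
schedule.disable()

# Update the schedule, for example its description
schedule.update(description="Updated daily pipeline schedule")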

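The example above shows how to use the `PythonScriptStep` class together with the current time from datetime as a stand-in for the trigger time. For more information, see this. Note: be sure to adjust the Python script and the datastore path as needed.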

Answered 2023-06-07
