ludwig: Not uploading confusion_matrix (and other) figures to Comet ML

Posted by w9apscun 2 months ago in Other

Describe the bug

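Hi, everyone!
I ran an experiment in which I tried to send a confusion matrix visualization to Comet ML by following the third-party integrations section of the Ludwig docs (https://ludwig.ai/latest/user_guide/integrations/#comet-ml), but the figure never reached Comet ML.
I tried several different configurations. I can upload the "learning_curves" visualization to Comet ML, but not other visualizations such as confusion_matrix, roc_curves_from_test_statistics, and precision_recall_curves_from_test_statistics.
When I run ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json, the generated figure is uploaded correctly and I get the following output: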

COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.com https://my_comet_ml_experiment_uri

COMET WARNING: Experiment.set_code(code=...) is deprecated, use Experiment.log_code(code=..., code_name=...) instead
COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml ExistingExperiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://my_comet_ml_experiment_uri
COMET INFO:   Uploads:
COMET INFO:     figures     : 3
COMET INFO:     filename    : 1
COMET INFO:     html        : 1
COMET INFO:     source_code : 1
COMET INFO: 
COMET INFO: Uploading metrics, params, and assets to Comet before program termination (may take several seconds)
COMET INFO: The Python SDK has 3600 seconds to finish uploading collected data
COMET INFO: Waiting for completion of the file uploads (may take several seconds)
COMET INFO: The Python SDK has 10800 seconds to finish uploading collected data
COMET INFO: All files uploaded, waiting for confirmation they have been all received

But when I run ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2, no figure is uploaded. Here is the output:

COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.com https://my_comet_ml_experiment_uri

COMET WARNING: Experiment.set_code(code=...) is deprecated, use Experiment.log_code(code=..., code_name=...) instead
COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite
/home/ec2-user/ludwig-ai-playground/venv/lib64/python3.9/site-packages/ludwig/utils/visualization_utils.py:1167: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels([""] + labels, rotation=45, ha="left")
/home/ec2-user/ludwig-ai-playground/venv/lib64/python3.9/site-packages/ludwig/utils/visualization_utils.py:1168: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_yticklabels([""] + labels, rotation=45, ha="right")
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml ExistingExperiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://my_comet_ml_experiment_uri
COMET INFO:   Uploads:
COMET INFO:     filename    : 1
COMET INFO:     html        : 1
COMET INFO:     source_code : 1
COMET INFO: 
COMET INFO: Uploading metrics, params, and assets to Comet before program termination (may take several seconds)
COMET INFO: The Python SDK has 3600 seconds to finish uploading collected data

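To Reproduce

Steps to reproduce the behavior:
To generate the data, follow the "Getting Started" section of the Ludwig docs (https://ludwig.ai/latest/getting_started). You will also need a Comet ML account, which is free (https://www.comet.com/site/).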

  1. pip install ludwig[full] comet_ml
  2. wget https://ludwig.ai/latest/data/rotten_tomatoes.csv
  3. Create a rotten_tomatoes.yaml file following the Train section (https://ludwig.ai/latest/getting_started/train/).
  4. Export COMET_API_KEY="..." and COMET_PROJECT_NAME="..." (https://www.comet.com/docs/python-sdk/ludwig/#running-ludwig-with-comet).
  5. ludwig experiment --comet --config rotten_tomatoes.yaml --dataset rotten_tomatoes.csv
  6. ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json. The generated figure is uploaded.
  7. ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2. The figure is generated but not uploaded to the Comet ML experiment.

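Expected behavior

I expect the confusion matrix visualization generated by Ludwig to be uploaded to my Comet ML experiment.

Environment

I tested on AWS EC2 with the Amazon Linux 2023 AMI, an m5.2xlarge instance, and Python 3.9.16. I don't think this particular infrastructure is very relevant, though.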

Answer #1

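Hi @CostaFernando!

I haven't tried uploading visualizations through the Comet ML third-party integration before, but from the logs you provided it looks like the write request was issued and then rejected because of this error: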

COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite

Looking at our comet.py implementation and the Comet ML docs, I found this method, which confirms that overwrite=False is the default.
Questions:

  1. If you reverse the order of the commands, do you see the same error?
ludwig visualize --comet --visualization confusion_matrix --ground_truth_metadata ./results/experiment_run/model/training_set_metadata.json --test_statistics ./results/experiment_run/test_statistics.json --top_n_classes 2

ludwig visualize --comet --visualization learning_curves --training_statistics ./results/experiment_run/training_statistics.json

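Maybe the first logging command always succeeds and the second one doesn't.

  2. If you specify --mlflow, do you see the same behavior?
  3. Do you know where the name ludwig-ai-playground comes from? I'm wondering whether it also needs to be configured, perhaps through the environment variable suggested in the log: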
COMET INFO: Couldn't find a Git repository in '/home/ec2-user/ludwig-ai-playground' nor in any parent directory. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
Answer #2

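Hi @justinxzhao!
Thanks for your answer.
I don't think the error COMET ERROR: '/home/ec2-user/ludwig-ai-playground/.comet.config' exists and force is not True; refusing to overwrite is the root cause, because it occurs in both cases, for the learning curves and for the confusion matrix visualization. It happens because Ludwig's comet.py, at line 124 (self._save_config(config)), tries to write a .comet.config file on every run, and in this case the file had already been created.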
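The behavior described above (a config file written on every run, with the second write refused because the file already exists) can be illustrated with a small, self-contained sketch. This is not Ludwig's or Comet's code: save_config_once and the config contents are hypothetical, assuming only that the writer refuses to overwrite an existing file unless forced. It shows why the error repeats on every run after the first while being harmless to figure uploads.

```python
from pathlib import Path
import tempfile


def save_config_once(directory: Path, contents: str, force: bool = False) -> bool:
    """Hypothetical helper: write .comet.config unless it already exists.

    Returns True if the file was written, False if an existing file was kept
    (the situation the 'refusing to overwrite' log line reports).
    """
    config_path = directory / ".comet.config"
    if config_path.exists() and not force:
        # Keep the existing file untouched; nothing else depends on this write.
        return False
    config_path.write_text(contents)
    return True


with tempfile.TemporaryDirectory() as tmp:
    first = save_config_once(Path(tmp), "[comet]\nproject_name=demo\n")
    second = save_config_once(Path(tmp), "[comet]\nproject_name=other\n")
    kept = (Path(tmp) / ".comet.config").read_text()
    print(first, second)  # True False
```

Checking for the file before writing (or simply ignoring the refusal) would silence the log line, but, as noted above, it would not by itself make the confusion matrix figure appear in Comet.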

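  1. If you reverse the order of the commands, do you see the same error?
    Changing the order of the commands makes no noticeable difference. At first I only tried the confusion matrix; I used the learning curves later, for debugging.

  2. If you specify --mlflow, do you see the same behavior?
    I'm not using MLflow at the moment, so I haven't used that flag. I could test it, but would that tell us anything about Comet ML?

  3. Do you know where the name ludwig-ai-playground comes from? I'm wondering whether it also needs to be configured, perhaps through the environment variable suggested in the log.
    It's just a folder I created on an EC2 instance to reproduce the scenario with a simpler setup and post it here.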

Answer #3

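Hi @CostaFernando, thanks for reporting this. I'm the integrations product manager at Comet, and I was able to reproduce your issue.

I started looking at the code. Logging learning curves through the integration appears to be supported by Ludwig; here is the learning curves code: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/visualize.py#L1390-L1392. In the confusion matrix visualization code, however, I don't see the callbacks being used: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/visualize.py#L3679.

I modified the code to pass the callbacks through to the visualization utility functions, and the figure was then logged in Comet. Here is the patch: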

diff --git a/ludwig/visualize.py b/ludwig/visualize.py
index 4d5cea1e..4597b4e0 100644
--- a/ludwig/visualize.py
+++ b/ludwig/visualize.py
@@ -3685,6 +3685,7 @@ def confusion_matrix(
     model_names: Union[str, List[str]] = None,
     output_directory: str = None,
     file_format: str = "pdf",
+    callbacks: List[Callback] = None,
     **kwargs,
 ) -> None:
     """Show confusion matrix in the models predictions for each `output_feature_name`.
@@ -3758,7 +3759,7 @@ def confusion_matrix(
                         filename = filename_template_path.format(model_name_name, output_feature_name, "top" + str(k))
 
                     visualization_utils.confusion_matrix_plot(
-                        cm, labels[:k], output_feature_name=output_feature_name, filename=filename
+                        cm, labels[:k], output_feature_name=output_feature_name, filename=filename, callbacks=callbacks
                     )
 
                     entropies = []
@@ -3783,6 +3784,7 @@ def confusion_matrix(
                         labels=[labels[i] for i in class_desc_entropy],
                         title="Classes ranked by entropy of " "Confusion Matrix row",
                         filename=filename,
+                        callbacks=callbacks
                     )
     if not confusion_matrix_found:
         logger.error("Cannot find confusion_matrix in evaluation data")
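To illustrate why the figure never reaches Comet without the patch, here is a minimal, dependency-free sketch of the callback pattern the patch relies on. Every name in it (Callback, on_figure, RecordingCallback, plot, and the three visualization functions) is a hypothetical stand-in, not Ludwig's actual code, and the "figure" is just a string so the example runs without matplotlib or comet_ml. The key point it models is that the unpatched confusion_matrix signature ends in **kwargs, so a callbacks argument is presumably accepted and silently dropped instead of being forwarded to the plotting utility.

```python
from typing import List, Optional


class Callback:
    """Hypothetical base class: integrations override on_figure to log plots."""

    def on_figure(self, fig: str) -> None:
        pass


class RecordingCallback(Callback):
    """Stands in for an experiment-tracking callback; records what it receives."""

    def __init__(self) -> None:
        self.logged: List[str] = []

    def on_figure(self, fig: str) -> None:
        self.logged.append(fig)


def plot(title: str, callbacks: Optional[List[Callback]] = None) -> str:
    """Hypothetical plotting util: builds a 'figure' and hands it to every callback."""
    fig = f"figure:{title}"
    for callback in callbacks or []:
        callback.on_figure(fig)
    return fig


def learning_curves(callbacks: Optional[List[Callback]] = None, **kwargs) -> None:
    # Forwards callbacks, so the figure reaches the tracker.
    plot("learning_curves", callbacks=callbacks)


def confusion_matrix_before(**kwargs) -> None:
    # 'callbacks' lands in kwargs and is never used: the figure is built but not logged.
    plot("confusion_matrix")


def confusion_matrix_after(callbacks: Optional[List[Callback]] = None, **kwargs) -> None:
    # What the patch does: accept callbacks explicitly and pass them through.
    plot("confusion_matrix", callbacks=callbacks)


tracker = RecordingCallback()
learning_curves(callbacks=[tracker])
confusion_matrix_before(callbacks=[tracker])
confusion_matrix_after(callbacks=[tracker])
print(tracker.logged)  # ['figure:learning_curves', 'figure:confusion_matrix']
```

Because the failure is silent (no exception, just a missing upload), the same check would need to be repeated for each visualization function that should log to an integration.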

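Maybe there's a reason why only the learning curves visualization is connected to the callback system. @justinxzhao, do you know why? If not, could we connect all visualizations to the callback system?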

Answer #4

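Thanks for the reply @CostaFernando, and thanks @Lothiraldan for the quick response and for finding what looks very likely to be the root cause. That makes sense, since MLflow logging is also integrated through callbacks.

@w4nderlust or @jimthompson5802, do you know why only the learning curves are connected to the callback system and the other visualizations are not?

I don't see any problem with adding callbacks to all the visualization functions. Since you've already found the culprit, @Lothiraldan, would you like to open a PR with this change? I'm happy to review it.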
