Get all folder names in subfolders in Azure Data Factory

nkkqxpd9  posted on 2023-06-24  in  Other

I have the following folder structure in a data lake:
datasetname/fullload/year/month/day/hour/min/sec/data

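I can't create an Azure Function or use Databricks; only plain ADF activities are available.
I want to get the latest folder name from all the subfolders of the parent folder (datasetname/fullload). I tried GetMetadata -> Set Variable and then a loop, but it still doesn't work.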

I need to get the path of the latest folder in blob storage.
Thanks.

clj7thdc1#

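  • Since you need the latest data and the folder names are mostly numeric, you can find the latest data by picking the largest number among the subfolders at each level.
  • My files are laid out as shown in the image below: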

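  • To find the largest folder, I used 2 pipelines. pipeline1 iterates, fetching the child items of the current path until no child items exist. pipeline2 finds the maximum value in the list of subfolder names of a given folder.
  • Here is the pipeline JSON for pipeline1: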
{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "get path",
                "type": "Until",
                "dependsOn": [
                    {
                        "activity": "Set flag",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "expression": {
                        "value": "@equals(variables('flag'),'true')",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "Get sub folders",
                            "type": "GetMetadata",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "dataset": {
                                    "referenceName": "root",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "path": {
                                            "value": "@variables('path')",
                                            "type": "Expression"
                                        }
                                    }
                                },
                                "fieldList": [
                                    "childItems"
                                ],
                                "storeSettings": {
                                    "type": "AzureBlobFSReadSettings",
                                    "enablePartitionDiscovery": false
                                },
                                "formatSettings": {
                                    "type": "DelimitedTextReadSettings"
                                }
                            }
                        },
                        {
                            "name": "If Condition1",
                            "type": "IfCondition",
                            "dependsOn": [
                                {
                                    "activity": "Get sub folders",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "userProperties": [],
                            "typeProperties": {
                                "expression": {
                                    "value": "@greater(length(activity('Get sub folders').output.childItems),0)",
                                    "type": "Expression"
                                },
                                "ifFalseActivities": [
                                    {
                                        "name": "Set variable1",
                                        "type": "SetVariable",
                                        "dependsOn": [],
                                        "policy": {
                                            "timeout": "0.12:00:00",
                                            "retry": 0,
                                            "retryIntervalInSeconds": 30,
                                            "secureOutput": false,
                                            "secureInput": false
                                        },
                                        "userProperties": [],
                                        "typeProperties": {
                                            "variableName": "flag",
                                            "value": {
                                                "value": "true",
                                                "type": "Expression"
                                            }
                                        }
                                    }
                                ],
                                "ifTrueActivities": [
                                    {
                                        "name": "get latest",
                                        "type": "ExecutePipeline",
                                        "dependsOn": [],
                                        "userProperties": [],
                                        "typeProperties": {
                                            "pipeline": {
                                                "referenceName": "pipeline2",
                                                "type": "PipelineReference"
                                            },
                                            "waitOnCompletion": true,
                                            "parameters": {
                                                "array_to_find_max": {
                                                    "value": "@activity('Get sub folders').output.childItems",
                                                    "type": "Expression"
                                                }
                                            }
                                        }
                                    },
                                    {
                                        "name": "append max to path",
                                        "type": "SetVariable",
                                        "dependsOn": [
                                            {
                                                "activity": "get latest",
                                                "dependencyConditions": [
                                                    "Succeeded"
                                                ]
                                            }
                                        ],
                                        "policy": {
                                            "timeout": "0.12:00:00",
                                            "retry": 0,
                                            "retryIntervalInSeconds": 30,
                                            "secureOutput": false,
                                            "secureInput": false
                                        },
                                        "userProperties": [],
                                        "typeProperties": {
                                            "variableName": "tp",
                                            "value": {
                                                "value": "@{variables('path')}/@{activity('get latest').output.pipelineReturnValue.max_val}",
                                                "type": "Expression"
                                            }
                                        }
                                    },
                                    {
                                        "name": "update path",
                                        "type": "SetVariable",
                                        "dependsOn": [
                                            {
                                                "activity": "append max to path",
                                                "dependencyConditions": [
                                                    "Succeeded"
                                                ]
                                            }
                                        ],
                                        "policy": {
                                            "timeout": "0.12:00:00",
                                            "retry": 0,
                                            "retryIntervalInSeconds": 30,
                                            "secureOutput": false,
                                            "secureInput": false
                                        },
                                        "userProperties": [],
                                        "typeProperties": {
                                            "variableName": "path",
                                            "value": {
                                                "value": "@variables('tp')",
                                                "type": "Expression"
                                            }
                                        }
                                    }
                                ]
                            }
                        }
                    ],
                    "timeout": "0.12:00:00"
                }
            },
            {
                "name": "Set flag",
                "type": "SetVariable",
                "dependsOn": [
                    {
                        "activity": "Set path",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "variableName": "flag",
                    "value": {
                        "value": "false",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "Set path",
                "type": "SetVariable",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "variableName": "path",
                    "value": {
                        "value": "data/f1/ff1",
                        "type": "Expression"
                    }
                }
            }
        ],
        "variables": {
            "path": {
                "type": "String"
            },
            "flag": {
                "type": "String"
            },
            "values": {
                "type": "Array"
            },
            "max_val": {
                "type": "String"
            },
            "tp": {
                "type": "String"
            }
        },
        "annotations": []
    }
}
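The control flow of pipeline1 can be easier to follow outside the JSON. The following is only a rough Python sketch of the same idea, not the ADF implementation. The dictionary `example_tree` and the function names `max_child` and `latest_path` are hypothetical stand-ins: the dictionary plays the role of the Get Metadata activity's childItems output, and `max_child` plays the role of the call to pipeline2.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the data lake: each folder path maps to
# the names of its immediate children (what Get Metadata returns as childItems).
Tree = Dict[str, List[str]]


def max_child(names: List[str]) -> str:
    """Mimic pipeline2: largest numeric name, padded to at least two digits."""
    largest = max(int(name) for name in names)
    return str(largest).zfill(2)


def latest_path(tree: Tree, root: str) -> str:
    """Mimic the Until loop in pipeline1: descend until no children exist."""
    path = root
    while True:
        children = tree.get(path, [])
        if not children:
            # Corresponds to the ifFalseActivities branch setting flag to true.
            return path
        # Corresponds to "append max to path" followed by "update path".
        path = f"{path}/{max_child(children)}"


# Assumed sample layout, starting from the same initial path as the answer.
example_tree: Tree = {
    "data/f1/ff1": ["2022", "2023"],
    "data/f1/ff1/2023": ["05", "06"],
    "data/f1/ff1/2023/06": ["09", "24"],
}

print(latest_path(example_tree, "data/f1/ff1"))
```

The intermediate variable tp in the pipeline exists because a Set Variable activity cannot reference the variable it is assigning; in Python the reassignment of `path` needs no such workaround.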
  • Here is the pipeline JSON for pipeline2:
{
    "name": "pipeline2",
    "properties": {
        "activities": [
            {
                "name": "make array of values",
                "type": "ForEach",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.array_to_find_max",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "append each value",
                            "type": "AppendVariable",
                            "dependsOn": [],
                            "userProperties": [],
                            "typeProperties": {
                                "variableName": "values",
                                "value": {
                                    "value": "@int(item().name)",
                                    "type": "Expression"
                                }
                            }
                        }
                    ]
                }
            },
            {
                "name": "return max",
                "type": "SetVariable",
                "dependsOn": [
                    {
                        "activity": "make array of values",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "variableName": "pipelineReturnValue",
                    "value": [
                        {
                            "key": "max_val",
                            "value": {
                                "type": "Expression",
                                "content": "@if(equals(length(string(max(variables('values')))),1),concat('0',string(max(variables('values')))),string(max(variables('values'))))"
                            }
                        }
                    ],
                    "setSystemVariable": true
                }
            }
        ],
        "parameters": {
            "array_to_find_max": {
                "type": "array",
                "defaultValue": [
                    {
                        "name": "2022",
                        "type": "Folder"
                    },
                    {
                        "name": "2023",
                        "type": "Folder"
                    }
                ]
            }
        },
        "variables": {
            "values": {
                "type": "Array"
            },
            "max_val": {
                "type": "String"
            },
            "tp": {
                "type": "String"
            }
        },
        "annotations": []
    }
}
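One thing to watch in pipeline2 is that it applies int() to every child item name, so a non-numeric entry (for example a file, or a folder such as data at the deepest level) would presumably make the expression fail. The Python sketch below shows the same max-and-pad logic with such entries skipped. The filtering is an assumption added here and is not part of the pipeline as posted, and the function name `max_numeric_folder` is hypothetical. In ADF something similar could probably be done by putting a Filter activity in front of the ForEach.

```python
from typing import Dict, List, Optional


def max_numeric_folder(child_items: List[Dict[str, str]]) -> Optional[str]:
    """Return the largest numeric folder name, padded to at least two digits.

    Entries that are not folders, or whose names are not purely decimal
    digits, are skipped. That filtering is an assumption added for this
    sketch; pipeline2 as posted converts every name with int().
    """
    numbers = [
        int(item["name"])
        for item in child_items
        if item.get("type") == "Folder" and item["name"].isdecimal()
    ]
    if not numbers:
        # Nothing numeric to descend into.
        return None
    # Same effect as the pipeline's if(length == 1, concat('0', ...)) expression.
    return str(max(numbers)).zfill(2)


# Same shape as the default value of the array_to_find_max parameter.
items = [
    {"name": "2022", "type": "Folder"},
    {"name": "2023", "type": "Folder"},
]
print(max_numeric_folder(items))
```

Returning `None` when nothing numeric is found gives the caller a natural stopping condition, comparable to setting flag to true in pipeline1.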
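  • Below is the dataset configuration I used for the Get Metadata activity. In my case the initial value of path is data/f1/ff1, and it is updated on each iteration (the largest folder name is appended to it):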

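  • When I run this pipeline, I get the expected result. After the Until loop stops, the variable path holds the required path, i.e. the path to the latest data: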
