Azure: merge JSON files into one file and add the file name to the data before merging, using ADF

Posted by gab6jxml on 2023-04-12

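I'm using the following approach to merge my individual JSON files into one file, and it works:
use an ADF Copy activity. In the source, use a wildcard path with * as the file name. In the sink, use the Merge files copy behavior to merge everything into one JSON blob.
All the merged data in the big JSON looks like this: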

{data from file1}
.
.
{data from file2}
.
.
{data from file3}
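For reference, the Merge files behavior described above can be approximated locally as follows. This is a rough sketch, not ADF's implementation, and it assumes each source file holds one JSON object and the sink writes one document per line (the "Set of objects" JSON file pattern). It shows why the file names disappear: only the file contents are written. `merge_like_adf` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def merge_like_adf(src_dir: Path, out_file: Path) -> None:
    """Write every *.json file in src_dir to out_file, one document per line.
    Only the contents are written; the source file names are lost."""
    with out_file.open("w", encoding="utf-8") as out:
        for path in sorted(src_dir.glob("*.json")):
            doc = json.loads(path.read_text(encoding="utf-8"))
            out.write(json.dumps(doc, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp)
        (src / "File1.json").write_text('{"a": 1}', encoding="utf-8")
        (src / "File2.json").write_text('{"b": 2}', encoding="utf-8")
        merged = src / "merged.txt"  # not *.json, so the glob does not pick it up
        merge_like_adf(src, merged)
        print(merged.read_text(encoding="utf-8"), end="")
```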

Our requirement is that the final format look like this:

{"File1.json":
[{
<file1 data>
}], 
"File2.json":
[{
<file2 data>
}]
}

Is this possible with ADF? If not, please suggest an alternative.
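If ADF is not a hard requirement, one alternative is a short script run wherever the files can be read (for example after downloading them, or inside an Azure Function or Databricks notebook; that storage plumbing is not shown). The sketch below assumes the files sit in a local folder. `merge_with_file_names` and `write_merged` are hypothetical names, and wrapping a single object in a list is an assumption chosen to match the requested `"File1.json": [{...}]` shape.

```python
import json
import tempfile
from pathlib import Path


def merge_with_file_names(src_dir: Path) -> dict:
    """Return {"<file name>": [<file contents>], ...} for every *.json file in src_dir."""
    merged = {}
    for path in sorted(src_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Wrap a single object in a list; keep a top-level array as it is.
        merged[path.name] = data if isinstance(data, list) else [data]
    return merged


def write_merged(src_dir: Path, out_file: Path) -> None:
    """Write the merged dictionary as one JSON document."""
    out_file.write_text(
        json.dumps(merge_with_file_names(src_dir), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input"
        src.mkdir()
        (src / "File1.json").write_text('{"name": "john"}', encoding="utf-8")
        (src / "File2.json").write_text('[{"id": 1}, {"id": 2}]', encoding="utf-8")
        out = Path(tmp) / "merged.json"  # outside src, so a rerun does not pick it up
        write_merged(src, out)
        print(out.read_text(encoding="utf-8"))
```

Write the output file outside the source folder; otherwise a second run would treat the merged file as one more input.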

Answer from ojsjcaue:

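You can get the desired result with Get Metadata, ForEach, Lookup, and Copy data activities.
1. First, use a Get Metadata activity (field list: Child items) to get the list of files in the directory.
2. Iterate over the JSON files returned by the Get Metadata activity with a sequential ForEach activity. Inside it, a Lookup activity with First row only cleared reads the current file; its dataset (Json3 here) has to point at that file, for example through a file-name dataset parameter set to @item().name.
3. Use an Append Variable activity to append each item as "file_name":[<file_data>].
4. Outside the ForEach activity, use a Set Variable activity to join the appended items into the required string.
5. Finally, use a Copy data activity to write that string out as the required JSON. Here is the complete pipeline JSON: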
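The string building in steps 3 and 4 can be mirrored in a few lines of Python to check that the result is valid JSON. This is only a local stand-in for the pipeline expressions, not ADF code: `append_item` and `build_required` are hypothetical names, and `json.dumps` is assumed to approximate how ADF string interpolation renders the Lookup array.

```python
import json


def append_item(final: list, file_name: str, lookup_value: list) -> None:
    """Stand-in for the Append Variable expression
    "\"@{item().name}\":@{activity('Lookup1').output.value}"."""
    # json.dumps approximates how string interpolation renders the Lookup array.
    final.append(f'"{file_name}":{json.dumps(lookup_value, ensure_ascii=False)}')


def build_required(final: list) -> str:
    """Stand-in for the Set Variable expression "{@{join(variables('final'),',')}}"."""
    return "{" + ",".join(final) + "}"


final = []
append_item(final, "sample2.json", [{"name": "a"}, {"name": "b"}])
append_item(final, "sample3.json", [{"id": 1}])
required = build_required(final)
print(required)
# {"sample2.json":[{"name": "a"}, {"name": "b"}],"sample3.json":[{"id": 1}]}
```

Because the file name is inserted into the string without escaping (in the sketch and in the ADF expression alike), a file name containing `"` or `\` would produce invalid JSON.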

{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "Get Metadata1",
                "type": "GetMetadata",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "Json2",
                        "type": "DatasetReference"
                    },
                    "fieldList": [
                        "childItems"
                    ],
                    "storeSettings": {
                        "type": "AzureBlobFSReadSettings",
                        "enablePartitionDiscovery": false
                    },
                    "formatSettings": {
                        "type": "JsonReadSettings"
                    }
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Get Metadata1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Get Metadata1').output.childItems",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "Lookup1",
                            "type": "Lookup",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "JsonSource",
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "JsonReadSettings"
                                    }
                                },
                                "dataset": {
                                    "referenceName": "Json3",
                                    "type": "DatasetReference"
                                },
                                "firstRowOnly": false
                            }
                        },
                        {
                            "name": "Append variable1",
                            "type": "AppendVariable",
                            "dependsOn": [
                                {
                                    "activity": "Lookup1",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "userProperties": [],
                            "typeProperties": {
                                "variableName": "final",
                                "value": {
                                    "value": "\"@{item().name}\":@{activity('Lookup1').output.value}",
                                    "type": "Expression"
                                }
                            }
                        }
                    ]
                }
            },
            {
                "name": "Set variable1",
                "type": "SetVariable",
                "dependsOn": [
                    {
                        "activity": "ForEach1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "variableName": "required",
                    "value": {
                        "value": "{@{join(variables('final'),',')}}",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "Copy data1",
                "type": "Copy",
                "dependsOn": [
                    {
                        "activity": "Set variable1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "additionalColumns": [
                            {
                                "name": "required",
                                "value": {
                                    "value": "@variables('required')",
                                    "type": "Expression"
                                }
                            }
                        ],
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "recursive": true,
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {
                            "type": "AzureBlobFSWriteSettings"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".txt"
                        }
                    },
                    "enableStaging": false,
                    "translator": {
                        "type": "TabularTranslator",
                        "mappings": [
                            {
                                "source": {
                                    "name": "required",
                                    "type": "String"
                                },
                                "sink": {
                                    "type": "String",
                                    "physicalType": "String",
                                    "ordinal": 1
                                }
                            }
                        ],
                        "typeConversion": true,
                        "typeConversionSettings": {
                            "allowDataTruncation": true,
                            "treatBooleanAsNumber": false
                        }
                    }
                },
                "inputs": [
                    {
                        "referenceName": "csv1",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "DelimitedText1",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],
        "variables": {
            "final": {
                "type": "Array"
            },
            "required": {
                "type": "String"
            }
        },
        "annotations": []
    }
}
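  • In the last Copy data activity, use as the source a delimited text dataset containing any sample data (one column and one row). Create an additional column whose value is the Set Variable result (@variables('required')).

  • For the sink, create a delimited text dataset configured so that the JSON string is written unchanged (typically: no quote character, no escape character, and First row as header cleared).

  • Running the pipeline gives the required result. Below is the data after running the above pipeline: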

{
   "sample1.json":[
      {
         "KTYPE":[
            {
               "name":"john",
               "surname":"elo"
            },
            {
               "name":"dd",
               "surname":"ss"
            }
         ],
         "MTYPE":[
            {
               "name":"dsdsd",
               "id":"elo"
            },
            {
               "name":"sdss",
               "id":"sds22"
            }
         ]
      }
   ],
   "sample2.json":[
      {
         "name":"Привет"
      },
      {
         "name":"Привет"
      }
   ],
   "sample3.json":[
      {
         "id":1,
         "first_name":"Catlin",
         "last_name":"Haysman",
         "email":"chaysman0@hostgator.com",
         "gender":"Female",
         "ip_address":"80.243.124.118"
      },
      {
         "id":2,
         "first_name":"Augustin",
         "last_name":"Nesbeth",
         "email":"anesbeth1@cbc.ca",
         "gender":"Male",
         "ip_address":"250.126.164.4"
      }
   ],
   "sample4.json":[
      {
         "id":3,
         "first_name":"Layla",
         "last_name":"Morant",
         "email":"lmorant2@seattletimes.com",
         "gender":"Female",
         "ip_address":"247.73.128.196"
      },
      {
         "id":4,
         "first_name":"Ophelie",
         "last_name":"Rape",
         "email":"orape3@bloomberg.com",
         "gender":"Female",
         "ip_address":"148.213.192.8"
      }
   ]
}
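The sink settings matter because the JSON string travels through a delimited text dataset. The sketch below uses Python's `csv` module as a stand-in for a delimited text writer (it is not what ADF runs internally): with default quoting, the single field is wrapped in quotes and its inner quotes are doubled, so the file no longer parses as JSON. ADF's default escape character is a backslash rather than a doubled quote, but the effect is the same. `is_valid_json` and `csv_quoted` are hypothetical helpers.

```python
import csv
import io
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def csv_quoted(field: str) -> str:
    """Write one field with Python's default CSV quoting (QUOTE_MINIMAL)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([field])
    return buf.getvalue().rstrip("\n")


required = '{"sample1.json":[{"name":"john","surname":"elo"}]}'
quoted = csv_quoted(required)

print(quoted)                   # "{""sample1.json"":[{""name"":""john"",""surname"":""elo""}]}"
print(is_valid_json(quoted))    # False
print(is_valid_json(required))  # True
```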
