raw_data_input_path = "s3a://{}/logs/application_id={}/component_id={}/".format(s3BucketName, application_id, component_id)
df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": [raw_data_input_path],
                        "recurse": True},
    format="json",
    transformation_ctx=dbInstance)
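My bucket key contains 10 JSON files and 1 txt file, and I only want the JSON files to end up in the dynamic frame. Is that what the 'format' parameter of create_dynamic_frame_from_options is for?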
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-glue-context.html#aws-glue-api-crawler-pyspark-extensions-glue-context-create_dynamic_frame_from_options
“format: A format specification (optional). This is used for an Amazon S3 or an AWS Glue connection that supports multiple formats.”
1 Answer
The exclusions parameter, set inside the connection_options object, lets you exclude files from the read: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-s3
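
As a rough sketch of how this could look for the question's case: the S3 connection documentation describes exclusions as a string containing a JSON list of Unix-style glob patterns, so the snippet below builds that string with json.dumps. The helper name build_s3_connection_options, the example path, and the "**.txt" pattern are assumptions made for illustration, not part of the original post. As far as I can tell, format only tells Glue how to parse the files it reads; it does not filter which files are picked up, which is why the exclusion is needed.

```python
import json


def build_s3_connection_options(input_path, exclude_globs):
    """Build connection_options for an S3 read (hypothetical helper).

    'exclusions' is passed as a string holding a JSON list of glob
    patterns, which is the shape the Glue S3 connection docs describe.
    """
    return {
        "paths": [input_path],
        "recurse": True,
        "exclusions": json.dumps(exclude_globs),
    }


# Placeholder path; in the question this is raw_data_input_path.
connection_options = build_s3_connection_options(
    "s3a://my-bucket/logs/application_id=app/component_id=comp/",
    ["**.txt"],  # assumed pattern: skip every .txt object under the prefix
)

# Inside a Glue job, where glueContext and dbInstance already exist:
# df = glueContext.create_dynamic_frame_from_options(
#     connection_type="s3",
#     connection_options=connection_options,
#     format="json",
#     transformation_ctx=dbInstance)
```

The Glue call itself is left commented out so the snippet can run outside a Glue job; only the connection_options dictionary changes compared with the code in the question.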