Reading a JSON file from S3 using Python boto3

34gzjxbg posted on 2023-03-21 in Python


I have the following JSON in the S3 bucket 'test':

{
  'Details' : "Something" 
}

I use the following code to read this JSON and print the key 'Details':

import boto3
import json

s3 = boto3.resource('s3',
                    aws_access_key_id='<access_key>',
                    aws_secret_access_key='<secret_key>'
                    )
content_object = s3.Object('test', 'sample_json.txt')
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(repr(file_content))
print(json_content['Details'])

I get the error **'string indices must be integers'**. I don't want to download the file from S3 and then read it.

python

Source: https://stackoverflow.com/questions/40995251/reading-an-json-file-from-s3-using-python-boto3

7 answers


b4qexyjb1#

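As mentioned in the comments above, the repr call must be removed (repr turns the text into a quoted Python string literal, so json.loads does not give you a dict), and the JSON file must use double quotes for property names. With this file on AWS S3: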

{
  "Details" : "Something"
}

the following Python code works:

import boto3
import json

s3 = boto3.resource('s3')

content_object = s3.Object('test', 'sample_json.txt')
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
print(json_content['Details'])
# >> Something
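To see why repr() caused the error in the question, here is a minimal sketch that needs no S3 access. It assumes a file that uses only single quotes; if the text mixes both quote styles, as the question's sample does, json.loads(repr(...)) raises a JSONDecodeError instead of returning a string.

```python
import json

# Assumed file content: single quotes only, which is not valid JSON.
single_quoted = "{'Details': 'Something'}"

# repr() wraps the text in a double-quoted Python string literal, which is
# also a valid JSON string, so json.loads returns a str rather than a dict.
parsed = json.loads(repr(single_quoted))
print(type(parsed).__name__)
try:
    parsed["Details"]
except TypeError as exc:
    print("TypeError:", exc)  # indexing a str with a str key fails

# Without repr, the JSON parser rejects single-quoted property names.
try:
    json.loads(single_quoted)
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc.msg)

# Double-quoted property names parse into a dict as expected.
fixed = json.loads('{"Details": "Something"}')
print(fixed["Details"])
```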

2023-03-21

dwthyt8l2#

The following works for me.

# read_s3.py

import json
from boto3 import client

BUCKET = 'MY_S3_BUCKET_NAME'
FILE_TO_READ = 'FOLDER_NAME/my_file.json'
client = client('s3',
                 aws_access_key_id='MY_AWS_KEY_ID',
                 aws_secret_access_key='MY_AWS_SECRET_ACCESS_KEY'
                )
result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result["Body"].read().decode()
data = json.loads(text)  # parse the text into a dict before indexing it
print(data['Details'])  # Use your desired JSON key for your value

Further improvements

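Let's call the snippet above read_s3.py.
Hard-coding the AWS ID and secret key directly is not a good idea. As a best practice, consider either of the following:
(1) Read your AWS credentials from a JSON file (aws_cred.json) stored locally: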

from json import load
from boto3 import client
...
# Use a context manager so the credentials file is closed after reading
with open('local_fold/aws_cred.json') as cred_file:
    credentials = load(cred_file)
client = client('s3',
                 aws_access_key_id=credentials['MY_AWS_KEY_ID'],
                 aws_secret_access_key=credentials['MY_AWS_SECRET_ACCESS_KEY']
                )

(2) Read them from environment variables (my preferred option for deployment):

from os import environ
import boto3

client = boto3.client('s3',
                      aws_access_key_id=environ['MY_AWS_KEY_ID'],
                      aws_secret_access_key=environ['MY_AWS_SECRET_ACCESS_KEY']
                      )

Let's prepare a shell script named read_s3_using_env.sh that sets the environment variables and then runs our Python script (read_s3.py):

# read_s3_using_env.sh
export MY_AWS_KEY_ID='YOUR_AWS_ACCESS_KEY_ID'
export MY_AWS_SECRET_ACCESS_KEY='YOUR_AWS_SECRET_ACCESS_KEY'
# execute the python file containing your code as stated above that reads from s3
python read_s3.py # will execute the python script to read from s3

Now run the shell script in a terminal:

sh read_s3_using_env.sh


gwo2fgha3#

I'd like to add that botocore.response.StreamingBody works well with json.load:

import json
import boto3

s3 = boto3.resource('s3')

bucket = 'test'  # your bucket name
key = 'sample_json.txt'  # your object key
obj = s3.Object(bucket, key)
data = json.load(obj.get()['Body'])


nzrxty8p4#

You can use the following code in AWS Lambda to read a JSON file from an S3 bucket and process it with Python.

import json
import boto3
import sys
import logging

# logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

VERSION = 1.0

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = 'my_project_bucket'
    key = 'sample_payload.json'
    
    response = s3.get_object(Bucket = bucket, Key = key)
    content = response['Body']
    jsonObject = json.loads(content.read())
    print(jsonObject)


44u64gxh5#

I got stuck for a bit because decoding didn't work for me (the S3 object was gzip-compressed).
I found this discussion, which helped: Python gzip: is there a way to decompress from a string?

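import boto3
import zlib

S3_RESOURCE = boto3.resource('s3')

def lambda_handler(event, context):
    # Bucket and key come from the S3 event that triggered the Lambda
    key = event["Records"][0]["s3"]["object"]["key"]
    bucket_name = event["Records"][0]["s3"]["bucket"]["name"]

    s3_object = S3_RESOURCE.Object(bucket_name, key).get()['Body'].read()

    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    jsonData = zlib.decompress(s3_object, 16+zlib.MAX_WBITS)
    print(jsonData)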

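If you print jsonData, you'll see the JSON you want (as bytes; json.loads(jsonData) turns it into a dict). If you run the test in AWS itself, be sure to check the CloudWatch logs, because Lambda won't show the full JSON if it is too long.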
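A small offline sketch of the decompression step, assuming a gzip-compressed JSON payload generated locally in place of the S3 object. It also shows that gzip.decompress from the standard library gives the same bytes as the zlib call above.

```python
import gzip
import json
import zlib

# Assumed payload: a JSON document compressed the way a .json.gz object would be.
original = {"Details": "Something"}
compressed = gzip.compress(json.dumps(original).encode("utf-8"))

# 16 + MAX_WBITS: zlib expects a gzip header and trailer around the data.
via_zlib = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)
# gzip.decompress is an equivalent, more readable alternative.
via_gzip = gzip.decompress(compressed)

# json.loads accepts the decompressed bytes directly.
data = json.loads(via_zlib)
print(data["Details"])
```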


xlpyo6sf6#

This is easy to do with cloudpathlib, which supports S3 as well as Google Cloud Storage and Azure Blob Storage. Here is an example:

import json
from cloudpathlib import CloudPath

# first, we'll write some json data so then we can later read it
CloudPath("s3://mybucket/asdf.json").write_text('{"field": "value"}')
#> 18

# read data from S3
data = json.loads(
    CloudPath("s3://mybucket/asdf.json").read_text()
)

# look at the data
data
#> {'field': 'value'}

# access it now that it is loaded in Python
data["field"] == "value"
#> True

It also brings some extra benefits for setting particular options or different authentication mechanisms, or for keeping a persistent cache so you don't always have to re-download from S3.


qxgroojn7#

If your JSON file looks like this:

{
    "test": "test123"
}

you can access it like a dict:

import json
import boto3

client = boto3.client("s3")
BUCKET = "Bucket123"

def get_json_from_s3(key: str) -> dict:
    """
    Retrieves the json file containing responses from s3. returns a dict

    Args:
        key (str): file path to the json file

    Returns:
        dict: json style dict
    """
    data = client.get_object(Bucket=BUCKET, Key=key)
    json_text = data["Body"].read().decode("utf-8")
    json_text_object = json.loads(json_text)
    return json_text_object

test_dict = get_json_from_s3(key="test.json")
print(test_dict["test"])

