我不时地从AWS SNS接收文件(json)。其中一些只包含1个对象,另一些有几个对象。
我尝试运行一个python脚本来提取几个列作为示例,但是由于这几个对象或文件格式,我得到了多个错误。理想情况下,我希望将所有数据放入列中,特别是“Message”部分中包含的列。(即eventType、from、to、bcc、messageId、timestamp等。
有人能帮忙吗?谢谢
脚本:
data = []
for line in open(folder + file, 'r', encoding='utf-8'):
data.append(json.loads(line))
pd.json_normalize(data)
#df = pd.DataFrame(((d['Message']) for d in data), columns=['Message'])
df = pd.DataFrame([(
data[0]['Timestamp'],
data[0]['Subject'],
data[0]['Message']['eventType'])
], columns=['Timestamp', 'Subject', 'eventType'])
df.to_csv(folder + 'testOutput.csv', index=False, encoding='utf-8')
print(df)
包含多个对象的示例文件:
{"Type":"Notification","MessageId":"0579da9d-671f-547f-879a-8151d6048","TopicArn":"arn:aws:sns:region:123456789:SES-notification-s","Subject":"Event Notification","Message":"{\"eventType\":\"Click\",\"mail\":{\"timestamp\":\"2022-10-21T14:39:47.003Z\",\"source\":\"tester <fromemail@email.com>\",\"sendingAccountId\":\"123456789\",\"messageId\":\"01000183fafc5f3b-4f5f3dc2-281a-472a-85-915f85289858-000000\",\"destination\":[\"toemail@email.com\",\"bccemail@email.com\"],\"headersTruncated\":false,\"headers\":[{\"name\":\"Content-Type\",\"value\":\"multipart/mixed; boundary=\\\"===============4453571503606487627==\\\"\"},{\"name\":\"MIME-Version\",\"value\":\"1.0\"},{\"name\":\"Subject\",\"value\":\"Email tracking test\"},{\"name\":\"From\",\"value\":\"tester <fromemail@email.com>\"},{\"name\":\"To\",\"value\":\"toemail@email.com\"},{\"name\":\"Bcc\",\"value\":\"bccemail@email.com\"},{\"name\":\"Message-ID\",\"value\":\"null\"}],\"commonHeaders\":{\"from\":[\"tester <fromemail@email.com>\"],\"to\":[\"toemail@email.com\"],\"bcc\":[\"bccemail@email.com\"],\"messageId\":\"010001fafc5f33b-4f53dc2-281a-472a-8f65-915f889858-000000\",\"subject\":\"Email tracking test\"},\"tags\":{\"ses:operation\":[\"SendRawEmail\"],\"ses:configuration-set\":[\"ses-email-tracking\"],\"ses:source-ip\":[\"12.12.123.123\"],\"ses:from-domain\":[\"domain.dev\"],\"ses:caller-identity\":[\"caller_identity\"]}},\"click\":{\"timestamp\":\"2022-10-21T14:40:55.000Z\",\"ipAddress\":\"66.66.666.666\",\"userAgent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36\",\"link\":\"https://website.com\",\"linkTags\":null}}\n","Timestamp":"2022-10-21T14:40:55.066Z","UnsubscribeURL":"unsubscribeLink"}
{"Type":"Notification","MessageId":"03758739-75a9-5462-8361-dcf5c410e015","TopicArn":"arn:aws:sns:region:123456789:SES-notification-s","Subject":"Event Notification","Message":"{\"eventType\":\"ClickNum2\",\"mail\":{\"timestamp\":\"2022-10-21T14:39:47.003Z\",\"source\":\"tester <fromemail@email.com>\",\"sendingAccountId\":\"123456789\",\"messageId\":\"01000183fafc5f3b-4f5f3dc2-281a-472a-85-915f85289858-000000\",\"destination\":[\"toemail2@email.com\",\"bccemail2@email.com\"],\"headersTruncated\":false,\"headers\":[{\"name\":\"Content-Type\",\"value\":\"multipart/mixed; boundary=\\\"===============4453571503606487627==\\\"\"},{\"name\":\"MIME-Version\",\"value\":\"1.0\"},{\"name\":\"Subject\",\"value\":\"Email tracking test\"},{\"name\":\"From\",\"value\":\"tester <fromemail@email.com>\"},{\"name\":\"To\",\"value\":\"toemail2@email.com\"},{\"name\":\"Bcc\",\"value\":\"bccemail2@email.com\"},{\"name\":\"Message-ID\",\"value\":\"null\"}],\"commonHeaders\":{\"from\":[\"tester <fromemail@email.com>\"],\"to\":[\"toemail2@email.com\"],\"bcc\":[\"bccemail2@email.com\"],\"messageId\":\"010001fafc5f33b-4f53dc2-281a-472a-8f65-915f889858-000000\",\"subject\":\"Email tracking test\"},\"tags\":{\"ses:operation\":[\"SendRawEmail\"],\"ses:configuration-set\":[\"ses-email-tracking\"],\"ses:source-ip\":[\"12.12.123.123\"],\"ses:from-domain\":[\"domain.dev\"],\"ses:caller-identity\":[\"caller_identity\"]}},\"click2\":{\"timestamp\":\"2022-10-21T15:45:50.000Z\",\"ipAddress\":\"55.55.555.555\",\"userAgent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36\",\"link\":\"https://website.com\",\"linkTags\":null}}\n","Timestamp":"2022-10-21T15:50:50.066Z","UnsubscribeURL":"unsubscribeLink"}
{"Type":"Notification","MessageId":"03758739-75a9-5462-8361-dcf5c410e015","TopicArn":"arn:aws:sns:region:123456789:SES-notification-s","Subject":"Event Notification","Message":"{\"eventType\":\"ClickNum3\",\"mail\":{\"timestamp\":\"2022-10-21T14:39:47.003Z\",\"source\":\"tester <fromemail@email.com>\",\"sendingAccountId\":\"123456789\",\"messageId\":\"01000183fafc5f3b-4f5f3dc2-281a-472a-85-915f85289858-000000\",\"destination\":[\"toemail3@email.com\",\"bccemail3@email.com\"],\"headersTruncated\":false,\"headers\":[{\"name\":\"Content-Type\",\"value\":\"multipart/mixed; boundary=\\\"===============4453571503606487627==\\\"\"},{\"name\":\"MIME-Version\",\"value\":\"1.0\"},{\"name\":\"Subject\",\"value\":\"Email tracking test\"},{\"name\":\"From\",\"value\":\"tester <fromemail@email.com>\"},{\"name\":\"To\",\"value\":\"toemail3@email.com\"},{\"name\":\"Bcc\",\"value\":\"bccemail3@email.com\"},{\"name\":\"Message-ID\",\"value\":\"null\"}],\"commonHeaders\":{\"from\":[\"tester <fromemail@email.com>\"],\"to\":[\"toemail3@email.com\"],\"bcc\":[\"bccemail3@email.com\"],\"messageId\":\"010001fafc5f33b-4f53dc2-281a-472a-8f65-915f889858-000000\",\"subject\":\"Email tracking test\"},\"tags\":{\"ses:operation\":[\"SendRawEmail\"],\"ses:configuration-set\":[\"ses-email-tracking\"],\"ses:source-ip\":[\"12.12.123.123\"],\"ses:from-domain\":[\"domain.dev\"],\"ses:caller-identity\":[\"caller_identity\"]}},\"click3\":{\"timestamp\":\"2022-10-21T16:50:50.000Z\",\"ipAddress\":\"44.44.444.444\",\"userAgent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36\",\"link\":\"https://website.com\",\"linkTags\":null}}\n","Timestamp":"2022-10-21T16:50:50.066Z","UnsubscribeURL":"unsubscribeLink"}
2条答案
按热度按时间ie3xauqp1#
您可以尝试将json文件中的所有字段包含到csv文件中。
bqucvtff2#
如何解析整个输入:
然后
并选择所需的列?