Export MongoDB to CSV using pymongo

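I want to write a script that generates a CSV file from my MongoDB database, and I would like to know the most convenient way to do it.
Let me start with the structure of the collections.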

MyDataBase -> setting
              users
              fruits

In settings:

setting -> _id
           data
           _tenant

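What I am after is a CSV file of the profiles stored in *data*. They have fields/attributes such as "name", "address", "postalcode", "email", age, and so on. Not every profile necessarily has all of these fields/attributes, and some of the fields look like collections (they have sub-branches), which I am not interested in at all!
So far, my Python code looks like this: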

myquery = db.settings.find() # I am getting everything!
output = csv.writer(open('some.csv', 'wt')) # writing to this file

for items in myquery[0:10]: # first 10 entries
    a = list(items['data']['Profile'].values()) # sub-documents are imported as dictionaries, so I turn the values into a list
    tt = list()
    for chiz in a:
        if chiz is not None:
            tt.append(chiz.encode('ascii', 'ignore')) #encoding
        else:
            tt.append("none")
    output.writerow(tt)

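Not every profile has every field, and some fields are collections (with sub-branches) that get imported as dictionaries! So I have to convert them to lists and so on. There are quite a few things to take care of in such a process, and all in all it does not look that simple!
Is this a typical way to produce such reports? If not, can someone clarify?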

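Exporting data from MongoDB to CSV can be a bit tricky when you have nested structures or missing fields, but with Python and pymongo you can handle these cases with some extra logic. Here is how to create a script that exports data from the settings collection in MongoDB to a CSV file, handling missing fields and nested structures:
1. Connect to the MongoDB database and access the collection.
2. Iterate over the documents in the collection.
3. Flatten the nested JSON (if necessary) to get the desired fields from the profile.
4. Handle missing fields by checking whether a field exists before accessing it.
5. Write the rows to the CSV file.
Here is a more robust script that handles the scenario described: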

import csv
from pymongo import MongoClient

# Function to flatten nested dictionaries for CSV output
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

# Establish a connection to the MongoDB database
client = MongoClient('mongodb://localhost:27017/')
db = client['MyDataBase']  # Replace with your database name
collection = db['settings']  # Replace with your collection name

# Specify the CSV file to write to
csv_file_path = 'profiles.csv'

# Define the header fields that you expect to have
headers = ["name", "address_street", "address_city", "postalcode", "email", "age"]

# Open the CSV file for writing
with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=headers)
    
    # Write the headers to the CSV file
    writer.writeheader()
    
    # Query the database for documents
    for setting in collection.find():
        # Flatten the document for CSV output and handle missing fields
        flat_profile = flatten_json(setting.get('data', {}).get('Profile', {}))

        # Ensure only the expected headers/columns are written to CSV
        row = {header: flat_profile.get(header, "none") for header in headers}
        
        # Write the row to the CSV file
        writer.writerow(row)

print(f'Data exported to {csv_file_path}')

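This script creates a CSV file with the specified headers. For each document in the settings collection, it tries to extract the Profile sub-document from the data field. If a header is not present in the Profile, it writes "none" for that field. If a field holds a nested structure (another document), flatten_json expands it into columns whose names join the keys with underscores, for example address_street and address_city; flattened keys that are not listed in headers are left out of the CSV.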
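To see the flattening and the "none" placeholder without a running MongoDB server, here is a minimal sketch that runs the same logic on in-memory dictionaries and writes to a string buffer. The sample documents, the field values, and the compact `flatten` helper are assumptions made up for illustration; they are not data from the question.

```python
import csv
import io


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}_"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}_"))
    else:
        # Leaf value: drop the trailing '_' from the accumulated prefix
        flat[prefix[:-1]] = value
    return flat


# Hypothetical documents shaped like the settings collection
documents = [
    {"data": {"Profile": {"name": "Ann",
                          "address": {"street": "1 Main St", "city": "Oslo"},
                          "age": 31}}},
    {"data": {"Profile": {"name": "Bob", "email": "bob@example.com"}}},
    {"data": {}},  # no Profile at all
]

headers = ["name", "address_street", "address_city", "postalcode", "email", "age"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=headers)
writer.writeheader()
for doc in documents:
    flat_profile = flatten(doc.get("data", {}).get("Profile", {}))
    # Keep only the expected columns; absent ones become "none"
    writer.writerow({h: flat_profile.get(h, "none") for h in headers})

csv_text = buffer.getvalue()
print(csv_text)
```

Note that `dict.get(h, "none")` only covers keys that are absent. A field stored as null in MongoDB arrives as `None` and is written by the `csv` module as an empty cell, so an explicit `None` check is needed if those should also read "none".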
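You can adjust the placeholder value and the handling of nested structures as needed. Because the file is opened with encoding='utf-8', non-ASCII characters are written as they are. If you need ASCII-only output, an expression such as str(value).encode('ascii', 'ignore').decode('ascii') converts a value to a string while dropping any non-ASCII characters; this is probably unnecessary if your data is already clean and ASCII-encoded.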
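If ASCII-only output is required, the conversion can be wrapped in a small helper that is applied to each value before the row is written. This is only a sketch: the name `to_ascii` and its `placeholder` parameter are hypothetical, and silently dropping characters loses data, so it is worth doing only when the consumer of the CSV cannot read UTF-8.

```python
def to_ascii(value, placeholder="none"):
    """Return value as an ASCII-only string; None becomes the placeholder."""
    if value is None:
        return placeholder
    # Non-ASCII characters are dropped, not transliterated
    return str(value).encode("ascii", "ignore").decode("ascii")


print(to_ascii("Jos\u00e9"))  # the accented character is removed
print(to_ascii(42))           # non-string values are converted with str()
print(to_ascii(None))         # None is replaced by the placeholder
```

In Python 3 the final `.decode("ascii")` matters: without it the value is a `bytes` object, and the `csv` module would write its `b'...'` representation into the file.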
