Using Gzip in Python3 with Csv

Posted by 2j4z5cfb on 2023-02-27 in Python


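The goal is to write code that is compatible with both Python 2.7 and Python >= 3.6.
The code currently runs on Python 2.7: it creates a GzipFile object, writes the list to the gzip file, and finally uploads the gzip file to an S3 bucket.
Starting data: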

data = [ [1, 2, 3], [4, 5, 6], ["a", 3, "iamastring"] ]

I tried:

import contextlib, csv, gzip

@contextlib.contextmanager
def get_gzip_writer(path):
  with s3_reader.open(path) as s3_file:
    with gzip.GzipFile(fileobj=s3_file, mode="w") as gzip_file:
      yield csv.writer(gzip_file)

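However, this code does not work on Python 3, because csv produces str while gzip expects bytes. Given how the gzip file is used and read later, it is important to keep the gzip file in bytes. This means that using io.TextIOWrapper would not work for this particular use case.
So I tried creating an adapter class: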
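To make the str/bytes mismatch concrete, here is a minimal Python 3 sketch. An io.BytesIO buffer stands in for the S3 file object (an assumption, since s3_reader is not shown). Handing a GzipFile directly to csv.writer fails as soon as the first row is written, because csv.writer passes a str to GzipFile.write(), which only accepts bytes-like objects.

```python
import csv
import gzip
import io

# Hypothetical in-memory stand-in for the S3 file object
raw = io.BytesIO()

with gzip.GzipFile(fileobj=raw, mode="wb") as gzip_file:
    writer = csv.writer(gzip_file)
    try:
        # csv.writer calls gzip_file.write() with a str, which GzipFile rejects
        writer.writerow([1, 2, 3])
        error = None
    except TypeError as exc:
        error = exc

print(type(error).__name__)  # TypeError
```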

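import csv, six

class BytesToBytes(object):
  def __init__(self, stream, dialect='excel', encoding='utf-8', **kwargs):
    self.temp = six.StringIO()  # text buffer that csv.writer writes into
    self.writer = csv.writer(self.temp, dialect, **kwargs)
    self.stream = stream        # underlying binary stream, e.g. a GzipFile
    self.encoding = encoding
  
  def writerow(self, row):
    # Decode byte strings so csv.writer only sees text
    self.writer.writerow([s.decode(self.encoding) if hasattr(s, 'decode') else s for s in row])
    # Encode the buffered CSV line and write it to the binary stream
    self.stream.write(six.ensure_binary(self.temp.getvalue(), self.encoding))
    # Empty the text buffer before the next row
    self.temp.seek(0)
    self.temp.truncate(0)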

The updated code looks like this:

import contextlib

@contextlib.contextmanager
def get_gzip_writer(path):
  with s3_reader.open(path) as s3_file:
    with gzip.GzipFile(fileobj=s3_file, mode="w") as gzip_file:
      yield BytesToBytes(gzip_file, 'excel', 'utf-8')

This works, but having a whole class just for this single use case seems like overkill.
Here is the code that calls the above:

def write_data(data, url):
  with get_gzip_writer(url) as writer:
    for row in data:
      writer.writerow(row)
  return url

What options are there for using GzipFile (while still reading/writing bytes) without creating an entire adapter class?
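One option that avoids both the adapter class and TextIOWrapper, not taken from the original thread, is to build the whole CSV document as text in memory, encode it once, and gzip the resulting bytes. The sketch below is Python 3 only; the function name gzip_csv_bytes is hypothetical, and uploading the returned bytes to S3 is left to whatever client the existing code uses.

```python
import csv
import gzip
import io

def gzip_csv_bytes(rows, encoding="utf-8"):
    """Hypothetical helper: serialize rows to CSV text, then gzip the encoded bytes."""
    # Build the whole CSV document as str in memory
    text_buffer = io.StringIO()
    csv.writer(text_buffer).writerows(rows)
    # Encode once and compress the bytes into a second in-memory buffer
    compressed = io.BytesIO()
    with gzip.GzipFile(fileobj=compressed, mode="wb") as gzip_file:
        gzip_file.write(text_buffer.getvalue().encode(encoding))
    return compressed.getvalue()

data = [[1, 2, 3], [4, 5, 6], ["a", 3, "iamastring"]]
payload = gzip_csv_bytes(data)  # bytes, ready to hand to an upload call
print(gzip.decompress(payload))  # b'1,2,3\r\n4,5,6\r\na,3,iamastring\r\n'
```

The trade-off is memory: unlike streaming rows through a GzipFile, the entire CSV is held in memory before anything is compressed, so this suits small data sets.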


Source: https://stackoverflow.com/questions/70933445/using-gzip-in-python3-with-csv

1 answer


rta7y2nd1#

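I've read and considered your concern about keeping the GZip file in binary mode, and I think you can still use TextIOWrapper. My understanding is that its job is to provide an interface that takes text and writes bytes (emphasis mine):
A buffered text stream providing higher-level access to a BufferedIOBase buffered *binary stream*.
I read that as "text in, bytes out"... which is exactly what the GZip application needs, right? If so, then for Python 3 we need to give the CSV writer something that accepts strings but ultimately writes bytes.
That is what a TextIOWrapper with UTF-8 encoding does: it accepts the strings coming from csv.writer's writerow()/writerows() methods and writes UTF-8-encoded bytes to gzip_file.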
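A minimal Python 3 sketch of that "text in, bytes out" pipeline, using io.BytesIO as a hypothetical stand-in for the S3 file object so the result can be checked in memory. It follows the answer's approach but adds newline="" (an addition here, matching the csv module's advice for file objects) so TextIOWrapper does not translate the "\r\n" line endings that csv.writer already emits.

```python
import csv
import gzip
import io

raw = io.BytesIO()  # hypothetical stand-in for the binary S3 file object

with gzip.GzipFile(fileobj=raw, mode="wb") as gzip_file:
    # Text in: csv.writer writes str to the wrapper.
    # Bytes out: the wrapper encodes it as UTF-8 and writes bytes to gzip_file.
    with io.TextIOWrapper(gzip_file, encoding="utf-8", newline="") as wrapper:
        writer = csv.writer(wrapper)
        writer.writerow([1, 2, 3])
        writer.writerow(["a", "é", "c"])

# GzipFile does not close a fileobj it was given, so raw is still readable
result = gzip.decompress(raw.getvalue())
print(result)  # b'1,2,3\r\na,\xc3\xa9,c\r\n'
```

Closing the wrapper also closes gzip_file; the outer with block then calls GzipFile.close() a second time, which is harmless.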
I've run this code in both Python 2 and 3, decompressed the file, and it looks fine:

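import csv, gzip, io, six

def get_gzip_writer(path):
    with open(path, 'wb') as s3_file:
        with gzip.GzipFile(fileobj=s3_file, mode='wb') as gzip_file:
            if six.PY3:
                # Python 3: csv.writer produces str; the wrapper encodes it to UTF-8 bytes.
                # newline='' keeps csv's own '\r\n' line endings untranslated.
                with io.TextIOWrapper(gzip_file, encoding='utf-8', newline='') as wrapper:
                    yield csv.writer(wrapper)
            elif six.PY2:
                # Python 2: csv.writer already produces byte strings
                yield csv.writer(gzip_file)
            else:
                raise ValueError('Neither Python2 or 3?!')

data = [[1,2,3],['a','b','c']]
url = 'output.gz'

# get_gzip_writer is a plain generator: the loop runs once, and the files
# are closed when the generator finishes
for writer in get_gzip_writer(url):
    for row in data:
        writer.writerow(row)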
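Since the question's write_data uses get_gzip_writer in a with statement, here is a sketch of the answer's function turned into a context manager with contextlib.contextmanager, so the calling code can stay unchanged. It checks sys.version_info instead of six (an assumption, to keep the sketch dependency-free), uses a local file in place of the S3 object, and passes newline="" so TextIOWrapper leaves csv.writer's "\r\n" line endings alone. The read-back at the end uses gzip.open in text mode, which is Python 3 only.

```python
import contextlib
import csv
import gzip
import io
import os
import sys
import tempfile

@contextlib.contextmanager
def get_gzip_writer(path):
    # A local file stands in for s3_reader.open(path) from the question
    with open(path, "wb") as raw_file:
        with gzip.GzipFile(fileobj=raw_file, mode="wb") as gzip_file:
            if sys.version_info[0] >= 3:
                # Python 3: csv produces str, so encode it to UTF-8 bytes via a wrapper
                with io.TextIOWrapper(gzip_file, encoding="utf-8", newline="") as wrapper:
                    yield csv.writer(wrapper)
            else:
                # Python 2: csv already produces byte strings
                yield csv.writer(gzip_file)

def write_data(data, url):
    # Calling code from the question, unchanged
    with get_gzip_writer(url) as writer:
        for row in data:
            writer.writerow(row)
    return url

# Usage: write to a temporary directory, then read the rows back
path = write_data([[1, 2, 3], ["a", "b", "c"]], os.path.join(tempfile.mkdtemp(), "output.gz"))
with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(rows)  # [['1', '2', '3'], ['a', 'b', 'c']]
```

With contextmanager, cleanup is tied to the with block (including when the body raises), whereas the plain generator above relies on the for loop running to completion to close the files.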

Answered 2023-02-27
