python-3.x: Fastest way to generate a string of zeros

Posted by qij5mzcb on 2023-01-14 in Python

I need to generate a string of zeros, for example:

import sys
MB = 1024 * 1024
cache = ''
while sys.getsizeof(cache) <= 10 * MB:
    cache = cache + "0"

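and save it to a file, but my impression is that this approach is too slow and wastes a lot of system resources.
What is the best way to do this as fast as possible?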


1wnzp6jl1#

You can "multiply" a string:

cache = '0' * (1024**2)

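This yields a string of just over a million zeros (1,048,576 of them). Any other string, or any other integer as the factor, works the same way.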
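
As a minimal sketch of how this could be combined with the file-writing step from the question, the function below builds the string with a single multiplication and writes it out in one call. The name `write_zeros`, the temporary file location, and the ASCII encoding are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile

MB = 1024 * 1024


def write_zeros(path, size):
    """Write `size` '0' characters to `path` (hypothetical helper)."""
    data = '0' * size  # one allocation instead of `size` concatenations
    # '0' is a single byte in ASCII, so the file ends up `size` bytes long.
    with open(path, 'w', encoding='ascii', newline='') as f:
        f.write(data)
    return len(data)


if __name__ == '__main__':
    # Assumed location: a scratch file in the system temp directory.
    target = os.path.join(tempfile.gettempdir(), 'zeros.txt')
    written = write_zeros(target, 10 * MB)
    print(written, os.path.getsize(target))
    os.remove(target)  # clean up the scratch file
```

If the file is the only goal, the string never needs to be grown character by character; the multiplication produces it in one step and `f.write` hands it to the operating system in one call.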


ecfdbz9o2#

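There are many ways to do this. To find out which is fastest, let's try a number of them and measure with timeit.
Note that this code is a bit sloppy: it may not produce the correct length if the desired length is not a power of two, so it needs some tidying up.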

import timeit
from io import StringIO

target_size = 2**24
starting_char = '0'
num_iters = 1000

def single_char_multiplied():
    return starting_char * target_size
    
def single_char_join():
    return ''.join(starting_char for _ in range(target_size))
    
def single_char_power():
    s = starting_char
    while len(s) < target_size:
        s *= 2
    s = s[:target_size]
    return s
    
def chunk_join(chunk_size):
    chunk = starting_char * chunk_size
    # This is not exact: the result is shorter than target_size
    # unless chunk_size divides target_size evenly.
    num_chunks = target_size // chunk_size
    return ''.join(chunk for _ in range(num_chunks))
    
def stringio_single_append():
    with StringIO() as f:
        for _ in range(target_size):
            f.write(starting_char)
        return f.getvalue()

def stringio_chunk_append(chunk_size):
    chunk = starting_char * chunk_size
    with StringIO() as f:
        while f.tell() < target_size:
            f.write(chunk)
        return f.getvalue()
    
def stringio_doubling():
    with StringIO(starting_char) as f:
        while f.tell() < target_size:
            f.write(f.getvalue())
        return f.getvalue()
    
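def dev_zero_single_read():
    # Note: /dev/zero yields NUL characters ('\x00'), not the digit '0',
    # and only exists on Unix-like systems.
    with open('/dev/zero', 'r') as f:
        return f.read(target_size)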

approaches = [
    [single_char_multiplied, 'single_char_multiplied'],
    [single_char_join, 'single_char_join'],
    [single_char_power, 'single_char_power'],
    [stringio_single_append, 'stringio_single_append'],
    [stringio_doubling, 'stringio_doubling'],
    [dev_zero_single_read, 'dev_zero_single_read'],
]
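for chunk_size in [10, 100, 1000, 10000, 100000]:
    # Bind chunk_size as a default argument; a plain closure would look it up
    # at call time, so every lambda would run with the last value (100000).
    approaches.append([lambda chunk_size=chunk_size: chunk_join(chunk_size), f"chunk_join({chunk_size})"])
    approaches.append([lambda chunk_size=chunk_size: stringio_chunk_append(chunk_size), f"stringio_chunk_append({chunk_size})"])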

for i, approach in enumerate(approaches, start=1):
    result = timeit.timeit(approach[0], number=num_iters)
    approach.append(result)
    print(f"{i}/{len(approaches)}: {approach[1]}: {result}")
    
approaches.sort(key=lambda a: a[-1])
print("Sorted results:")
for func, func_name, result in approaches:
    print(f"{func_name}: {result}")

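(In hindsight, the chunk sizes should have been powers of two.)
This gives (total seconds for 1000 runs of each approach, fastest first):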

single_char_multiplied: 1.4196025500000076
chunk_join(1000): 1.976723690999279
chunk_join(10000): 1.978875980000339
chunk_join(100000): 2.0014372969999386
chunk_join(10): 2.003043951000109
stringio_chunk_append(1000): 2.0336110369999005
stringio_chunk_append(100000): 2.038408315000197
stringio_chunk_append(10): 2.0456108839998706
chunk_join(100): 2.0504061949995958
stringio_chunk_append(100): 2.177647779999461
stringio_chunk_append(10000): 2.2308024960002513
single_char_power: 30.150350827999773
dev_zero_single_read: 32.01321319700037
stringio_doubling: 118.23563569500038
single_char_join: 2267.945749295
stringio_single_append: 3360.535176466

Surprisingly, the fastest approach turns out to be the simplest one: '0' * n, just as in @klaus-d's answer.
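
For a quicker sanity check than the full benchmark, a reduced comparison might look like the sketch below. It assumes a smaller, deliberately non-power-of-two `target_size` and a hypothetical `exact_chunk_join` helper that appends the remainder so the length comes out right. The size, chunk size, and repeat count are arbitrary choices, and no timings are quoted because they vary by machine.

```python
import timeit

target_size = 1_000_003  # assumed size, deliberately not a power of two
starting_char = '0'


def multiplied():
    """Build the whole string with a single multiplication."""
    return starting_char * target_size


def exact_chunk_join(chunk_size):
    """Join whole chunks, then append the remainder (hypothetical helper)."""
    chunk = starting_char * chunk_size
    num_chunks, remainder = divmod(target_size, chunk_size)
    parts = [chunk] * num_chunks
    parts.append(starting_char * remainder)  # empty string if it divides evenly
    return ''.join(parts)


if __name__ == '__main__':
    candidates = [
        ('multiplied', multiplied),
        ('exact_chunk_join(1000)', lambda: exact_chunk_join(1000)),
    ]
    for name, func in candidates:
        seconds = timeit.timeit(func, number=100)
        print(f"{name}: {seconds:.4f} s for 100 runs")
```

Both functions return strings of exactly `target_size` characters, so the comparison measures only how the string is built.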
