Python: Memory consumption when creating a DataFrame from a 2D array

qni6mghb posted on 2023-05-16 in Python


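I have a large file that I need to convert into a DataFrame. The file has one item per line. To do this, I read it into a 2D array (a list of lists), and every time I encounter -99999 I start a new row of the 2D array.
When I create the DataFrame from a file that is too large, I run into memory problems: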

MemoryError: Unable to allocate 11.3 GiB for an array with shape (2735, 555873) ...

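It tries to allocate an additional 11.3 GiB, but judging from the memory consumption I observe, it is already using at least that much before this allocation.
Are there any suggestions on how to reduce the memory usage?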
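The 11.3 GiB in the error matches the shape in the message times 8 bytes, the size of one float64 value (or of one object pointer). A quick back-of-the-envelope check, assuming 64-bit CPython and NumPy's usual 4/8-byte float sizes, also shows why the list of lists built before the DataFrame is already large: each value there is a separate Python float object plus a pointer in its row list. The per-value list figure below is a rough lower-bound estimate, not a measurement.

```python
# Shape taken from the MemoryError in the question
n_rows, n_cols = 2735, 555873
n_values = n_rows * n_cols

GIB = 1024 ** 3
float64_gib = n_values * 8 / GIB  # dense NumPy float64 array (8 bytes per value)
float32_gib = n_values * 4 / GIB  # the same array stored as float32 (4 bytes per value)

# Rough lower bound for a list of lists of Python floats on 64-bit CPython:
# an 8-byte pointer in the row list + a 24-byte float object per value
pylist_gib = n_values * (8 + 24) / GIB

print(f"float64: {float64_gib:.1f} GiB")         # float64: 11.3 GiB
print(f"float32: {float32_gib:.1f} GiB")         # float32: 5.7 GiB
print(f"list of floats: ~{pylist_gib:.0f} GiB")  # list of floats: ~45 GiB
```

So the peak is roughly the nested list plus the new array at the same time, which is why rounding the values does not help but changing how (and in what dtype) the data is stored can.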

import pandas as pd

temperature_matrix = []  # All the temperatures; these serve as the data of the DataFrame
time_array = []  # The time values; these serve as the index of the DataFrame
elements = []  # The element IDs; these serve as the column titles of the DataFrame

with open(self.file_location, "r") as f:
    timestep = -1  # Index of the current time step (-1 until the first marker is read)

    for line in f:
        line_data = line.split()
        if not line_data:  # Skip blank lines (line_data[0] would raise IndexError)
            continue

        if line_data[0] == "-99999":  # A new time step starts: record its time and skip
            time_array.append(float(line_data[1]))
            timestep = timestep + 1
            temperature_matrix.append([])
            continue

        if timestep == 0:  # For the first time step we store the element IDs
            elements.append(int(line_data[0]))

        temperature_matrix[timestep].append(float(line_data[1]))

df = pd.DataFrame(data=temperature_matrix, index=time_array, columns=elements)
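One direction worth trying, sketched here under assumptions rather than as a confirmed fix: skip the list of lists entirely. Scan the file once to count the time steps and read the element IDs, preallocate a single NumPy array, fill it on a second pass, and hand it to `pd.DataFrame` with `copy=False` so pandas can avoid another copy where possible. The sketch uses float32, which halves memory compared with float64 but keeps only about 7 significant digits. Assumptions: every time step lists the same elements in the same order as the first one, and reading the file twice is acceptable; `scan_layout` and `read_temperatures` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

MARKER = "-99999"  # Line prefix that starts a new time step (format from the question)


def scan_layout(path):
    """First pass (hypothetical helper): count time steps, collect first-step element IDs."""
    n_steps = 0
    elements = []
    with open(path, "r") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == MARKER:
                n_steps += 1
            elif n_steps == 1:  # Still inside the first time step
                elements.append(int(parts[0]))
    return n_steps, elements


def read_temperatures(path, dtype=np.float32):
    """Second pass (hypothetical helper): fill a preallocated array, wrap it in a DataFrame."""
    n_steps, elements = scan_layout(path)
    # NaN marks any value missing from a time step instead of leaving garbage
    data = np.full((n_steps, len(elements)), np.nan, dtype=dtype)
    times = np.empty(n_steps, dtype=np.float64)

    step = -1
    col = 0
    with open(path, "r") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == MARKER:
                step += 1
                col = 0
                times[step] = float(parts[1])
                continue
            if step < 0:  # Ignore anything before the first marker
                continue
            # Assumes each step lists the elements in the same order as the first step
            data[step, col] = float(parts[1])
            col += 1

    return pd.DataFrame(data, index=times, columns=elements, copy=False)
```

With the shape from the error, the only large allocation is the float32 array of roughly 5.7 GiB, instead of the nested list plus a second array of the same shape.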

Here is an example of the input file:

-99999 0.0  # Time step 0
125 25.774447 # Temperature of element 125 at time step 0
126 35.774447
127 45.774447
128 55.774447
...
-99999 60.0  # Time step 60.0
125 25.774447 # Temperature of element 125 at time step 60
126 35.774447
127 45.774447
128 55.774447
...

I tried rounding the floats, but the memory consumption stays the same.
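The rounding result is expected: a float's storage size is fixed by its type, not by how many decimals its value has. A small illustration, assuming NumPy and CPython: rounding keeps float64 at 8 bytes per value and a rounded Python float is still a full float object, while converting to float32 actually halves the size.

```python
import sys

import numpy as np

values = np.array([25.774447, 35.774447, 45.774447, 55.774447])  # float64 by default
rounded = np.round(values, 1)         # Values change, dtype stays float64
downcast = values.astype(np.float32)  # 4 bytes per value instead of 8

print(values.nbytes, rounded.nbytes, downcast.nbytes)  # 32 32 16

# A rounded Python float is still a separate float object of the same size
print(sys.getsizeof(25.774447) == sys.getsizeof(round(25.774447, 1)))  # True
```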


Source: https://stackoverflow.com/questions/76254058/memory-consumption-when-creating-a-dataframe-from-a-2d-array