pandas: an efficient way to append shifted columns to an existing DataFrame

vbkedwbf  posted on 2023-01-11  in Other

I am trying to add shifted (lagged) columns to a DataFrame with the following code:

import pandas as pd 
# a range of integers
lst1 = range(10000000)
df = pd.DataFrame(list(zip(lst1)), columns =['mw']) 
print(df)
lagSize=365
for col in df.columns:
    for i in range(1, lagSize + 1):
        df["%s_%s_%s" % (col, i, -1)] = df[col].shift(i)

I get the following warning:

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df["%s_%s_%s" % (col, i, -1)] = df[col].shift(i)

What is a good way to do this?

5lhxktic

Named columns

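If you also want to keep a key for each shift so you know which lag it is, you can store the shifted Series in a dictionary and then create a new DataFrame from that dictionary.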

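import pandas as pd

# a single column of integers
lst1 = range(10000000)
df = pd.DataFrame(list(zip(lst1)), columns=['mw'])
print(df)
lagSize = 365

# collect each shifted Series under its own key
shifts = {}
for col in df.columns:
    for i in range(1, lagSize + 1):
        # convert i to str before appending it to the key
        shifts['key' + str(i)] = df[col].shift(i)

# build the new DataFrame in one step; the dict keys become the column names
dd = pd.DataFrame.from_dict(shifts)
print(dd)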
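
To make it easier to see what the dictionary approach produces, here is a minimal sketch on a five-row frame. The names `df_small`, `lag_size` and `dd_small` are made up for illustration, and the keys follow the column-naming pattern from the question rather than 'key' + number.

```python
import pandas as pd

# Tiny stand-in for the real data (hypothetical names, for illustration only)
df_small = pd.DataFrame({"mw": range(5)})
lag_size = 2

# One dict entry per (column, lag); keys reuse the question's naming pattern
shifts = {
    "%s_%s_%s" % (col, i, -1): df_small[col].shift(i)
    for col in df_small.columns
    for i in range(1, lag_size + 1)
}

# The dict keys become the column names of the new frame
dd_small = pd.DataFrame(shifts)
print(dd_small)
```

Each column starts with as many NaN rows as its lag, because shift(i) moves the values down by i positions.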

Unnamed columns

If you keep the shifted Series in a list instead, creating the next DataFrame is even easier.

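import pandas as pd

# reuses df and lagSize from the previous example
shiftsList = []
for col in df.columns:
    for i in range(1, lagSize + 1):
        shiftsList.append(df[col].shift(i))

# pd.DataFrame(shiftsList) would turn each Series into a row, so
# concatenate side by side instead; ignore_index=True labels the columns 0, 1, 2, ...
dl = pd.concat(shiftsList, axis=1, ignore_index=True)
print(dl)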

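Once the shifted DataFrame has been created, adding the single original column to it is much faster than adding every shifted column to the original one at a time.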
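
As a rough sketch of that last step, the original frame and the shifted columns can be joined with a single pd.concat(axis=1) call, which is what the warning itself suggests. The helper name `add_lags` and the small demo frame are assumptions made for this example, not part of the question.

```python
import pandas as pd

def add_lags(frame, lag_size):
    """Return a new frame with lagged copies of every column appended.

    Hypothetical helper: builds all shifted columns first, then joins once.
    """
    lagged = {
        f"{col}_{i}_-1": frame[col].shift(i)
        for col in frame.columns
        for i in range(1, lag_size + 1)
    }
    # A single concat instead of one column insertion per lag
    return pd.concat([frame, pd.DataFrame(lagged, index=frame.index)], axis=1)

df_demo = pd.DataFrame({"mw": range(6)})
result = add_lags(df_demo, 3)
print(result)
```

Because every column is attached in one operation, this should avoid the repeated insertions that trigger the PerformanceWarning.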
