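The following code pulls daily oil prices (dcoilwtico), resamples the daily data to monthly, computes the 12-month (i.e. year-over-year percent) change, and finally uses a loop to shift the year-over-year percent change forward by 1 month (dcoilwtico_1), 2 months (dcoilwtico_2), and so on up to 12 months (dcoilwtico_12), adding each shift as a new column: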
import datetime
import pandas as pd
import pandas_datareader as pdr
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2022, 12, 1)
#1. Get historic data
df_fred_daily = pdr.DataReader(['DCOILWTICO'],'fred', start, end).dropna().resample('M').mean() # Pull daily, remove NaN and collapse from daily to monthly
df_fred_daily.columns= df_fred_daily.columns.str.lower()
#2. Expand df range: index, column names
index_fred = pd.date_range('2022-12-31', periods=13, freq='M')
columns_fred_daily = df_fred_daily.columns.to_list()
#3. Append history + empty df
df_fred_daily_forecast = pd.DataFrame(index=index_fred, columns=columns_fred_daily)
df_fred_test_daily=pd.concat([df_fred_daily, df_fred_daily_forecast])
#4. New df, calculate yoy percent change for each commodity
df_fred_test_daily_yoy= ((df_fred_test_daily - df_fred_test_daily.shift(12))/df_fred_test_daily.shift(12))*100
#5. Extend each variable as a series from 1 to 12 months
for col in df_fred_test_daily_yoy.columns:
    for i in range(1,13):
        df_fred_test_daily_yoy["%s_%s"%(col,i)] = df_fred_test_daily_yoy[col].shift(i)
df_fred_test_daily_yoy.tail(18)
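and produces the following df:
Question: my real example contains hundreds of columns, and I would like to produce these same results using PySpark.
How would I code this in PySpark?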
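To make the target concrete without a FRED download, here is a minimal offline sketch of the same transformation in pandas. The synthetic prices, the month-start index, and the names `monthly`, `yoy`, `lags` and `result` are assumptions for illustration; it also skips the step that appends 13 empty future months. Because the real data has hundreds of columns, the sketch builds all lag columns first and concatenates them once rather than inserting them one at a time.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the FRED data (hypothetical values).
idx = pd.date_range("2016-01-01", periods=36, freq="MS")
monthly = pd.DataFrame({"dcoilwtico": np.arange(100.0, 136.0)}, index=idx)

# Year-over-year percent change: each month versus the same month one year earlier.
yoy = (monthly - monthly.shift(12)) / monthly.shift(12) * 100

# Build every lag column up front, then concatenate in a single step.
lags = {
    f"{col}_{i}": yoy[col].shift(i)
    for col in yoy.columns
    for i in range(1, 13)
}
result = pd.concat([yoy, pd.DataFrame(lags)], axis=1)

print(result.tail(18))
```

The first 12 rows of the year-over-year column are NaN because there is no prior-year value to compare against, and each `_i` column is NaN for a further `i` rows.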
1 Answer
xam8gpfp #1
Since your code is already written, I would use Koalas, "a Spark version of pandas". You only need to install it: https://pypi.org/project/koalas/
See a simple example