Can someone help me create dummy variables with this function:
def make_dummies(df):
    # Create dummies for all hours of the day
    hours = pd.get_dummies(df.index.hour, prefix='hour')
    # Create columns for hour, day of week, weekend, and month
    df['hour'] = df.index.strftime('%H')
    df['day_of_week'] = df.index.dayofweek
    df['weekend'] = np.where(df['day_of_week'].isin([5,6]), 1, 0)
    df['month'] = df.index.month
    # Create dummies for hours of the day
    hour_dummies = pd.get_dummies(df['hour'], prefix='hour')
    # Create dummies for all days of the week
    day_mapping = {0: 'monday', 1: 'tuesday', 2: 'wednesday', 3: 'thursday', 4: 'friday', 5: 'saturday', 6: 'sunday'}
    all_days = pd.Categorical(df['day_of_week'].map(day_mapping), categories=day_mapping.values())
    day_dummies = pd.get_dummies(all_days)
    # Create dummies for all months of the year
    month_mapping = {1: 'jan', 2: 'feb', 3: 'mar', 4: 'apr', 5: 'may', 6: 'jun', 7: 'jul',
                     8: 'aug', 9: 'sep', 10: 'oct', 11: 'nov', 12: 'dec'}
    all_months = pd.Categorical(df['month'].map(month_mapping), categories=month_mapping.values())
    month_dummies = pd.get_dummies(all_months)
    # Merge all dummies with original DataFrame
    df = pd.concat([df, hours, hour_dummies, day_dummies, month_dummies], axis=1)
    # Drop redundant columns
    df = df.drop(['hour', 'day_of_week', 'month'], axis=1)
    return df
I'm applying it to a small dataset like this:
import pandas as pd
import numpy as np
data = {"temp":[53.13,52.93,52.56,51.58,47.57],
"Date":["2023-04-07 15:00:00-05:00","2023-04-07 16:00:00-05:00","2023-04-07 17:00:00-05:00","2023-04-07 18:00:00-05:00","2023-04-07 19:00:00-05:00"]
}
df = pd.DataFrame(data).set_index("Date")
# Convert the index to datetime
df.index = pd.to_datetime(df.index)
df = make_dummies(df)
print(df)
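This doesn't merge the data correctly. Sorry for the screenshot, but the function just stacks the dummies underneath the data; I want all the dummy variables added to df as new columns, not stacked below. Hope that makes sense: I'm trying to write a function that creates *all* the dummy variables for every hour, month, and day type.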
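Why the rows stack: pd.concat(axis=1) aligns its inputs on index labels, and pd.get_dummies called on a bare Index or Categorical (which is how hours, day_dummies and month_dummies are built above) returns a frame with a default 0..n-1 RangeIndex instead of df's timestamps. Only hour_dummies, built from the Series df['hour'], keeps df's index. The minimal sketch below reproduces the effect on three hypothetical rows shaped like the question's data; the variable names are illustrative only.

```python
import pandas as pd

# Three hypothetical hourly rows shaped like the question's data
idx = pd.to_datetime(["2023-04-07 15:00", "2023-04-07 16:00", "2023-04-07 17:00"])
df = pd.DataFrame({"temp": [53.13, 52.93, 52.56]}, index=idx)

# get_dummies on a bare Index (not a Series) returns a default 0..n-1 RangeIndex
dummies = pd.get_dummies(df.index.hour, prefix="hour")

# concat(axis=1) aligns on index labels; the timestamps and 0, 1, 2 share none,
# so the result has 3 + 3 rows padded with NaN: the "stacking" effect
stacked = pd.concat([df, dummies], axis=1)

# Reusing df's index for the dummies makes the rows line up one to one
aligned = pd.concat([df, dummies.set_index(df.index)], axis=1)

print(stacked.shape)  # (6, 4)
print(aligned.shape)  # (3, 4)
```

The same reasoning applies to every dummy frame in the function: each one has to carry df's index before it is concatenated.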
2 answers
Answer #1 (pprl5pva)
The scikit-learn version here looks a bit intimidating, but it seems to work well too.
Answer #2 (6jygbczu)
You just missed a few set_index calls to align the indexes in pd.concat. Note: I think hours and hour_dummies are redundant.
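Putting that fix together, one possible version of the whole function gives every dummy frame df's own index, drops the redundant hours frame, and uses fixed categories so that all hours, weekdays and months get a column even when the data covers only a few of them. This is a sketch under those assumptions, not the answerer's exact code; the weekend rule (Saturday and Sunday) and the day and month labels are carried over from the question.

```python
import numpy as np
import pandas as pd


def make_dummies(df):
    """Return df with weekend, hour, weekday and month indicator columns appended.

    Every dummy frame is given df's own index, so pd.concat(axis=1)
    adds columns side by side instead of stacking rows.
    """
    idx = df.index
    days = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
    months = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]

    # Fixed categories give a column for every hour, weekday and month,
    # even when the data only covers a few of them
    hour_cat = pd.Categorical(idx.hour, categories=list(range(24)))
    day_cat = pd.Categorical(np.asarray(days)[idx.dayofweek.to_numpy()], categories=days)
    month_cat = pd.Categorical(np.asarray(months)[idx.month.to_numpy() - 1], categories=months)

    # get_dummies on a Categorical returns a 0..n-1 index, so reattach df's index
    hour_dummies = pd.get_dummies(hour_cat, prefix="hour").set_index(idx)
    day_dummies = pd.get_dummies(day_cat).set_index(idx)
    month_dummies = pd.get_dummies(month_cat).set_index(idx)

    # Saturday (5) and Sunday (6) count as weekend, as in the question
    weekend = pd.Series(np.where(idx.dayofweek >= 5, 1, 0), index=idx, name="weekend")

    return pd.concat([df, weekend, hour_dummies, day_dummies, month_dummies], axis=1)


# Usage with the sample data from the question
data = {"temp": [53.13, 52.93, 52.56, 51.58, 47.57],
        "Date": ["2023-04-07 15:00:00-05:00", "2023-04-07 16:00:00-05:00",
                 "2023-04-07 17:00:00-05:00", "2023-04-07 18:00:00-05:00",
                 "2023-04-07 19:00:00-05:00"]}
df = pd.DataFrame(data).set_index("Date")
df.index = pd.to_datetime(df.index)
out = make_dummies(df)
print(out.shape)  # (5, 45)
```

On the five-row sample this yields 45 columns: temp, weekend, 24 hour columns, 7 weekday columns and 12 month columns, all on the original five timestamps.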