pandas: How to convert a date (YYYY-MM-DD) to Mon-YY and group by another column to get the minimum and maximum month?

Posted by xqkwcwgp on 2022-12-02 in Other

I have created a DataFrame that holds a rolling-quarter mapping, using the following code:

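import numpy as np
import pandas as pd

# One row per month start, from April 2020 through April 2022.
abcd = pd.DataFrame()
abcd['Month'] = pd.date_range(start='2020-04-01', end='2022-04-01', freq='MS')

# Three offset counters, so each Time value maps to up to three consecutive months.
abcd['Time_1'] = np.arange(1, abcd.shape[0]+1)
abcd['Time_2'] = np.arange(0, abcd.shape[0])
abcd['Time_3'] = np.arange(-1, abcd.shape[0]-1)

# Reshape to long format: one row per (Month, Time) pair.
db_nd_ad_unpivot = pd.melt(abcd, id_vars=['Month'],
                     value_vars=['Time_1', 'Time_2', 'Time_3'],
                     var_name='Time_name', value_name='Time')
# Keep only Time values from 1 up to the number of months.
abcd_map = db_nd_ad_unpivot[(db_nd_ad_unpivot['Time']>0)&(db_nd_ad_unpivot['Time']< abcd.shape[0]+1)]
abcd_map = abcd_map[['Month','Time']].copy()  # copy so later column assignments do not warn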

The output of the code looks like this:

Next, I created an additional column that holds the month and year in the format Mon'YY, using the code

abcd_map['Month'] = pd.to_datetime(abcd_map.Month)
# abcd_map['Month'] = abcd_map['Month'].astype(str)
abcd_map['Time_Period'] = abcd_map['Month'].apply(lambda x: x.strftime("%b'%y"))

Now I want to see, for a specific Time, the minimum and maximum of the Month column, e.g. for Time 17.

A simple groupby gives Time 17 with Time_Period Aug'21-Sep'21.

The desired output is Time 17 with Time_Period Aug'21-Oct'21.
I think the difference arises because strftime converts the column to string/object type, so min and max are computed on the strings rather than on the dates in the Month column.
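The suspicion above can be checked in isolation. The following is a minimal sketch, assuming an English locale for `%b`; the names `months`, `labels`, `string_range` and `chrono_range` are illustrative and do not come from the question. It takes the three months that map to Time 17 and compares min/max on the formatted strings with min/max on the datetimes.

```python
import pandas as pd

fmt = "%b'%y"

# The three months that map to Time 17 in the question's data.
months = pd.Series(pd.to_datetime(["2021-08-01", "2021-09-01", "2021-10-01"]))

# After strftime the values are plain strings: "Aug'21", "Sep'21", "Oct'21".
labels = months.dt.strftime(fmt)

# On strings, min/max compare character by character ("A" < "O" < "S"),
# so "Sep'21" sorts after "Oct'21".
string_range = f"{labels.min()}-{labels.max()}"

# On datetimes, min/max are chronological; format only afterwards.
chrono_range = f"{months.min().strftime(fmt)}-{months.max().strftime(fmt)}"

print(string_range)  # Aug'21-Sep'21
print(chrono_range)  # Aug'21-Oct'21
```

So the aggregation has to run while the column is still datetime, with the formatting applied as the last step.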

r7knjye2 #1

How about computing the minimum and maximum first and converting them to strings afterwards?

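# Aggregate while Month is still datetime, then format both bounds and join them.
New_df = (abcd_map.groupby('Time')['Month'].agg(['min', 'max'])
          .apply(lambda x: x.dt.strftime("%b'%y"))
          .agg('-'.join, axis=1).reset_index(name='Time_Period'))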
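For readers who want to run this idea end to end, here is a self-contained sketch under a few assumptions: it rebuilds the question's mapping from scratch, assumes an English locale for `%b`, and uses illustrative names (`wide`, `long`, `mapping`, `bounds`, `result`) that are not part of the original posts. It is one possible way to arrange the steps, not the answerer's exact code.

```python
import numpy as np
import pandas as pd

# Rebuild the question's rolling-quarter mapping.
months = pd.date_range(start="2020-04-01", end="2022-04-01", freq="MS")
n = len(months)
wide = pd.DataFrame({
    "Month": months,
    "Time_1": np.arange(1, n + 1),
    "Time_2": np.arange(0, n),
    "Time_3": np.arange(-1, n - 1),
})

# Long format, keeping only Time values from 1 to n (inclusive).
long = wide.melt(id_vars="Month", var_name="Time_name", value_name="Time")
mapping = long.loc[long["Time"].between(1, n), ["Month", "Time"]]

# Step 1: min and max per Time on the datetime column.
bounds = mapping.groupby("Time")["Month"].agg(["min", "max"])

# Step 2: format the two bounds and join them with a hyphen.
fmt = "%b'%y"
result = (
    (bounds["min"].dt.strftime(fmt) + "-" + bounds["max"].dt.strftime(fmt))
    .rename("Time_Period")
    .reset_index()
)

print(result.loc[result["Time"] == 17])
```

For Time 17 this yields the Time_Period `Aug'21-Oct'21` that the question asks for.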

gkn4icbw #2

Do the following:

abcd_map['Month'] = pd.to_datetime(abcd_map['Month'])
# Take min/max on the datetime column; on the formatted strings they would be alphabetical.
df = abcd_map.groupby(['Time']).agg(
    sum_col=('Time', 'sum'),
    first_date=('Month', 'min'),
    last_date=('Month', 'max')
).reset_index()

# Format only after aggregating.
df['TimePeriod'] = df['first_date'].dt.strftime("%b'%y")+'-'+df['last_date'].dt.strftime("%b'%y")
df = df.drop(['first_date','last_date'], axis = 1)
df

This returns

    Time  sum_col     TimePeriod
0      1        3  Apr'20-Jun'20
1      2        6  May'20-Jul'20
2      3        9  Jun'20-Aug'20
3      4       12  Jul'20-Sep'20
4      5       15  Aug'20-Oct'20
5      6       18  Sep'20-Nov'20
6      7       21  Oct'20-Dec'20
7      8       24  Nov'20-Jan'21
8      9       27  Dec'20-Feb'21
9     10       30  Jan'21-Mar'21
10    11       33  Feb'21-Apr'21
11    12       36  Mar'21-May'21
12    13       39  Apr'21-Jun'21
13    14       42  May'21-Jul'21
14    15       45  Jun'21-Aug'21
15    16       48  Jul'21-Sep'21
16    17       51  Aug'21-Oct'21
17    18       54  Sep'21-Nov'21
18    19       57  Oct'21-Dec'21
19    20       60  Nov'21-Jan'22
20    21       63  Dec'21-Feb'22
21    22       66  Jan'22-Mar'22
22    23       69  Feb'22-Apr'22
23    24       48  Mar'22-Apr'22
24    25       25  Apr'22-Apr'22
