pandas-从多级列和聚合创建新系列

kulphzqa  于 2023-05-15  发布在  其他
关注(0)|答案(1)|浏览(133)

所以我有一些复杂的分析要做。我在df中有这个数据:

Date/Time   TimeStamp   CallerId    CallType    watts   Band    slot    Channel
0   20:02.0 3113677432  17794800    C1  0.060303    12  1   2
1   20:02.0 3113677432  5520488 OP8 0.302229    12  1   1
2   20:02.0 3113677432  5520488 OP8 0.302229    13  1   1
3   20:02.0 3113677432  5520488 OP8 0.302229    12  2   1
4   20:02.0 3113677432  5520488 OP8 0.302229    13  2   1
5   20:02.0 3113677432  5520488 OP8 0.302229    12  3   1
6   20:02.0 3113677432  5520488 OP8 0.302229    13  3   1
7   20:02.0 3113677432  5520488 OP8 0.302229    12  4   1
8   20:02.0 3113677432  5520488 OP8 0.302229    13  4   1
9   20:07.0 3113677488  17794800    C1  0.151473    12  1   2
10  20:07.0 3113677488  5218651 CC8kds  0.475604    13  4   1
11  20:07.0 3113677488  5514318 BD  1.906933    12  1   6
12  20:11.0 3113677532  17794800    C1  0.038048    12  1   2
13  20:11.0 3113677532  5218651 CC8kds  0.300086    13  4   1
14  20:11.0 3113677532  5501460 PTN3    4.790000    12  1   5
15  21:51.0 3113678643  9895585 CC8kds  0.075378    12  1   1
16  21:51.0 3113678643  5482185 OP8 0.302229    13  1   1
17  21:51.0 3113678643  5482185 OP8 0.302229    13  2   1
18  21:51.0 3113678643  5482185 OP8 0.302229    13  3   1
19  21:51.0 3113678643  5482185 OP8 0.302229    13  4   1
20  21:51.0 3113678643  5513470 PTN3    4.790000    12  3   1
21  21:51.0 3113678643  5518399 PTN3    4.790000    12  3   5

时间戳在许多项上重复,因为每行捕获频带(10-20)和该频带中的通道(1-7)。该时隙进一步将信道划分为(1-4)个时隙。我使用一个pivot_table来对每个(Band,Channel,slot)中的瓦特(功率)进行分组,如下所示:

df = df.pivot_table(index='TimeStamp', columns=['Band','Channel','slot'], values='watts', 
                                       aggfunc=sum, fill_value=0)

x1c 0d1x我想做的是创建一个新的系列,freq = 1200+(Band-1)+Channel/12,然后将所有4个插槽的功率相加...例如:
频带12,信道1@TimeStamp 3113677432导致总pwr为0.302229+0.302229+0.302229+0.302229 = 1.208916,频率值为1200+(12-1)+1/12=1211.083……所以我会有这样的东西

TimeStamp   Freq    Pwr ...
3113677432  1211.083    1.208916 ... # This is the total for Band 12, Channel 1 @ this timestamp
3113677432  1211.640    2.208916 ... # this isn't a real total

我需要在每个时间戳计算的每个频率。最好将这些频率作为它们自己的(多级)列标题(我在这里编造数字):

Freq        1211.083    1211.64     1212.04...
TimeStamp 
3113677432  1.208916    2.208916    2.208916...
3113677488  2.058406    0.475604    2.208916...

我可以像这样在一个col头上分组:

df.groupby(level=0, axis=1).sum() # Groups by Band

df.groupby(level=1, axis=1).sum() # Groups by Channel - but across ALL Bands - WRONG

那么,是否有一种直接的方法来按频段和通道进行分组,并对所有4个插槽进行求和?

ee7vknir

ee7vknir1#

不需要透视表,只需对多个列执行groupby操作,就可以开始了!

# loading data using solution from https://stackoverflow.com/a/53692642/8718701
from io import StringIO
import pandas as pd

d = '''
    Date/Time   TimeStamp   CallerId    CallType    watts   Band    slot    Channel
0   20:02.0 3113677432  17794800    C1  0.060303    12  1   2
1   20:02.0 3113677432  5520488 OP8 0.302229    12  1   1
2   20:02.0 3113677432  5520488 OP8 0.302229    13  1   1
3   20:02.0 3113677432  5520488 OP8 0.302229    12  2   1
4   20:02.0 3113677432  5520488 OP8 0.302229    13  2   1
5   20:02.0 3113677432  5520488 OP8 0.302229    12  3   1
6   20:02.0 3113677432  5520488 OP8 0.302229    13  3   1
7   20:02.0 3113677432  5520488 OP8 0.302229    12  4   1
8   20:02.0 3113677432  5520488 OP8 0.302229    13  4   1
9   20:07.0 3113677488  17794800    C1  0.151473    12  1   2
10  20:07.0 3113677488  5218651 CC8kds  0.475604    13  4   1
11  20:07.0 3113677488  5514318 BD  1.906933    12  1   6
12  20:11.0 3113677532  17794800    C1  0.038048    12  1   2
13  20:11.0 3113677532  5218651 CC8kds  0.300086    13  4   1
14  20:11.0 3113677532  5501460 PTN3    4.790000    12  1   5
15  21:51.0 3113678643  9895585 CC8kds  0.075378    12  1   1
16  21:51.0 3113678643  5482185 OP8 0.302229    13  1   1
17  21:51.0 3113678643  5482185 OP8 0.302229    13  2   1
18  21:51.0 3113678643  5482185 OP8 0.302229    13  3   1
19  21:51.0 3113678643  5482185 OP8 0.302229    13  4   1
20  21:51.0 3113678643  5513470 PTN3    4.790000    12  3   1
21  21:51.0 3113678643  5518399 PTN3    4.790000    12  3   5
'''

df = pd.read_csv(StringIO(d), sep='\s+')

# calculating total pwr -> sum(watts)

agg_df = (
    df
    .groupby(['TimeStamp', 'Band', 'Channel'], as_index=False)
    .agg({'watts': 'sum'})
)

# freq = 1200+(Band-1)+Channel/12
agg_df['freq'] = (
    1200
    + agg_df['Band']-1
    + agg_df['Channel']/12
)

print(agg_df.to_markdown(index=False, floatfmt=''))

结果:

|    TimeStamp |   Band |   Channel |    watts |               freq |
|-------------:|-------:|----------:|---------:|-------------------:|
| 3113677432.0 |   12.0 |       1.0 | 1.208916 | 1211.0833333333333 |
| 3113677432.0 |   12.0 |       2.0 | 0.060303 | 1211.1666666666667 |
| 3113677432.0 |   13.0 |       1.0 | 1.208916 | 1212.0833333333333 |
| 3113677488.0 |   12.0 |       2.0 | 0.151473 | 1211.1666666666667 |
| 3113677488.0 |   12.0 |       6.0 | 1.906933 | 1211.5             |
| 3113677488.0 |   13.0 |       1.0 | 0.475604 | 1212.0833333333333 |
| 3113677532.0 |   12.0 |       2.0 | 0.038048 | 1211.1666666666667 |
| 3113677532.0 |   12.0 |       5.0 | 4.79     | 1211.4166666666667 |
| 3113677532.0 |   13.0 |       1.0 | 0.300086 | 1212.0833333333333 |
| 3113678643.0 |   12.0 |       1.0 | 4.865378 | 1211.0833333333333 |
| 3113678643.0 |   12.0 |       5.0 | 4.79     | 1211.4166666666667 |
| 3113678643.0 |   13.0 |       1.0 | 1.208916 | 1212.0833333333333 |

相关问题