Pandas DataFrame: count values that fall within ranges, then group by

iswrvxsc  posted on 2023-06-20  in Other

I have a DataFrame like this:
df =

   date&time   measur_temp  cal_temp  diff
0  2022-01-28           27        20     7
1  2022-01-29           27        23     4
2  2022-01-30           30        33     3
3  2022-01-31           33        32     1

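I want to create a new table (a DataFrame is fine too) that contains ranges of diff values and ranges of measur_temp, and counts the rows that fall into each bin. For example, in the first row of df above, diff is 7, which lies between 7 and 10, and the temperature lies between 21 and 30. Count this as one event.
The final output should look like this. Can anyone help?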

   measur_Temp_range  diff<5  5=<diff<10  10=<diff<15  15=<diff<=20
0  10-20              0.0     0.0
1  21-30              1.1     0.0
2  31-40              2.0     0.0
3  41-50              0.0     0.0

Answer 1 (thtygnil)

For this problem, a simple approach is to use np.select:

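import numpy as np
import pandas as pd

df = pd.DataFrame({'name':['a','b','c','d','e','f','g'],
                   'val':[1,2,3,4,5,6,7]})

# One boolean mask per range, matched in order with the labels in `choice`
condition = [(df['val']<=1), (df['val']>1) & (df['val']<=3), df['val']>3]
choice = ['range 0-1', 'range 2-3', 'range 4-7']

# A string default keeps the dtype consistent with the string labels
# (newer NumPy versions may reject mixing them with the integer default 0)
df['new'] = np.select(condition, choice, default='other')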

This gives the expected result:

  name  val        new
0    a    1  range 0-1
1    b    2  range 2-3
2    c    3  range 2-3
3    d    4  range 4-7
4    e    5  range 4-7
5    f    6  range 4-7
6    g    7  range 4-7

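If you combine several conditions with & or |, remember to wrap each one in parentheses.
This may be a bit longer than using crosstab or pivot, but it needs less data preparation.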
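
The example above labels a toy column, so here is a rough sketch of how the same np.select idea might be applied to the df from the question and then counted with groupby. The helper columns temp_range and diff_range and the "other" fallback label are names made up for this illustration, and the integer bin edges assume whole-number temperatures.

```python
import numpy as np
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "date&time": ["2022-01-28", "2022-01-29", "2022-01-30", "2022-01-31"],
    "measur_temp": [27, 27, 30, 33],
    "cal_temp": [20, 23, 33, 32],
    "diff": [7, 4, 3, 1],
})

temp, diff = df["measur_temp"], df["diff"]

# Each comparison is parenthesized because & binds tighter than <, <=, >=
temp_conditions = [
    (temp >= 10) & (temp <= 20),
    (temp >= 21) & (temp <= 30),
    (temp >= 31) & (temp <= 40),
    (temp >= 41) & (temp <= 50),
]
temp_labels = ["10-20", "21-30", "31-40", "41-50"]

diff_conditions = [
    diff < 5,
    (diff >= 5) & (diff < 10),
    (diff >= 10) & (diff < 15),
    (diff >= 15) & (diff <= 20),
]
diff_labels = ["diff<5", "5=<diff<10", "10=<diff<15", "15=<diff<=20"]

# Hypothetical helper columns; rows matching no condition get "other"
df["temp_range"] = np.select(temp_conditions, temp_labels, default="other")
df["diff_range"] = np.select(diff_conditions, diff_labels, default="other")

# Count rows per (temp_range, diff_range) pair, then add the empty bins back
counts = (
    df.groupby(["temp_range", "diff_range"])
      .size()
      .unstack(fill_value=0)
      .reindex(index=temp_labels, columns=diff_labels, fill_value=0)
)
print(counts)
```

The reindex step is only there because groupby on plain string labels drops combinations that never occur; rows labelled "other" would be left out of the final table.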


Answer 2 (68de4m5k)

Another possible solution:

import pandas as pd

#https://stackoverflow.com/questions/76439436/#comment134784399_76439436
out = (
    df.pivot_table(
        index=pd.cut(df["measur_temp"], range(10, 60, 10)),
        columns=pd.cut(df["diff"], range(0, 25, 5), right=False),
        values="date&time", aggfunc="count",
        observed=False)  # keep empty bins; the default for this flag differs across pandas versions
)

out.index = [
    f"{x.left}-{x.right}" if i==0 else f"{x.left+1}-{x.right}"
    for i,x in enumerate(out.index)
]

out.columns = [
    f"diff<{x.right}" if i==0 else f"{x.left}=<diff<{x.right}"
    for i,x in enumerate(out.columns)
]

out = out.reset_index(names="measur_Temp_range")

Output:

print(out)

  measur_Temp_range  diff<5  5=<diff<10  10=<diff<15  15=<diff<20
0             10-20       0           0            0            0
1             21-30       2           1            0            0
2             31-40       1           0            0            0
3             41-50       0           0            0            0
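
Since the title asks about groupby, the same counts can probably also be obtained by grouping on the two binned Series directly. This is only a sketch under the question's data; the names temp_bins, diff_bins and counts are introduced here for illustration, and diff values below 0 or at 20 and above would fall outside these bins.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "date&time": ["2022-01-28", "2022-01-29", "2022-01-30", "2022-01-31"],
    "measur_temp": [27, 27, 30, 33],
    "cal_temp": [20, 23, 33, 32],
    "diff": [7, 4, 3, 1],
})

# Right-closed temperature bins: (10, 20], (20, 30], (30, 40], (40, 50]
temp_bins = pd.cut(df["measur_temp"], bins=list(range(10, 60, 10)))
# Left-closed diff bins: [0, 5), [5, 10), [10, 15), [15, 20)
diff_bins = pd.cut(df["diff"], bins=list(range(0, 25, 5)), right=False)

# observed=False keeps bins that contain no rows, so they show up as 0
counts = (
    df.groupby([temp_bins, diff_bins], observed=False)
      .size()
      .unstack(fill_value=0)
)
print(counts)
```

The index and columns stay as interval categories here; they can be relabelled the same way as in the pivot_table version above.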

Answer 3 (wb1gzix0)

Use crosstab together with cut to bin the columns:

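import numpy as np
import pandas as pd

# Store the result under a new name so the original df stays available
out = pd.crosstab(pd.cut(df['measur_temp'],bins=[10,20,30,40,50]),
                  pd.cut(df['diff'],bins=[-np.inf,5,10,15,20], right=False),
                  dropna=False)
print (out)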
diff         [-inf, 5.0)  [5.0, 10.0)  [10.0, 15.0)  [15.0, 20.0)
measur_temp                                                      
(10, 20]               0            0             0             0
(20, 30]               2            1             0             0
(30, 40]               1            0             0             0
(40, 50]               0            0             0             0
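
The two cut calls treat bin edges differently, which decides where boundary values such as a temperature of exactly 30 end up. A small sketch with made-up values (the Series values and the list edges are illustrative only):

```python
import pandas as pd

values = pd.Series([20, 30])
edges = [10, 20, 30]

# Default right=True: bins are (10, 20] and (20, 30],
# so 20 falls in (10, 20] and 30 falls in (20, 30]
right_closed = pd.cut(values, bins=edges)

# right=False: bins are [10, 20) and [20, 30),
# so 20 falls in [20, 30) and 30 is outside every bin and becomes NaN
left_closed = pd.cut(values, bins=edges, right=False)

print(right_closed)
print(left_closed)
```

This is why the temperature bins above use the default (upper edge included) while the diff bins pass right=False (lower edge included).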

If needed, you can also build the bin labels dynamically and count with them:

b1 = range(10,60,10)
# temperature labels: '10-20', '21-30', '31-40', '41-50'
lab1=[f'{a+1}-{b}' if i!=0 else f'{a}-{b}' for i,(a,b) in enumerate(zip(b1, b1[1:]))]

b2 = list(range(0,25,5))
# open the first bin downwards so it covers every value below 5
b2[0] = -np.inf

lab2=[f'{a}=<diff<{b}' if i!=0 else f'diff<{b}' for i,(a,b) in enumerate(zip(b2, b2[1:]))]
# use a new name for the result so df still holds the original data
out = pd.crosstab(pd.cut(df['measur_temp'],bins=b1, labels=lab1),
                  pd.cut(df['diff'],bins=b2, labels=lab2, right=False),
                  dropna=False)

print (out)

diff         diff<5  5=<diff<10  10=<diff<15  15=<diff<20
measur_temp                                              
10-20             0           0            0            0
21-30             2           1            0            0
31-40             1           0            0            0
41-50             0           0            0            0
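
Note that the question's last column is 15=<diff<=20, while the bins above stop just short of 20. One possible workaround, shown only as a sketch, is to nudge the last edge to the next representable float above 20; the variable names and the sample diff values here are invented for the illustration.

```python
import numpy as np
import pandas as pd

diff = pd.Series([4, 15, 20])  # made-up sample values

edges = [-np.inf, 5, 10, 15, 20]
labels = ["diff<5", "5=<diff<10", "10=<diff<15", "15=<diff<=20"]

# With right=False the last bin is [15, 20), so 20 itself gets no bin (NaN)
half_open = pd.cut(diff, bins=edges, labels=labels, right=False)

# Assumption: moving the last edge to the next float above 20 makes the
# final bin cover every value from 15 up to and including 20
closed_edges = edges[:-1] + [np.nextafter(20.0, np.inf)]
closed_top = pd.cut(diff, bins=closed_edges, labels=labels, right=False)

print(half_open)
print(closed_top)
```

Whether this matters depends on whether a diff of exactly 20 can occur in the real data.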
