pandas: looping over columns with an if statement

iyfjxgzm posted 11 months ago in Other

I am trying to write code that runs the logic below four times, once for each of four different columns:

median_alcohol = df.alcohol.median()
for i, alcohol in enumerate(df.alcohol):
    if alcohol >= median_alcohol:
        df.loc[i, 'alcohol'] = 'high'
    else:
        df.loc[i, 'alcohol'] = 'low'
df.groupby('alcohol').quality.mean()

The columns in the DataFrame are:

alcohol
pH
residual_sugar
citric_acid


I am trying to figure out how to handle these four different columns. Any ideas?

r3i60tvu  #1

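def numeric_to_buckets(df, column_name):
    # Overwrite the column in place: 'high' if the value is at or above
    # the column median, otherwise 'low'.
    # Note: df.loc[i, ...] with enumerate assumes a default 0..n-1 index.
    median = df[column_name].median()
    for i, val in enumerate(df[column_name]):
        if val >= median:
            df.loc[i, column_name] = 'high'
        else:
            df.loc[i, column_name] = 'low'

# Assumes 'quality' is the last column, so it is skipped by [:-1].
for feature in df.columns[:-1]:
    numeric_to_buckets(df, feature)
    print(df.groupby(feature).quality.mean(), '\n')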
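
The loop above overwrites the numeric columns in place, and `df.loc[i, ...]` only lines up with the counter from `enumerate` when the index is the default 0..n-1 range. As a minimal sketch of a variant that leaves the DataFrame untouched: the sample values and the helper name `bucket_by_median` are made up for illustration, and the `quality` column name is taken from the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real DataFrame is not shown in the question.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 9.0],
    "pH": [3.51, 3.20, 3.26, 3.16, 3.40, 3.30],
    "residual_sugar": [1.9, 2.6, 2.3, 1.9, 6.1, 2.0],
    "citric_acid": [0.00, 0.04, 0.56, 0.30, 0.10, 0.45],
    "quality": [5, 5, 6, 6, 7, 5],
})

def bucket_by_median(series):
    """Return a new Series of 'high'/'low' labels; the input is left unchanged."""
    labels = np.where(series >= series.median(), "high", "low")
    return pd.Series(labels, index=series.index, name=series.name)

features = ["alcohol", "pH", "residual_sugar", "citric_acid"]

# One mean-quality Series per feature, keyed by feature name.
results = {
    name: df["quality"].groupby(bucket_by_median(df[name])).mean()
    for name in features
}

for name, means in results.items():
    print(means, "\n")
```

Because the labels are built as a separate Series, the four features can be processed in any order and the original numeric values stay available afterwards.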


5cnsuln7  #2

I'm not sure exactly what you are trying to do, but as far as I understand it, you could try something like this:

import pandas as pd

df = pd.DataFrame({'alcohol': [45, 88, 56, 15, 71],
                   'pH': [12, 83, 56, 25, 71],
                   'residual_sugar': [14, 25, 55, 8, 21]})
print(df)
# Output:
#    alcohol  pH  residual_sugar
# 0       45  12              14
# 1       88  83              25
# 2       56  56              55
# 3       15  25               8
# 4       71  71              21

def func(colum):
    # Work on a copy so the original numeric columns stay intact.
    dftemp = df.copy()
    median_colum = df[colum].median()
    # Label each row 'high' or 'low' relative to the column median.
    dftemp[colum] = ['high' if item >= median_colum else 'low'
                     for item in df[colum]]
    # Mean of every other column within each bucket.
    return dftemp.groupby(colum).mean()


differentarrays = [func(i) for i in df.columns]
for array in differentarrays:
    print(array)

Output:

           pH  residual_sugar
alcohol                      
high     70.0       33.666667
low      18.5       11.000000 

        alcohol  residual_sugar
pH                             
high  71.666667       33.666667
low   30.000000       11.000000 

                  alcohol    pH
residual_sugar                 
high            71.666667  70.0
low             30.000000  18.5

41zrol4v  #3

It looks like you are looking for np.where, something along the lines of:

pd.DataFrame(np.where(df > df.median(), "high", "low"))
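
Since `np.where` returns a plain NumPy array, the one-liner above produces a frame with integer column labels, and it uses `>` where the question uses `>=`. A hedged sketch that keeps the original labels and then computes the mean `quality` per bucket could look like the following; the sample values are invented, and the `quality` column name is taken from the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "alcohol": [45, 88, 56, 15, 71],
    "pH": [12, 83, 56, 25, 71],
    "residual_sugar": [14, 25, 55, 8, 21],
    "quality": [5, 7, 6, 4, 6],
})

features = df.drop(columns="quality")

# features.median() is a Series indexed by column name, so each column is
# compared against its own median in a single vectorized step.
buckets = pd.DataFrame(
    np.where(features >= features.median(), "high", "low"),
    index=features.index,
    columns=features.columns,
)

# Mean quality per high/low bucket, one feature at a time.
for name in buckets.columns:
    print(df["quality"].groupby(buckets[name]).mean(), "\n")
```

Keeping `buckets` as a separate frame means the numeric columns in `df` are never overwritten.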
