pandas: sort a dataset by 2 columns and compute the average of each sub-dataset defined by those 2 columns

pexxcrt2  posted on 2022-12-09  in  Other

I have a data set that details polling data in different states and the percentage of people who have voted for either DEM or REP in that state. What my data frame looks like:
I'm essentially trying to find the average percentage of people in X state voting for either DEM or REP. So my output would be something like:
New Hampshire | DEM | 55%
New Hampshire | REP | 45%
Maine | DEM | 45%
Maine | REP | 54%
etc.
I initially thought of simply iterating over the entire dataset and assigning new pct variables for each state's DEM or REP percentage, but that felt inefficient.
I'm thinking of sorting the data so that the rows run state1, DEM | state1, REP | state2, DEM | state2, REP, etc., and then finding the averages. I'm not very experienced with pandas (which is what I'm trying to use), so perhaps someone can point me in the right direction.

njthzxwz  #1

IIUC, use pandas.concat together with GroupBy.mean:

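import pandas as pd

# Columns that identify each group
cols = ["state", "party"]

(
    pd.concat([df_house, df_senate],   # stack both frames row-wise
              ignore_index=True)
        .groupby(cols, as_index=False)
        .mean(numeric_only=True)       # average the numeric columns per group
        .sort_values(by=cols)
)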

This returns a DataFrame (pandas.core.frame.DataFrame), which you can assign to a variable:

df_average = pd.concat([df_house, df_senate], ignore_index=True).groupby(cols, as_index=False).mean(numeric_only=True).sort_values(by=cols)
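
As a minimal, self-contained sketch of the approach above: the question's actual frames are not shown, so df_house and df_senate below are hypothetical stand-ins with invented values, and the column names state, party, and pct are assumed from the question's description.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data; all values are invented.
df_house = pd.DataFrame({
    "state": ["Maine", "Maine", "New Hampshire", "New Hampshire"],
    "party": ["DEM", "REP", "DEM", "REP"],
    "pct": [44.0, 56.0, 54.0, 46.0],
})
df_senate = pd.DataFrame({
    "state": ["Maine", "Maine", "New Hampshire", "New Hampshire"],
    "party": ["DEM", "REP", "DEM", "REP"],
    "pct": [46.0, 52.0, 56.0, 44.0],
})

cols = ["state", "party"]

# Stack both frames, then average pct within each (state, party) pair.
df_average = (
    pd.concat([df_house, df_senate], ignore_index=True)
    .groupby(cols, as_index=False)
    .mean(numeric_only=True)
    .sort_values(by=cols)
)

print(df_average)
```

With as_index=False the grouping columns stay as ordinary columns, so the result has one row per state and party pair, which is the shape the question asks for.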

4jb9z9bj  #2

Try df.groupby(['state','party'])['pct'].mean()
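
A rough sketch of what this one-liner gives back, assuming a single frame df with state, party, and pct columns (the rows here are invented for illustration). The result is a Series indexed by (state, party); reset_index or unstack can reshape it if a flat or wide table is easier to read.

```python
import pandas as pd

# Hypothetical polling rows; the values are invented for illustration.
df = pd.DataFrame({
    "state": ["Maine", "Maine", "Maine", "Maine"],
    "party": ["DEM", "REP", "DEM", "REP"],
    "pct": [44.0, 56.0, 46.0, 52.0],
})

# Series with a (state, party) MultiIndex holding the mean pct per group.
avg = df.groupby(["state", "party"])["pct"].mean()

# Flat table: one row per (state, party), with the mean in avg_pct.
avg_table = avg.reset_index(name="avg_pct")

# Wide table: one row per state, one column per party.
wide = avg.unstack("party")

print(avg_table)
print(wide)
```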
