pandas: Sorting a dataset based on 2 columns and computing averages of sub-datasets based on the contents of those 2 columns

Posted by pexxcrt2 on 2022-12-09 in Other


I have a data set that details polling data in different states and the percentage of people who have voted for either DEM or REP in that state. What my data frame looks like:
I'm essentially trying to find the average percentage of people in X state voting for either DEM or REP. So my output would be something like:
New Hampshire | DEM | 55%
New Hampshire | REP | 45%
Maine | DEM | 45%
Maine | REP | 54%
etc.
I initially thought of simply iterating over the entire dataset and assigning new pct variables for each state's DEM or REP percentage, but that felt inefficient.
I'm thinking of sorting the data so that it reads state1, DEM | state1, REP | state2, DEM | state2, REP etc. and then finding the averages. But I am not very experienced with pandas (which is what I'm attempting to use). Perhaps someone can point me in the right direction.

pandas

Source: https://stackoverflow.com/questions/74691892/sorting-a-dataset-based-on-2-columns-computing-averages-of-sub-datasets-based

2 Answers


njthzxwz1#

IIUC, use pandas.concat together with GroupBy.mean:

import pandas as pd

cols = ["state", "party"]

# Stack both frames, then average the numeric columns per (state, party) pair.
(
    pd.concat([df_house, df_senate], ignore_index=True)
    .groupby(cols, as_index=False)
    .mean(numeric_only=True)
    .sort_values(by=cols)
)

This returns a DataFrame (pandas.core.frame.DataFrame), which you can assign to a variable:

df_average = pd.concat([df_house, df_senate], ignore_index=True).groupby(cols, as_index=False).mean(numeric_only=True).sort_values(by=cols)
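As a rough, self-contained sketch of the same chain: the contents of `df_house` and `df_senate` below are invented for illustration (the asker's real frames are not shown), and the column name `pct` is borrowed from the other answer. The trailing `reset_index(drop=True)` is an optional addition, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data; the real df_house / df_senate come from the asker's polls.
df_house = pd.DataFrame({
    "state": ["Maine", "Maine", "New Hampshire", "New Hampshire"],
    "party": ["DEM", "REP", "DEM", "REP"],
    "pct": [44.0, 56.0, 54.0, 46.0],
})
df_senate = pd.DataFrame({
    "state": ["Maine", "Maine", "New Hampshire", "New Hampshire"],
    "party": ["DEM", "REP", "DEM", "REP"],
    "pct": [46.0, 52.0, 56.0, 44.0],
})

cols = ["state", "party"]

df_average = (
    pd.concat([df_house, df_senate], ignore_index=True)  # stack the two frames
    .groupby(cols, as_index=False)                       # one group per (state, party)
    .mean(numeric_only=True)                             # average pct within each group
    .sort_values(by=cols)                                # order by state, then party
    .reset_index(drop=True)                              # optional: clean 0..n-1 index
)

print(df_average)
```

Since `groupby` sorts by the group keys by default, the explicit `sort_values` is mostly a safeguard here; it matters if `sort=False` is passed to `groupby`.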

2022-12-09

4jb9z9bj2#

Try using df.groupby(['state', 'party'])['pct'].mean()
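A minimal sketch of how that one-liner might be used, assuming all polls sit in a single frame `df` with columns `state`, `party` and `pct`. The rows below are made up for illustration, and `avg`, `table` and `maine_dem` are hypothetical names.

```python
import pandas as pd

# Hypothetical polls in a single frame, deliberately unsorted.
df = pd.DataFrame({
    "state": ["New Hampshire", "Maine", "Maine", "New Hampshire", "Maine", "Maine"],
    "party": ["REP", "DEM", "REP", "DEM", "DEM", "REP"],
    "pct": [45.0, 44.0, 56.0, 55.0, 46.0, 52.0],
})

# Result is a Series indexed by a (state, party) MultiIndex, sorted by the keys.
avg = df.groupby(["state", "party"])["pct"].mean()

# Look up a single state/party pair directly.
maine_dem = avg.loc[("Maine", "DEM")]

# Turn the MultiIndex back into ordinary columns for a flat table.
table = avg.reset_index()

print(table)
```

No manual iteration or pre-sorting is needed: the grouping does both the partitioning and the ordering.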

2022-12-09
