How to compare pandas DataFrames ignoring column order

Posted by puruo6ea on 2023-08-01 in Other


Suppose I have two pandas DataFrames:

df_a = pd.DataFrame({ 0: ['a', 'b', 'c'], 1: [9, 8, 7], 2: [True, True, False] })
df_b = pd.DataFrame({ 0: [9, 8, 7], 1: [True, True, False], 2: ['a', 'b', 'c'] })

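Ignoring column order, these two should be considered equal, because each contains the same 3 columns with the same row order. Every solution I have seen tries to match on column names, but the names do not matter to me.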

pandas

Source: https://stackoverflow.com/questions/76775116/how-to-compare-pandas-dataframes-ignoring-column-order

3 Answers


bsxbgnwa1#

In your case:

df_a.apply(set, axis=1).eq(df_b.apply(set, axis=1))
Out[32]: 
0    True
1    True
2    True
dtype: bool
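
The row-wise check above returns one boolean per row. A minimal sketch of reducing it to a single answer follows; the helper name `rows_match_as_sets` and the frames `df_x` and `df_y` are hypothetical and not part of the original answer. Because each row is compared as a set, the check cannot tell which value came from which column, so the second pair of frames also passes it even though their columns differ.

```python
import pandas as pd


def rows_match_as_sets(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every row holds the same set of values."""
    row_equal = left.apply(set, axis=1).eq(right.apply(set, axis=1))
    return bool(row_equal.all())


df_a = pd.DataFrame({0: ['a', 'b', 'c'], 1: [9, 8, 7], 2: [True, True, False]})
df_b = pd.DataFrame({0: [9, 8, 7], 1: [True, True, False], 2: ['a', 'b', 'c']})
print(rows_match_as_sets(df_a, df_b))

# Caveat: the rows of df_x and df_y contain the same values,
# but the columns are [1, 2], [2, 1] versus [2, 2], [1, 1].
df_x = pd.DataFrame({0: [1, 2], 1: [2, 1]})
df_y = pd.DataFrame({0: [2, 2], 1: [1, 1]})
print(rows_match_as_sets(df_x, df_y))
```

If the columns themselves must match, a column-wise comparison is the safer choice.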


8ehkhllq2#

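If I understand correctly, the question comes down to whether the DataFrames contain the same columns regardless of column order; we are checking equality of the DataFrames as a whole, not of individual values. So the solution is: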

def check_if_frames_are_equal(df_a, df_b) -> bool:
    """
    Check whether two dataframes are equal (have the same columns),
    regardless of the order of the columns.

    Parameters
    ----------
    df_a : pandas.DataFrame
        First dataframe.
    df_b : pandas.DataFrame
        Second dataframe.

    Returns
    -------
    bool
        True if the dataframes are equal, False otherwise.

    Examples
    --------
    >>> df_a = pd.DataFrame({0: ['a', 'b', 'c'], 1: [9, 8, 7], 2: [True, True, False]})
    >>> df_b = pd.DataFrame({0: [9, 8, 7], 1: [True, True, False], 2: ['a', 'b', 'c']})
    >>> df_c = pd.DataFrame({0: [9, 8, 7], 1: [True, True, False], 2: ['a', 'd', 'c']})
    >>> check_if_frames_are_equal(df_a, df_b)
    True
    >>> check_if_frames_are_equal(df_a, df_c)
    False
    """
    # Compare the sets of column-value tuples; column labels and order are ignored.
    columns_a = {tuple(df_a[c]) for c in df_a.columns}
    columns_b = {tuple(df_b[c]) for c in df_b.columns}
    return columns_a == columns_b
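
Because the function above compares sets, a column that appears twice counts only once. If repeated columns should matter, one possible variant (a sketch, not part of the original answer; the name `same_columns_with_multiplicity` is hypothetical) counts each column with `collections.Counter` instead:

```python
from collections import Counter

import pandas as pd


def same_columns_with_multiplicity(df_left: pd.DataFrame, df_right: pd.DataFrame) -> bool:
    """Hypothetical variant: compare columns as a multiset, ignoring labels and order."""
    # DataFrame.items() yields (label, Series) pairs by position,
    # so it also works when column labels are duplicated.
    left = Counter(tuple(col) for _, col in df_left.items())
    right = Counter(tuple(col) for _, col in df_right.items())
    return left == right


# Both frames contain only the columns (1, 2) and (3, 4), so a set-based
# comparison treats them as equal, but the repeat counts differ.
df_p = pd.DataFrame({0: [1, 2], 1: [1, 2], 2: [3, 4]})
df_q = pd.DataFrame({0: [1, 2], 1: [3, 4], 2: [3, 4]})
print(same_columns_with_multiplicity(df_p, df_q))
```

Both versions require every column value to be hashable, since the columns are turned into tuples and used as set members or dictionary keys.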


gkl3eglg3#

Another possible solution, based on numpy:

(df_b.T.values == df_a.T.values[:,None]).all(axis=2).any(axis=1).all()
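
The one-liner above can be unpacked into named steps. The sketch below is an illustration with a hypothetical function name, `columns_match_numpy`; it assumes both frames have the same shape and adds a reverse check, since the original expression only verifies that every column of `df_a` appears somewhere in `df_b`.

```python
import pandas as pd


def columns_match_numpy(df_left: pd.DataFrame, df_right: pd.DataFrame) -> bool:
    """Hypothetical helper: broadcasted column-by-column comparison."""
    if df_left.shape != df_right.shape:
        return False
    cols_left = df_left.T.values    # shape (n_cols, n_rows), one row per column
    cols_right = df_right.T.values  # shape (n_cols, n_rows)
    # Broadcast to shape (n_cols_left, n_cols_right, n_rows): element-wise equality
    # between every left column and every right column.
    pairwise = cols_right == cols_left[:, None]
    # column_matches[i, j] is True when left column i equals right column j.
    column_matches = pairwise.all(axis=2)
    every_left_in_right = column_matches.any(axis=1).all()
    every_right_in_left = column_matches.any(axis=0).all()
    return bool(every_left_in_right and every_right_in_left)


df_a = pd.DataFrame({0: ['a', 'b', 'c'], 1: [9, 8, 7], 2: [True, True, False]})
df_b = pd.DataFrame({0: [9, 8, 7], 1: [True, True, False], 2: ['a', 'b', 'c']})
print(columns_match_numpy(df_a, df_b))
```

The intermediate array holds one comparison per pair of columns and per row, so memory grows with the square of the column count.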

