Comparing two DataFrames natively

arknldoa  posted on 2021-08-20  in Java

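I want to compare two very similar DataFrames: one is loaded from a JSON file and resampled, and the second is loaded from a CSV file that comes from a more complex use case.
These are the first values of df1: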

                           page
logging_time                   
2021-07-04 18:14:47.000   748.0
2021-07-04 18:14:47.100     0.0
2021-07-04 18:14:47.200     0.0
2021-07-04 18:14:47.300     3.0
2021-07-04 18:14:47.400     4.0
[5 rows x 1 columns]

These are the first values of df2:

@timestamp per 100 milliseconds  Sum of page
0          2021-04-07 18:14:47.000        748.0
1          2021-04-07 18:14:47.100          0.0
2          2021-04-07 18:14:47.200          0.0
3          2021-04-07 18:14:47.300          3.0
4          2021-04-07 18:14:47.400          4.0
[5 rows x 2 columns]

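I am comparing them with pandas.testing.assert_frame_equal and trying to adjust the data so that the two frames compare equal; I would like some help with that. The first column should be removed, and the label names should be ignored.
I want to do this in the most native way, not by comparing only the values.
Any help would be appreciated.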


t3psigkw1#

You can use the equals function to compare DataFrames. The catch is that the column names must match:

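import pandas as pd

data = [
    ["2021-07-04 18:14:47.000", 748.0],
    ["2021-07-04 18:14:47.100",   0.0],
    ["2021-07-04 18:14:47.200",   0.0],
    ["2021-07-04 18:14:47.300",   3.0],
    ["2021-07-04 18:14:47.400",   4.0],
]

df1 = pd.DataFrame(data, columns=["logging_time", "page"])
df1.set_index("logging_time", inplace=True)

# df2 holds the same values under the column labels from the question
df2 = pd.DataFrame(data, columns=["@timestamp per 100 milliseconds", "Sum of page"])

# Move df1's index back into a regular column, then give df2 the same labels
df1_flat = df1.reset_index()
df2.columns = df1_flat.columns

print(df1_flat.equals(df2))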

Output: True
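As an alternative to renaming the columns, the single value column of each frame can be compared as a Series. The following is only a sketch under assumptions: it rebuilds the frames from the sample values with string timestamps, and it relies on check_names=False in pandas.testing.assert_series_equal, which skips the check of the Series name and the index name. It does not ignore differences in the index values themselves.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# Sample values from the question; timestamps kept as strings (an assumption)
timestamps = [
    "2021-07-04 18:14:47.000",
    "2021-07-04 18:14:47.100",
    "2021-07-04 18:14:47.200",
    "2021-07-04 18:14:47.300",
    "2021-07-04 18:14:47.400",
]
values = [748.0, 0.0, 0.0, 3.0, 4.0]

df1 = pd.DataFrame({"page": values}, index=pd.Index(timestamps, name="logging_time"))
df2 = pd.DataFrame({"@timestamp per 100 milliseconds": timestamps, "Sum of page": values})

# Reduce both frames to a value Series indexed by timestamp
s1 = df1["page"]
s2 = df2.set_index("@timestamp per 100 milliseconds")["Sum of page"]

# Raises AssertionError on a mismatch; the Series and index names are not compared
assert_series_equal(s1, s2, check_names=False)
```

If the timestamps are parsed as datetimes in one frame and left as strings in the other, the index types differ and the assertion would still fail, so the dtypes need to be aligned first.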


zpgglvta2#

import pandas as pd
from pandas.testing import assert_frame_equal

The DataFrames I used:

df1=pd.DataFrame({'page': {'2021-07-04 18:14:47.000': 748.0,
  '2021-07-04 18:14:47.100': 0.0,
  '2021-07-04 18:14:47.200': 0.0,
  '2021-07-04 18:14:47.300': 3.0,
  '2021-07-04 18:14:47.400': 4.0}})
df1.index.names=['logging_time']

df2=pd.DataFrame({'@timestamp per 100 milliseconds': {0: '2021-07-04 18:14:47.000',
  1: '2021-07-04 18:14:47.100',
  2: '2021-07-04 18:14:47.200',
  3: '2021-07-04 18:14:47.300',
  4: '2021-07-04 18:14:47.400'},
 'Sum of page': {0: 748.0, 1: 0.0, 2: 0.0, 3: 3.0, 4: 4.0}})

Solution:

# Reset the index of df1 so that logging_time becomes a regular column
df1 = df1.reset_index()

# Rename the columns of df2 so that they match those of df1
df2.columns = df1.columns

# True means every column has the same dtype in both frames.
# If this prints False, inspect print(df1.dtypes == df2.dtypes)
# and convert the dtypes of either df1 or df2 accordingly.
print((df1.dtypes == df2.dtypes).all())

# Finally: assert_frame_equal returns None when the frames are equal,
# so this prints None; otherwise it raises an AssertionError.
print(assert_frame_equal(df1, df2))
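The dtype check above matters when, for example, one frame holds parsed timestamps while the other still holds them as strings, which can happen when loading from CSV. Here is a minimal sketch of aligning the dtypes before asserting. The frames left and right are hypothetical two-row stand-ins, not the asker's real data.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

stamps = ["2021-07-04 18:14:47.000", "2021-07-04 18:14:47.100"]

# Hypothetical frames: left has parsed timestamps, right has plain strings
left = pd.DataFrame({"logging_time": pd.to_datetime(stamps), "page": [748.0, 0.0]})
right = pd.DataFrame({"logging_time": stamps, "page": [748.0, 0.0]})

# False here, because the logging_time columns have different dtypes
dtypes_match_before = bool((left.dtypes == right.dtypes).all())

# Convert the string column so that both frames use a datetime dtype
right["logging_time"] = pd.to_datetime(right["logging_time"])
dtypes_match_after = bool((left.dtypes == right.dtypes).all())

# Raises AssertionError if the frames still differ
assert_frame_equal(left, right)
```

Converting explicitly keeps the dtype comparison strict. assert_frame_equal also accepts check_dtype=False, but that would not make string timestamps equal to parsed ones, since the values themselves differ.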
