pandas: How to create a frequency/value count table from multiple DataFrames

Posted by snvhrwxg on 2022-12-16 in Other


I have two DataFrames:

df1                       df2
country                   country
US                        AR
US                        AD
CA                        AO
CN                        AU
AR                        US

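How can I compare the two DataFrames by combining their countries into a single set and counting, per country, how often it appears in each DataFrame?
My expected result is: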

country code   df1_country_count   df2_country_count
AR                   1                    1
AD                   0                    1
AO                   0                    1
AU                   0                    1
US                   2                    1 
CA                   1                    0
CN                   1                    0

pandas

Source: https://stackoverflow.com/questions/74791868/how-to-create-a-frequency-value-count-table-from-multiple-dataframes

3 Answers


oipij1gg1#

(df1.value_counts().to_frame('df1_country_count')
 .join(df2.value_counts().to_frame('df2_country_count'), how='outer')
 .fillna(0).astype('int').rename_axis('country code'))

Result:

df1_country_count    df2_country_count
country code        
AD             0                    1
AO             0                    1
AR             1                    1
AU             0                    1
CA             1                    0
CN             1                    0
US             2                    1
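
For reference, here is a self-contained sketch of the same join-based approach, using the sample data from the question. It selects the `country` column before counting, which is an assumption on my part rather than what the answer does; calling `value_counts()` on the whole DataFrame may produce a one-level MultiIndex depending on the pandas version, while the column-based call gives a flat index. The variable names `c1`, `c2`, and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'country': ['US', 'US', 'CA', 'CN', 'AR']})
df2 = pd.DataFrame({'country': ['AR', 'AD', 'AO', 'AU', 'US']})

# Count each country per frame and name the resulting Series
c1 = df1['country'].value_counts().rename('df1_country_count')
c2 = df2['country'].value_counts().rename('df2_country_count')

# The outer join keeps countries found in either frame;
# a country missing from one frame gets NaN, which is replaced by 0
result = (c1.to_frame()
          .join(c2, how='outer')
          .fillna(0)
          .astype(int)
          .rename_axis('country code'))
print(result)
```

The `fillna(0).astype(int)` step matters because the outer join introduces NaN, which would otherwise turn the count columns into floats.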

2022-12-16

jslywgbw2#

You can use value_counts on each DataFrame and then combine the results with concat.

import pandas as pd

# count each country per frame, then align the two Series side by side
out = pd.concat([df1.country.value_counts(),
                 df2.country.value_counts()], axis=1).fillna(0).astype(int)
out.columns = ['df1_country', 'df2_country']
print(out)

    df1_country  df2_country
US            2            1
CA            1            0
CN            1            0
AR            1            1
AD            0            1
AO            0            1
AU            0            1
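
A possible variation on the concat approach, sketched here as a runnable example: passing `keys=` to `pd.concat` names the columns in the same call, so the separate `out.columns = ...` assignment is not needed. The `sort_index()` call is my addition to get a stable alphabetical row order; it is not part of the answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'country': ['US', 'US', 'CA', 'CN', 'AR']})
df2 = pd.DataFrame({'country': ['AR', 'AD', 'AO', 'AU', 'US']})

# keys= supplies the column labels for the concatenated Series
out = (pd.concat([df1['country'].value_counts(),
                  df2['country'].value_counts()],
                 axis=1, keys=['df1_country', 'df2_country'])
       .fillna(0)      # countries absent from one frame count as 0
       .astype(int)
       .sort_index())  # alphabetical order by country code
print(out)
```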

2022-12-16

dxxyhpgq3#

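Use pd.concat to combine all the DataFrames (however many there are), adding a 'source' column with .assign inside a list comprehension.
source=f'df{i}': builds an f-string that determines how the column names appear in the frequency table.
If the data is loaded from files, see option 4 of this answer for loading the csv files directly into a single DataFrame with the new column.
Use pd.crosstab to compute the frequency table of the two columns.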

import pandas as pd

# sample dataframes
df1 = pd.DataFrame({'country': ['US', 'US', 'CA', 'CN', 'AR']})
df2 = pd.DataFrame({'country': ['AR', 'AD', 'AO', 'AU', 'US']})

# list of dataframes
df_list = [df1, df2]

# combine dataframes
df = pd.concat([d.assign(source=f'df{i}') for i, d in enumerate(df_list, 1)], ignore_index=True)

# create frequency table
counts = pd.crosstab(df.country, df.source)

source   df1  df2
country          
AD         0    1
AO         0    1
AR         1    1
AU         0    1
CA         1    0
CN         1    0
US         2    1
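
If the column names should match the layout requested in the question, the crosstab output can be relabeled afterwards. The following is a minimal sketch under that assumption; the suffix `_country_count` and the index name `country code` are taken from the expected result in the question, and the variable name `table` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'country': ['US', 'US', 'CA', 'CN', 'AR']})
df2 = pd.DataFrame({'country': ['AR', 'AD', 'AO', 'AU', 'US']})

# Tag each frame with its origin and stack them
df = pd.concat([d.assign(source=f'df{i}')
                for i, d in enumerate([df1, df2], 1)],
               ignore_index=True)

# Frequency table: rows are countries, columns are source frames
counts = pd.crosstab(df['country'], df['source'])

# Relabel to the names used in the question's expected output
table = counts.add_suffix('_country_count').rename_axis('country code')
table.columns.name = None  # drop the leftover 'source' label
print(table)
```

Because crosstab counts every combination directly, no `fillna` step is needed here: missing combinations already appear as 0.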

2022-12-16
