Pandas join on string columns failing: ValueError: You are trying to merge on object and int64 columns

Posted by j2datikz 12 months ago in Other


I am trying a very simple join on two dataframes, df1 and df2. I read both from CSV files, specifying the dtype of the join column:

import pandas as pd
df1 = pd.read_csv("df1.csv", dtype={"code": str})
df2 = pd.read_csv("df2.csv", dtype={"code": str})

The column dtypes are as follows:

df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   local        6 non-null      object
 1   name         6 non-null      object
 2   type         6 non-null      int64 
 3   second_name  6 non-null      object
 4   code         6 non-null      object
 5   item_name    6 non-null      object

df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   item_id        1 non-null      int64  
 1   item_name2     1 non-null      object 
 2   code           1 non-null      object 
 3   size           1 non-null      float64
 4   category_id    1 non-null      object 
 5   quality        1 non-null      object 
 6   quality_id     1 non-null      int64  
 7   brand          1 non-null      object 
 8   brand_subtype  1 non-null      object 
 9   score          1 non-null      int64  
 10  size.1         1 non-null      object 
 11  country        1 non-null      object 
 12  city           1 non-null      object 
 13  level          1 non-null      int64  
dtypes: float64(1), int64(4), object(9)
memory usage: 240.0+ bytes

The actual contents:

df1
  local name  type second_name code item_name
0   yes  bob     1       jenga    1    triple
1   yes  bob     1       jenga    1    triple
2   yes  bob     1       jenga    1    triple
3   yes  bob     1       jenga    1    triple
4   yes  bob     1       jenga    1    triple
5   yes  bob     1       jenga    1    triple

df2
   item_id item_name2 code  size  ... size.1 country      city level
0     4500     triple    1  0.25  ...  small   china  shanghai     3

Just to be sure about the dtype of the key column "code", I explicitly cast it to string in both frames:

df1.code = df1.code.astype(str)
df2.code = df2.code.astype(str)

The problem appears when I try to join (left or right):

df1.join(df2, how='left', on='code')

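I get the following error: ValueError: You are trying to merge on object and int64 columns
Since I explicitly read the code column as a string and also cast it again afterwards (rest assured, I get the same error without the repeated cast), I don't see how the dtype of that column can be the issue.
I could use pd.merge instead, but that neither explains nor solves the problem with join.
Using Python 3.10.
Any ideas?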

pandas

Source: https://stackoverflow.com/questions/77224008/pandas-join-on-string-columns-failing-valueerror-you-are-trying-to-merge-on-ob

2 Answers


vh0rcniy1#

merge works with these string columns, but join as written here does not.
Try:

df1.merge(df2, how='left', on='code')

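df1.join(df2) always uses df2's index as the key on the right-hand side, whereas df1.merge(df2, on='code') matches the code columns of both frames. With on='code', join compares df1's code column (object) against df2's index, which here is the default integer RangeIndex (int64). That is where the "object and int64" in the error message comes from.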
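To make the difference concrete, here is a minimal sketch using small made-up frames (the data is illustrative, not the asker's CSV files). It reproduces the failure with join, then shows two ways to match on the code values: merge on the column, or keep join but move df2's code column into its index first.

```python
import pandas as pd

# Illustrative frames (assumed data, not the asker's CSV files)
df1 = pd.DataFrame({"name": ["bob", "bob"], "code": ["1", "1"]})
df2 = pd.DataFrame({"item_id": [4500], "code": ["1"]})

# join(on="code") compares df1["code"] (strings) with df2's index,
# which is the default integer RangeIndex, so pandas raises ValueError
try:
    df1.join(df2, how="left", on="code")
    join_error = None
except ValueError as exc:
    join_error = str(exc)

# Option 1: merge on the column that exists in both frames
merged = df1.merge(df2, how="left", on="code")

# Option 2: keep join, but make "code" the index of df2 first
joined = df1.join(df2.set_index("code"), how="left", on="code")
```

Both options should yield the same matched rows; the choice is mostly a matter of whether df2 is naturally indexed by code elsewhere in the code base.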

Edit:

I found the reason explained in this answer:
What is the difference between join and merge in Pandas?


nzk0hqpo2#

I can think of two possible causes:
1. Null values in the dataset
2. Values with leading or trailing whitespace
Try this:

df1.dropna(inplace=True)
df2.dropna(inplace=True)
# Remove the white spaces from the code feature
df1['code'] = df1['code'].str.strip()
df2['code'] = df2['code'].str.strip()
# now merge
merged_df = df1.merge(df2.astype({'code': 'str'}), how='left', on='code')

Hope this helps!
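As a rough sketch of how one might check whether nulls or stray whitespace actually affect the match, the example below uses made-up data. It is a variation on the code above rather than the answerer's exact method: dropna is restricted to the key column (a bare dropna() drops every row with a null in any column), and merge's indicator=True option is used to report which rows found a partner.

```python
import pandas as pd

# Illustrative data (assumed): one key has trailing whitespace, one is missing
left = pd.DataFrame({"code": ["1 ", "2", None], "name": ["a", "b", "c"]})
right = pd.DataFrame({"code": ["1", "2"], "item_id": [4500, 4501]})

# Drop rows only where the key itself is null, then strip whitespace
left = left.dropna(subset=["code"]).copy()
left["code"] = left["code"].str.strip()

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
checked = left.merge(right, how="left", on="code", indicator=True)

# Rows from the left frame that found no partner in the right frame
unmatched = checked[checked["_merge"] == "left_only"]
```

If unmatched is empty after cleaning, the keys themselves were not the obstacle and the join-versus-merge difference is the more likely explanation.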
