Apache Spark: "You are trying to merge on object and int64 columns", but both are the same type

Posted by qrjkbowd on 2023-10-23 in Apache


Strange problem; I've never run into this before. I'm pulling pandas frames out of Spark and cleaning a specific column so that its values are ready for a join.

def clean(data):
    # manipulate input
    return data

x = spark_f1.toPandas()
x["clean_name"] = x["Name"].apply(clean)
x["clean_name"] = x["clean_name"].astype(str)

# do the same for y frame

for c in x.columns:
    if c in y.columns:
        print(c)

> get "clean_name"

x["clean_name"].dtype == y["clean_name"].dtype

> get True

joined = x.join(y, on="clean_name", how="left", lsuffix="_x", rsuffix="_y")

> You are trying to merge on object and int64 columns.

Am I missing something?
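A minimal, self-contained reproduction, using small hypothetical frames in place of spark_f1.toPandas() and the cleaned y frame (the names and values are illustrative only). One likely explanation, based on documented pandas behaviour rather than anything stated in the thread: DataFrame.join(other, on=col) matches col in the caller against the index of other, not against other[col]. A frame returned by toPandas() normally has a default RangeIndex of dtype int64, which would produce exactly this message even though both "clean_name" columns share a dtype. The sketch shows the failure and two ways around it.

```python
import pandas as pd

# Hypothetical stand-ins for the cleaned x and y frames from the question.
x = pd.DataFrame({"Name": ["Alice", "Bob"], "clean_name": ["alice", "bob"]})
y = pd.DataFrame({"clean_name": ["alice", "carol"], "score": [1, 2]})

# The dtype check from the question passes...
print(x["clean_name"].dtype == y["clean_name"].dtype)  # True

# ...but join(on=...) compares x["clean_name"] with y's index
# (a default RangeIndex of dtype int64), not with y["clean_name"].
join_error = None
try:
    x.join(y, on="clean_name", how="left", lsuffix="_x", rsuffix="_y")
except ValueError as err:
    join_error = str(err)  # exact wording varies by pandas version
    print(join_error)

# Option 1: merge matches column against column.
merged = x.merge(y, on="clean_name", how="left", suffixes=("_x", "_y"))

# Option 2: keep join, but move the key into y's index first.
joined = x.join(
    y.set_index("clean_name"),
    on="clean_name",
    how="left",
    lsuffix="_x",
    rsuffix="_y",
)

print(merged)
print(joined)
```

Both options should give "Bob" a missing score, since "bob" has no match in y under a left join.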

apache-spark

Source: https://stackoverflow.com/questions/77025597/you-are-trying-to-merge-an-object-and-int64-columns-but-both-are-same-type

1 answer


yhqotfr81#

Try adding the following code before the 'for' loop.

# Convert column names to lowercase for case-insensitive matching
x.columns = x.columns.str.lower()
y.columns = y.columns.str.lower()

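If you want to compare column names between x and y, convert them to a common format (for example, lowercase) before comparing so that the match is case-insensitive, because column names in pandas are case-sensitive by default.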
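A small sketch of the case-insensitive matching described above, assuming two hypothetical frames whose shared column differs only in capitalisation. It only shows how lowercasing affects which column names compare equal; it does not change how the join itself picks its keys.

```python
import pandas as pd

# Hypothetical frames: the shared key column differs only in case.
x = pd.DataFrame({"Clean_Name": ["alice"], "Name": ["Alice"]})
y = pd.DataFrame({"clean_name": ["alice"], "score": [1]})

# Exact (case-sensitive) comparison finds no shared column.
before = [c for c in x.columns if c in y.columns]
print(before)  # []

# Normalise both sets of column names to lowercase, then compare again.
x.columns = x.columns.str.lower()
y.columns = y.columns.str.lower()
common = [c for c in x.columns if c in y.columns]
print(common)  # ['clean_name']
```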
Also, add the following code after "y.columns = y.columns.str.lower()" to make sure both key columns have the same data type.

# Convert the "clean_name" column to a string data type in both DataFrames
x["clean_name"] = x["clean_name"].astype(str)
y["clean_name"] = y["clean_name"].astype(str)
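A sketch of when the astype(str) step matters, assuming hypothetical frames in which y's key column arrived as integers while x's arrived as strings. The join is done here with DataFrame.merge, which compares the two key columns directly; the cast only helps when the two key columns genuinely differ in dtype.

```python
import pandas as pd

# Hypothetical frames: x's keys are strings, y's keys came through as int64.
x = pd.DataFrame({"clean_name": ["101", "102"], "Name": ["a", "b"]})
y = pd.DataFrame({"clean_name": [101, 103], "score": [1, 2]})

# Without a cast, pandas refuses to merge string keys with int64 keys.
error_before_cast = None
try:
    x.merge(y, on="clean_name", how="left")
except ValueError as err:
    error_before_cast = str(err)
    print("before cast:", error_before_cast)

# Cast both key columns to str so they share a dtype and compare equal.
x["clean_name"] = x["clean_name"].astype(str)
y["clean_name"] = y["clean_name"].astype(str)

merged = x.merge(y, on="clean_name", how="left")
print(merged)
```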

Answered 2023-10-23
