PySpark DataFrame: error when splitting a column into several columns. How do I fix it?

Posted by 1cosmwyk on 2022-11-01 in Spark
I am trying to split the USER_FULL_NAME column of df3 into Title, FirstName and LastName columns:

df4[['Title','FirstName','LastName']]=df3.USER_FULL_NAME.str.split(r'\D', expand=True)

I get the following error:

TypeError                                 Traceback (most recent call last)
Cell In [121], line 2
      1 #df[['V','allele']] = df['V'].str.split('-',expand=True)
----> 2 df4[['Title','FirstName','LastName']]=df3.USER_FULL_NAME.str.split(r'\D', expand=True)

TypeError: 'Column' object is not callable
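Answer 1 (dtcbnfnu):

The code in the question is pandas syntax. A PySpark Column has no .str accessor, and a PySpark DataFrame does not support item assignment such as df4[[...]] = ..., so this cannot work in PySpark. Use pyspark.sql.functions.split instead and create the new columns with select or withColumn.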
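Why does the message say "not callable" rather than reporting a missing attribute? In classic PySpark, Column.__getattr__ treats any unknown, non-dunder attribute name as a nested-field lookup and returns a new Column, so df3.USER_FULL_NAME.str.split evaluates without error and only the final call fails. The toy sketch below reproduces that mechanism in plain Python; ToyColumn is a hypothetical stand-in, not PySpark's class.

```python
class ToyColumn:
    """Hypothetical stand-in for pyspark.sql.Column that keeps only the
    attribute-lookup behaviour relevant to this error."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Called only for attributes that are not found normally.
        # Dunder names are reported as genuinely missing ...
        if item.startswith("__"):
            raise AttributeError(item)
        # ... every other name is treated as a nested-field lookup
        # that yields yet another column object.
        return ToyColumn(f"{self.name}.{item}")


col = ToyColumn("USER_FULL_NAME")
split_attr = col.str.split  # no error yet: this is just another ToyColumn
print(type(split_attr).__name__, split_attr.name)  # ToyColumn USER_FULL_NAME.str.split

try:
    split_attr(r"\D", expand=True)  # calling it is what fails
except TypeError as exc:
    print(exc)  # 'ToyColumn' object is not callable
```

ToyColumn defines no __call__, and Python looks up special methods on the type rather than through __getattr__, so the call raises TypeError exactly as PySpark's Column does.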

Suppose df3 looks like this:

+---+--------------+
| ID|USER_FULL_NAME|
+---+--------------+
|  1|     Mr. AA BB|
|  2|    Mrs. BB CC|
|  3|     Dr. DD EE|
|  4|    PhD. FF GG|
+---+--------------+

The following code splits USER_FULL_NAME and builds a new DataFrame:

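from pyspark.sql import functions as F

# Split on a single space; getItem(i) picks the i-th part (0-based)
df4 = df3.select(
    'Id',
    *[
        F.split(df3.USER_FULL_NAME, ' ').getItem(i).alias(col_name)
        for i, col_name in enumerate(('Title', 'FirstName', 'LastName'))
    ]
)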

In this case, df4 is:

+---+-----+---------+--------+
| Id|Title|FirstName|LastName|
+---+-----+---------+--------+
|  1|  Mr.|       AA|      BB|
|  2| Mrs.|       BB|      CC|
|  3|  Dr.|       DD|      EE|
|  4| PhD.|       FF|      GG|
+---+-----+---------+--------+
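The per-row logic of split(...).getItem(i) can be mirrored in plain Python to check the table above. This is a hedged sketch: split_name is a hypothetical helper, the rows are the sample df3 data, and returning None for a missing part is meant to mirror Spark's null result for an out-of-range getItem index when ANSI SQL mode is off. Parts beyond the third are simply ignored, as with getItem(0) to getItem(2).

```python
import re


def split_name(full_name, pattern=" "):
    """Hypothetical helper: split one name like F.split(col, pattern) and
    take items 0..2, using None where the name has fewer parts."""
    parts = re.split(pattern, full_name)
    return [parts[i] if i < len(parts) else None for i in range(3)]


# Sample rows of df3, as plain dicts.
df3_rows = [
    {"ID": 1, "USER_FULL_NAME": "Mr. AA BB"},
    {"ID": 2, "USER_FULL_NAME": "Mrs. BB CC"},
    {"ID": 3, "USER_FULL_NAME": "Dr. DD EE"},
    {"ID": 4, "USER_FULL_NAME": "PhD. FF GG"},
]

names = ("Title", "FirstName", "LastName")
df4_rows = [
    {"Id": row["ID"], **dict(zip(names, split_name(row["USER_FULL_NAME"])))}
    for row in df3_rows
]

for row in df4_rows:
    print(row)

# A two-word name leaves LastName empty instead of failing:
print(split_name("Mr. AA"))  # ['Mr.', 'AA', None]
```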

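Alternatively, if you want to add the three columns to another DataFrame that already contains the users' Ids, you can join df4 with that DataFrame. Suppose the other DataFrame is df2: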

+---+--------+
| ID|Greeting|
+---+--------+
|  1|   Hello|
|  2|   Hallo|
|  4|    Ciao|
+---+--------+

The following code then joins the two DataFrames:

df2.join(df4, 'Id', 'left')

which results in:

+---+--------+-----+---------+--------+
| ID|Greeting|Title|FirstName|LastName|
+---+--------+-----+---------+--------+
|  1|   Hello|  Mr.|       AA|      BB|
|  2|   Hallo| Mrs.|       BB|      CC|
|  4|    Ciao| PhD.|       FF|      GG|
+---+--------+-----+---------+--------+
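A left join keeps every row of the left DataFrame (df2) and attaches the matching df4 columns, so Id 3, which exists only in df4, is dropped, and a df2 Id with no match would get nulls. The key is written Id in the join but ID in df2; Spark matches them because column resolution is case-insensitive by default (spark.sql.caseSensitive is false). The plain-Python sketch below mirrors those semantics with a hypothetical left_join_on_id helper; it assumes Ids are unique in df4, whereas a real join would emit one row per match.

```python
# Sample df4 and df2 rows, as plain dicts.
df4_rows = [
    {"Id": 1, "Title": "Mr.", "FirstName": "AA", "LastName": "BB"},
    {"Id": 2, "Title": "Mrs.", "FirstName": "BB", "LastName": "CC"},
    {"Id": 3, "Title": "Dr.", "FirstName": "DD", "LastName": "EE"},
    {"Id": 4, "Title": "PhD.", "FirstName": "FF", "LastName": "GG"},
]
df2_rows = [
    {"ID": 1, "Greeting": "Hello"},
    {"ID": 2, "Greeting": "Hallo"},
    {"ID": 4, "Greeting": "Ciao"},
]


def left_join_on_id(left, right, right_cols):
    """Hypothetical helper: keep every left row, attach right_cols from the
    right row with the same id, and use None when there is no match."""
    by_id = {row["Id"]: row for row in right}  # assumes unique ids on the right
    joined = []
    for row in left:
        match = by_id.get(row["ID"], {})
        joined.append({**row, **{c: match.get(c) for c in right_cols}})
    return joined


result = left_join_on_id(df2_rows, df4_rows, ("Title", "FirstName", "LastName"))
for row in result:
    print(row)
```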

A third option is to add the three columns directly to df3:

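from pyspark.sql import functions as F

# Build the split expression once and reuse it for each new column
split_column = F.split(F.col('USER_FULL_NAME'), ' ')
for i, col_name in enumerate(('Title', 'FirstName', 'LastName')):
    df3 = df3.withColumn(col_name, split_column.getItem(i))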

In this case, the result is:

+---+--------------+-----+---------+--------+
| ID|USER_FULL_NAME|Title|FirstName|LastName|
+---+--------------+-----+---------+--------+
|  1|     Mr. AA BB|  Mr.|       AA|      BB|
|  2|    Mrs. BB CC| Mrs.|       BB|      CC|
|  3|     Dr. DD EE|  Dr.|       DD|      EE|
|  4|    PhD. FF GG| PhD.|       FF|      GG|
+---+--------------+-----+---------+--------+
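Both F.split calls in this answer pass ' ' as the pattern, and F.split interprets that argument as a Java regular expression. For simple patterns like these, Python's re module behaves the same way, so the stdlib-only sketch below illustrates why the question's r'\D' would not help even inside F.split (it matches every non-digit character), and why r'\s+' may be a sturdier separator than a single space if names can contain repeated spaces. The double-space sample string is made up for illustration.

```python
import re

name = "Mr. AA BB"

# \D matches any non-digit, so every character acts as a separator
# and only empty strings remain (len(name) + 1 of them).
print(re.split(r"\D", name))

# A single space works for the sample data ...
print(re.split(" ", name))            # ['Mr.', 'AA', 'BB']
# ... but yields an empty part when a name contains a double space,
print(re.split(" ", "Mr.  AA BB"))    # ['Mr.', '', 'AA', 'BB']
# whereas \s+ treats any run of whitespace as one separator.
print(re.split(r"\s+", "Mr.  AA BB"))  # ['Mr.', 'AA', 'BB']
```

In PySpark the sturdier version would be F.split(F.col('USER_FULL_NAME'), r'\s+'). Leading or trailing spaces would still produce an empty first or last part, so applying F.trim to the column first may help.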
