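I am trying to split a column whose values are lists such as [599086.9706961295, 4503107.843920314] into two columns ("x" and "y") in my Databricks notebook.
In my Jupyter notebook, the column gets split like this: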
# code from my Jupyter notebook
# the column holding the list is: xy
import numpy as np
import pandas as pd

# Method 1
complete[['x', 'y']] = pd.Series(np.stack(complete['xy'].values).T.tolist())
# the column also gets split using this method
# Method 2
def sepXY(xy):
    return xy[0], xy[1]
complete['x'], complete['y'] = zip(*complete['xy'].apply(sepXY))
In my Databricks notebook, I get errors.
I tried both methods:
import pyspark.pandas as ps
# Method 1
complete[['x', 'y']] = ps.Series(np.stack(complete['xy'].values).T.tolist())
This raises an error.
If I run only ps.Series(np.stack(complete['xy'].values).T.tolist()), I get output with two lists, one holding the x values and one holding the y values:
0 [599086.9706961295, 599079.1456765212, 599059....
1 [4503107.843920314, 4503083.465809557, 4503024...
But when I assign it to complete[['x','y']], it throws the error.
# Method 2
def sepXY(xy):
    return xy[0], xy[1]
complete['x'],complete['y'] = zip(*complete['xy'].apply(sepXY))
ArrowInvalid: Could not convert (599086.9706961295, 4503107.843920314) with type tuple: did not recognize Python value type when inferring an Arrow data type
I checked the data type, and it is not a tuple.
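One thing worth noting here: the values stored in xy may well be lists, but sepXY itself returns a tuple, because `return a, b` builds one in Python. That is probably where the "type tuple" in the message comes from, since the tuples are what apply hands back for conversion. A quick check in plain Python, using one value from the sample column (the names value and result are only for illustration):

```python
def sepXY(xy):
    # "return a, b" packs the two values into a tuple
    return xy[0], xy[1]

value = [599086.9706961295, 4503107.843920314]  # one entry of the xy column
result = sepXY(value)

print(type(value).__name__)   # list
print(type(result).__name__)  # tuple
```

So the column can hold lists while the function output is still a tuple.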
I also tried:
complete[['x','y']] = pd.DataFrame(complete.xy.tolist(), index= complete.index)
If I use this, my kernel restarts.
# This is the column for sample
xy
[599086.9706961295, 4503107.843920314]
[599088.5389507986, 4503112.7796745915]
[599072.8088083105, 4503064.139248001]
[599090.0996424126, 4503117.721156018]
[599074.3909188313, 4503068.925677084]
1 Answer
Input:
For the example above, this can be done as follows:
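A minimal sketch of one way to do it, assuming every entry of xy is a two-element list: pull each coordinate out with its own apply, so each new column holds plain floats and not tuples or lists. The sketch uses plain pandas with a few rows from the question's sample. The same per-element pattern may carry over to pyspark.pandas, but that is an assumption and has not been checked here.

```python
import pandas as pd

# A few rows copied from the sample column in the question.
complete = pd.DataFrame({
    "xy": [
        [599086.9706961295, 4503107.843920314],
        [599088.5389507986, 4503112.7796745915],
        [599072.8088083105, 4503064.139248001],
    ]
})

# Extract each coordinate separately so every result is a Series of floats.
complete["x"] = complete["xy"].apply(lambda v: float(v[0]))
complete["y"] = complete["xy"].apply(lambda v: float(v[1]))

print(complete[["x", "y"]])
```

Returning a single float per call avoids asking the backend to infer a type for a tuple or list, which is what the second method in the question ran into.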