I have a DataFrame and want to process the values of a specific column further. How can I get those values in PySpark? Here is my code:
for i in range(0, df.count()):
    df_year = df['year'][i]
    print(df_year)
I get output like this:
Column<b'year'>
Column<b'year'>
This is my expected output:
2015
2016
fdbelqdn1#
for row in df.rdd.collect():
    print(row['year'])
fivyi3re2#
If you only want the year column:
for row in df.select("year").rdd.collect():
    print(row['year'])
rkkpypqq3#
You can try this:
>>> from pyspark import SparkContext
>>> from pyspark.sql import SQLContext
>>> sc = SparkContext.getOrCreate()
>>> sql = SQLContext(sc)
>>> df = sql.createDataFrame([(2015, 4), (2016, 5),(2017,6),(2018,7)], ["Year", "Month"])
>>> df.show()
+----+-----+
|Year|Month|
+----+-----+
|2015|    4|
|2016|    5|
|2017|    6|
|2018|    7|
+----+-----+
>>> [x.Year for x in df.select("Year").collect()]
[2015, 2016, 2017, 2018]
oxosxuxt4#
rows = df.collect()  # collect once instead of on every iteration
for i in range(len(rows)):
    df_year = rows[i][1]
    print(df_year)
Here 1 is the zero-based column index.