TypeError: 'NoneType' object is not subscriptable when running count() after withColumn with a UDF

Asked by 9lowa7mx on 2021-07-09, tagged Spark

I use withColumn with a UDF to compute a new column, then select two columns into a new DataFrame. When I run count() on this new DataFrame, it raises TypeError: 'NoneType' object is not subscriptable, even though show() works fine. I want to know the number of rows in the new DataFrame. Here is my code:


# Find all entities with names that are palindromes

# (name reads the same way forward and reverse, e.g. madam):

# print the count and show() the resulting Spark DataFrame

from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def is_palindrome(entity_name):
    return entity_name == entity_name[::-1]

spark_udf = udf(is_palindrome, BooleanType())
palindrome_df = cb_sdf.withColumn('is_palindrome', spark_udf('name'))
palindrome_df = palindrome_df.where(palindrome_df['is_palindrome']).select('name', 'is_palindrome')
print(palindrome_df.show())
print(palindrome_df.count())

Here is the output and the error message I get:

+------+-------------+
|  name|is_palindrome|
+------+-------------+
| KAYAK|         true|
| ooVoo|         true|
| 63336|         true|
| TipiT|         true|
| beweb|         true|
|   CSC|         true|
|   CBC|         true|
|   OQO|         true|
|   SAS|         true|
|   e4e|         true|
|   PHP|         true|
|   ivi|         true|
|  ADDA|         true|
|izeezi|         true|
| siXis|         true|
| STATS|         true|
|   8x8|         true|
|   IXI|         true|
|   GLG|         true|
|   2e2|         true|
+------+-------------+
only showing top 20 rows

None
---------------------------------------------------------------------------
PythonException                           Traceback (most recent call last)
<ipython-input-24-7fd424328e85> in <module>()
     10 palindrome_df = palindrome_df.where(palindrome_df['is_palindrome']).select('name', 'is_palindrome')
     11 print(palindrome_df.show())
---> 12 print(palindrome_df.count())

2 frames
/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py in deco(*a,**kw)
    115                 # Hide where the exception came from that shows a non-Pythonic
    116                 # JVM exception message.
--> 117                 raise converted from None
    118             else:
    119                 raise

PythonException: 
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 604, in main
    process()
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 596, in process
    serializer.dump_stream(out_iter, outfile)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 211, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 132, in dump_stream
    for obj in iterator:
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 200, in _batched
    for item in iterator:
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 450, in mapper
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 450, in <genexpr>
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 85, in <lambda>
    return lambda *a: f(*a)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/util.py", line 73, in wrapper
    return f(*args,**kwargs)
  File "<ipython-input-24-7fd424328e85>", line 7, in is_palindrome
TypeError: 'NoneType' object is not subscriptable

Thanks in advance!

Answer 1 (nuypyhwy):

There are probably null values somewhere in the DataFrame, just not in the first 20 rows that show() displays. That is why the error appears when count() evaluates the entire DataFrame but not when show() only materializes 20 rows.
To keep null values from crashing the job, change the UDF to:
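The error itself is plain Python, not Spark: slicing `None` with `[::-1]` raises this exact TypeError. A minimal reproduction outside Spark (no SparkSession needed):

```python
# The original, unguarded UDF body.
def is_palindrome(entity_name):
    return entity_name == entity_name[::-1]

# Passing None, as Spark does for a null 'name' cell, triggers the error.
try:
    is_palindrome(None)
except TypeError as exc:
    print(exc)  # 'NoneType' object is not subscriptable
```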

def is_palindrome(entity_name):
    if entity_name is None:
        return None  # propagate nulls instead of slicing None
    else:
        return entity_name == entity_name[::-1]
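A quick sanity check of the guarded logic in plain Python (the sample values below are hypothetical, not taken from the original dataset):

```python
# Guarded version: null-safe palindrome check.
def is_palindrome(entity_name):
    if entity_name is None:
        return None  # null in, null out
    return entity_name == entity_name[::-1]

samples = ["KAYAK", "spark", None, "8x8"]
print([is_palindrome(s) for s in samples])  # [True, False, None, True]
```

Alternatively, you could filter out nulls before applying the UDF, e.g. `cb_sdf.where(cb_sdf['name'].isNotNull())`, so the UDF never receives a null value in the first place.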
