pyspark SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Function execution failed

7jmck4yq · posted 2024-01-06 in Spark

When I try to display the training DataFrame created from the training_set, I get the following error.

    SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function mycatalog.mydatabase.product_difference_ratio_on_demand_feature(left_MaxProductAmount#6091, left_Amount#6087) failed.
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 217.0 failed 4 times, most recent failure: Lost task 0.3 in stage 217.0 (TID 823) (ip-10-0-32-203.us-west-2.compute.internal executor driver): org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function mycatalog.mydatabase.product_difference_ratio_on_demand_feature(left_MaxProductAmount#6091, left_Amount#6087) failed.
    == Error ==
    TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
    == Stacktrace ==
    File "<udfbody>", line 5, in main
    return calc_ratio_difference(max_price, transaction_amount)
    File "<udfbody>", line 3, in calc_ratio_difference
    return round(((n1 - n2)/n1),2) SQLSTATE: 39000
    == SQL (line 1, position 1) ==
    mycatalog.mydatabase.product_difference_ratio_on_demand_feature(`MaxProductAmount`, `Amount`)

Here is my training set:

    from databricks.feature_engineering import FeatureEngineeringClient, FeatureFunction, FeatureLookup

    fe = FeatureEngineeringClient()

    training_feature_lookups = [
        FeatureLookup(
            table_name="transaction_count_history",
            rename_outputs={
                "eventTimestamp": "TransactionTimestamp"
            },
            lookup_key=["CustomerID"],
            feature_names=["transactionCount", "isTimeout"],
            timestamp_lookup_key="TransactionTimestamp"
        ),
        FeatureLookup(
            table_name="product_3minute_max_price_ft",
            rename_outputs={
                "LookupTimestamp": "TransactionTimestamp"
            },
            lookup_key=["Product"],
            timestamp_lookup_key="TransactionTimestamp"
        ),
        FeatureFunction(
            udf_name="product_difference_ratio_on_demand_feature",
            input_bindings={"max_price": "MaxProductAmount", "transaction_amount": "Amount"},
            output_name="MaxDifferenceRatio"
        )
    ]

    raw_transactions_df = spark.table("raw_transactions")

    training_set = fe.create_training_set(
        df=raw_transactions_df,
        feature_lookups=training_feature_lookups,
        label="Label",
        exclude_columns=["_rescued_data"]  # exclude_columns expects a list of column names
    )
    training_df = training_set.load_df()


My favorite part is TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'.
Yet everything is a float: floats go in, a float comes out. The function itself works fine when tested on its own.
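The error is easy to reproduce in plain Python: if the point-in-time lookup finds no matching feature row, `max_price` arrives as `None`, and `None - float` raises exactly this TypeError even though the UDF works on float inputs. A minimal sketch (the `_safe` variant with its guard clause is an assumption for illustration, not part of the original UDF):

```python
def calc_ratio_difference(n1, n2):
    # Mirrors the UDF body from the stack trace: raises TypeError when
    # n1 is None, which is what happens when a lookup returns no row.
    return round(((n1 - n2) / n1), 2)

def calc_ratio_difference_safe(n1, n2):
    # Hypothetical null-safe variant: propagate None instead of raising
    # when either input is missing (or n1 is zero).
    if n1 is None or n2 is None or n1 == 0:
        return None
    return round(((n1 - n2) / n1), 2)

print(calc_ratio_difference_safe(100.0, 20.0))  # 0.8
print(calc_ratio_difference_safe(None, 20.0))   # None instead of TypeError
```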


yjghlzjz1#

The null values were being created at lookup time. I put a minimum-timestamp filter on the base DataFrame, which ensures no nulls are fed into the function. That squares with the NoneType error.

    raw_transactions_df = spark.sql("SELECT * FROM raw_transactions WHERE timestamp(TransactionTimestamp) > timestamp('2023-12-12T23:38:00.000+00:00')")
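The mechanism behind this fix can be shown without Spark: a point-in-time lookup returns the latest feature row at or before the transaction timestamp, so any transaction older than the first row in the feature table gets no match and therefore a null feature value. A toy simulation (the table contents and helper name are illustrative, not the feature-store API):

```python
from datetime import datetime

# Toy feature table: (row timestamp, MaxProductAmount), sorted ascending.
# The earliest feature row is at 23:38, matching the filter in the answer.
feature_rows = [
    (datetime(2023, 12, 12, 23, 38), 100.0),
    (datetime(2023, 12, 12, 23, 41), 110.0),
]

def point_in_time_lookup(ts):
    """Return the latest feature value at or before ts, else None."""
    value = None
    for row_ts, amount in feature_rows:
        if row_ts <= ts:
            value = amount
    return value

# A transaction before the first feature row finds no match -> None,
# which is the value the UDF later chokes on.
print(point_in_time_lookup(datetime(2023, 12, 12, 23, 30)))  # None

# Filtering the base table to timestamps after the first feature row
# (as the SQL above does) guarantees every lookup returns a value.
print(point_in_time_lookup(datetime(2023, 12, 12, 23, 40)))  # 100.0
```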
