pyspark SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Function execution failed

7jmck4yq · posted 2024-01-06 in Spark

When I try to display the training DataFrame created from the training_set, I get the following error.

    SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function mycatalog.mydatabase.product_difference_ratio_on_demand_feature(left_MaxProductAmount#6091, left_Amount#6087) failed.
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 217.0 failed 4 times, most recent failure: Lost task 0.3 in stage 217.0 (TID 823) (ip-10-0-32-203.us-west-2.compute.internal executor driver): org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function mycatalog.mydatabase.product_difference_ratio_on_demand_feature(left_MaxProductAmount#6091, left_Amount#6087) failed.
    == Error ==
    TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
    == Stacktrace ==
    File "<udfbody>", line 5, in main
    return calc_ratio_difference(max_price, transaction_amount)
    File "<udfbody>", line 3, in calc_ratio_difference
    return round(((n1 - n2)/n1),2) SQLSTATE: 39000
    == SQL (line 1, position 1) ==
    mycatalog.mydatabase.product_difference_ratio_on_demand_feature(`MaxProductAmount`, `Amount`)

Here is my training set:

    from databricks.feature_engineering import FeatureEngineeringClient, FeatureFunction, FeatureLookup

    fe = FeatureEngineeringClient()

    training_feature_lookups = [
        FeatureLookup(
            table_name="transaction_count_history",
            rename_outputs={
                "eventTimestamp": "TransactionTimestamp"
            },
            lookup_key=["CustomerID"],
            feature_names=["transactionCount", "isTimeout"],
            timestamp_lookup_key="TransactionTimestamp"
        ),
        FeatureLookup(
            table_name="product_3minute_max_price_ft",
            rename_outputs={
                "LookupTimestamp": "TransactionTimestamp"
            },
            lookup_key=["Product"],
            timestamp_lookup_key="TransactionTimestamp"
        ),
        FeatureFunction(
            udf_name="product_difference_ratio_on_demand_feature",
            input_bindings={"max_price": "MaxProductAmount", "transaction_amount": "Amount"},
            output_name="MaxDifferenceRatio"
        )
    ]

    raw_transactions_df = spark.table("raw_transactions")

    training_set = fe.create_training_set(
        df=raw_transactions_df,
        feature_lookups=training_feature_lookups,
        label="Label",
        exclude_columns=["_rescued_data"]  # exclude_columns expects a list of column names
    )
    training_df = training_set.load_df()


My favorite part is TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'.
Yet everything is a float: floats go in, a float comes out. The function itself works fine when tested on its own.
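The error is easy to reproduce in plain Python: if the point-in-time lookup finds no matching feature row, `max_price` arrives as `None`, and `None - float` raises exactly this TypeError even though the UDF works on float inputs. A minimal sketch (the `_safe` variant with its guard clause is an assumption for illustration, not part of the original UDF):

```python
def calc_ratio_difference(n1, n2):
    # Mirrors the UDF body from the stack trace: raises TypeError when
    # n1 is None, which is what happens when a lookup returns no row.
    return round(((n1 - n2) / n1), 2)

def calc_ratio_difference_safe(n1, n2):
    # Hypothetical null-safe variant: propagate None instead of raising
    # when either input is missing (or n1 is zero).
    if n1 is None or n2 is None or n1 == 0:
        return None
    return round(((n1 - n2) / n1), 2)

print(calc_ratio_difference_safe(100.0, 20.0))  # 0.8
print(calc_ratio_difference_safe(None, 20.0))   # None instead of TypeError
```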


yjghlzjz1#

The null values were being created at lookup time. I put a minimum-timestamp filter on the base DataFrame, which ensures no nulls are fed into the function. That squares with the NoneType error.

    raw_transactions_df = spark.sql("SELECT * FROM raw_transactions WHERE timestamp(TransactionTimestamp) > timestamp('2023-12-12T23:38:00.000+00:00')")
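The mechanism behind this fix can be shown without Spark: a point-in-time lookup returns the latest feature row at or before the transaction timestamp, so any transaction older than the first row in the feature table gets no match and therefore a null feature value. A toy simulation (the table contents and helper name are illustrative, not the feature-store API):

```python
from datetime import datetime

# Toy feature table: (row timestamp, MaxProductAmount), sorted ascending.
# The earliest feature row is at 23:38, matching the filter in the answer.
feature_rows = [
    (datetime(2023, 12, 12, 23, 38), 100.0),
    (datetime(2023, 12, 12, 23, 41), 110.0),
]

def point_in_time_lookup(ts):
    """Return the latest feature value at or before ts, else None."""
    value = None
    for row_ts, amount in feature_rows:
        if row_ts <= ts:
            value = amount
    return value

# A transaction before the first feature row finds no match -> None,
# which is the value the UDF later chokes on.
print(point_in_time_lookup(datetime(2023, 12, 12, 23, 30)))  # None

# Filtering the base table to timestamps after the first feature row
# (as the SQL above does) guarantees every lookup returns a value.
print(point_in_time_lookup(datetime(2023, 12, 12, 23, 40)))  # 100.0
```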
