When I try to display the training DataFrame created from training_set, I get the following error.
SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function mycatalog.mydatabase.product_difference_ratio_on_demand_feature(left_MaxProductAmount#6091, left_Amount#6087) failed.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 217.0 failed 4 times, most recent failure: Lost task 0.3 in stage 217.0 (TID 823) (ip-10-0-32-203.us-west-2.compute.internal executor driver): org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function mycatalog.mydatabase.product_difference_ratio_on_demand_feature(left_MaxProductAmount#6091, left_Amount#6087) failed.
== Error ==
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
== Stacktrace ==
File "<udfbody>", line 5, in main
return calc_ratio_difference(max_price, transaction_amount)
File "<udfbody>", line 3, in calc_ratio_difference
return round(((n1 - n2)/n1),2) SQLSTATE: 39000
== SQL (line 1, position 1) ==
mycatalog.mydatabase.product_difference_ratio_on_demand_feature(`MaxProductAmount`, `Amount`)
This is my training set:
from databricks.feature_engineering import FeatureEngineeringClient, FeatureFunction, FeatureLookup

fe = FeatureEngineeringClient()

training_feature_lookups = [
    FeatureLookup(
        table_name="transaction_count_history",
        rename_outputs={
            "eventTimestamp": "TransactionTimestamp"
        },
        lookup_key=["CustomerID"],
        feature_names=["transactionCount", "isTimeout"],
        timestamp_lookup_key="TransactionTimestamp"
    ),
    FeatureLookup(
        table_name="product_3minute_max_price_ft",
        rename_outputs={
            "LookupTimestamp": "TransactionTimestamp"
        },
        lookup_key=["Product"],
        timestamp_lookup_key="TransactionTimestamp"
    ),
    FeatureFunction(
        udf_name="product_difference_ratio_on_demand_feature",
        input_bindings={"max_price": "MaxProductAmount", "transaction_amount": "Amount"},
        output_name="MaxDifferenceRatio"
    )
]

raw_transactions_df = spark.table("raw_transactions")

training_set = fe.create_training_set(
    df=raw_transactions_df,
    feature_lookups=training_feature_lookups,
    label="Label",
    exclude_columns="_rescued_data"
)

training_df = training_set.load_df()
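The part that stands out to me is TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
However, everything is a float. A float goes in and a float comes out. The function itself works fine when tested on its own.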
1 Answer
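The null values were created when the lookup happened: transactions with no matching feature row came back with a null MaxProductAmount. I put a minimum timestamp on the base DataFrame, which ensured that no nulls were passed into the function. That makes sense given the NoneType error.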
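The behaviour described above can be sketched in plain Python, without Spark. This is only a simplified model, assuming the point-in-time lookup returns the latest feature value at or before the transaction timestamp and null when no such row exists. The sample data, as_of_lookup, and the null guard in calc_ratio_difference are hypothetical illustrations and are not taken from the original tables or UDF.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical feature history for one product: (timestamp, MaxProductAmount),
# sorted by timestamp. Stands in for product_3minute_max_price_ft.
feature_history = [(100, 20.0), (160, 25.0)]


def as_of_lookup(ts: int) -> Optional[float]:
    """Simplified point-in-time lookup: latest value at or before ts, else None."""
    times = [t for t, _ in feature_history]
    i = bisect_right(times, ts)
    return feature_history[i - 1][1] if i > 0 else None


def calc_ratio_difference(n1: Optional[float], n2: Optional[float]) -> Optional[float]:
    """Same formula as in the stack trace, with an added (assumed) null guard."""
    if n1 is None or n2 is None or n1 == 0:
        return None
    return round((n1 - n2) / n1, 2)


# Hypothetical base rows: (TransactionTimestamp, Amount).
# The first row predates every feature row, so its lookup yields None.
transactions = [(90, 10.0), (120, 15.0), (170, 20.0)]

# The fix from the answer: keep only rows at or after the earliest feature timestamp.
min_ts = feature_history[0][0]
filtered = [(ts, amt) for ts, amt in transactions if ts >= min_ts]

ratios = [calc_ratio_difference(as_of_lookup(ts), amt) for ts, amt in filtered]
```

Filtering the base rows removes the transactions that cannot match any feature row, while the null guard is a second line of defence in case a null still reaches the function. Either one alone would avoid the TypeError in this sketch.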