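For some weekend lockdown coding fun, I tried to apply this Keras tutorial to a different problem. The tutorial shows how to take categorical features and embed them in order to predict whether an animal will be adopted.
I followed the tutorial and wanted to see whether, using categorical embeddings, I could predict a flight's air time (this is just for fun, so I'm not sure the problem even makes sense).
I adapted the code to my dataset and it seems to run, but I get 0.00% accuracy and a warning telling me to consider rewriting the model with the Functional API.
Below is my code to reproduce the problem. I'm not sure what I did wrong or what I missed: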
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import LabelEncoder
dataframe = pd.read_csv('https://raw.githubusercontent.com/ismayc/pnwflights14/master/data/flights.csv')
dataframe = dataframe[dataframe['tailnum'].notna()]
target = 'air_time'
dataframe.head()
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')
# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop(label_column)
    #labels = dataframe[label_column]
    ds = tf.data.Dataset.from_tensor_slices((dataframe.to_dict(orient='list'), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
feature_columns = []
# numeric cols
for header in ['dep_time','dep_delay', 'arr_time', 'arr_delay', 'distance']:
    feature_columns.append(feature_column.numeric_column(header))
# indicator_columns
categorical_columns = [ 'carrier', 'tailnum', 'origin', 'dest']
for col_name in categorical_columns:
    categorical_column = feature_column.categorical_column_with_vocabulary_list(
        col_name, dataframe[col_name].unique())
    indicator_column = feature_column.indicator_column(categorical_column)
    feature_columns.append(indicator_column)
# embedding columns
breed1 = feature_column.categorical_column_with_vocabulary_list(
    'flight', dataframe.flight.unique())
breed1_embedding = feature_column.embedding_column(breed1, dimension=8)
feature_columns.append(breed1_embedding)
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
batch_size = 32
train_ds = df_to_dataset(train, label_column = target, batch_size=batch_size)
val_ds = df_to_dataset(val,label_column = target, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, label_column = target, shuffle=False, batch_size=batch_size)
model = tf.keras.Sequential([
    feature_layer,
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dropout(.1),
    layers.Dense(1)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_ds,
          validation_data=val_ds,
          epochs=10)
loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)
The output is:
103552 train examples
25888 validation examples
32361 test examples
Epoch 1/10
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
3227/3236 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
3236/3236 [==============================] - 16s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/10
3236/3236 [==============================] - 16s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 4/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 5/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 6/10
3236/3236 [==============================] - 15s 4ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 7/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 8/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 9/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 10/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
1012/1012 [==============================] - 2s 2ms/step - loss: nan - accuracy: 0.0000e+00
Accuracy 0.0
I thought I had followed the tutorial and applied it correctly, but I can't see where I went wrong.
1 Answer
mklgxw1f
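There are two main problems:
1. The dataframe loaded from flights.csv contains 5282 NaN values. If the model's input is NaN, its output is NaN as well, so the loss comes out as NaN. You can fill the NaN values with dataframe = dataframe.fillna(method='pad').
2. Predicting air time is a regression problem, not a binary classification problem, so you should change the arguments to model.compile, for example loss=tf.keras.losses.MeanSquaredError() and metrics=['mae'].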
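To make the two points above concrete, here is a minimal sketch on a made-up four-row frame. It reuses some column names from flights.csv, but every value, the `flights` and `clean` names, and the `y_pred` predictions are invented for illustration. It counts the NaN values, fills the gaps, and then compares an exact-match accuracy with MAE and MSE on a continuous target. Dropping rows that lack a label before filling is my own assumption, not part of the answer, and `ffill()` is used as the forward-fill spelling that newer pandas versions prefer over fillna(method='pad').

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of flights.csv: real column names, made-up values.
flights = pd.DataFrame({
    "dep_delay": [5.0, np.nan, -3.0, 12.0],
    "arr_delay": [2.0, 7.0, np.nan, 10.0],
    "distance": [954, 2454, 954, 679],
    "air_time": [130.0, 310.0, np.nan, 95.0],
})

# Point 1: count missing values per column before training.
nan_counts = flights.isna().sum()
print(nan_counts)

# Assumption: rows with a missing label are dropped rather than filled.
clean = flights.dropna(subset=["air_time"])
# Forward-fill the remaining feature gaps. A NaN in the very first row
# would survive a forward fill, so check the result afterwards.
clean = clean.ffill()
print("NaN left after cleaning:", int(clean.isna().sum().sum()))

# Point 2: air_time is continuous, so an exact-match accuracy is not
# informative, while regression metrics are.
y_true = clean["air_time"].to_numpy()
y_pred = np.array([120.0, 300.0, 100.0])  # made-up predictions

accuracy = float(np.mean(y_pred == y_true))    # fraction of exact matches
mae = float(np.mean(np.abs(y_pred - y_true)))  # mean absolute error (minutes)
mse = float(np.mean((y_pred - y_true) ** 2))   # mean squared error

print("accuracy:", accuracy)
print("mae:", mae)
print("mse:", mse)
```

The predictions here are off by only a few minutes, yet the exact-match accuracy is zero. That is roughly why the accuracy column in the training log stays at 0.0000e+00 even apart from the NaN loss, and why an error metric such as MAE is the more useful thing to watch for this target.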
The code I ran on Colab:
The results: