Python: implementing feature columns results in 0% accuracy

Posted by lnxxn5zx on 2023-05-27 in Python

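For some weekend lockdown coding fun, I tried to apply this Keras tutorial to a different problem. The tutorial shows how to take categorical features and embed them in order to predict whether an animal will be adopted.
I followed the tutorial and wanted to see whether, using categorical embeddings, I could predict the air time of a flight (just for fun, so I'm not sure the problem even makes sense).
I applied the code to my dataset and it seems to run, but I get 0.00% accuracy and a warning telling me to consider rewriting the model with the Functional API.
Below is my code to reproduce the problem. I'm not sure what I did wrong or what I missed: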

import numpy as np
import pandas as pd

import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import LabelEncoder

dataframe = pd.read_csv('https://raw.githubusercontent.com/ismayc/pnwflights14/master/data/flights.csv')
dataframe = dataframe[dataframe['tailnum'].notna()]
target = 'air_time'
dataframe.head()

train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')

# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop(label_column)
    #labels = dataframe[label_column]

    ds = tf.data.Dataset.from_tensor_slices((dataframe.to_dict(orient='list'), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

feature_columns = []

# numeric cols
for header in ['dep_time','dep_delay',  'arr_time', 'arr_delay', 'distance']:
  feature_columns.append(feature_column.numeric_column(header))

# indicator_columns
categorical_columns = [ 'carrier', 'tailnum', 'origin', 'dest'] 
for col_name in categorical_columns:
  categorical_column = feature_column.categorical_column_with_vocabulary_list(
      col_name, dataframe[col_name].unique())
  indicator_column = feature_column.indicator_column(categorical_column)
  feature_columns.append(indicator_column)

# embedding columns
breed1 = feature_column.categorical_column_with_vocabulary_list(
      'flight', dataframe.flight.unique())
breed1_embedding = feature_column.embedding_column(breed1, dimension=8)
feature_columns.append(breed1_embedding)

feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

batch_size = 32
train_ds = df_to_dataset(train, label_column = target, batch_size=batch_size)
val_ds = df_to_dataset(val,label_column = target,  shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, label_column = target, shuffle=False, batch_size=batch_size)

model = tf.keras.Sequential([
  feature_layer,
  layers.Dense(128, activation='relu'),
  layers.Dense(128, activation='relu'),
  layers.Dropout(.1),
  layers.Dense(1)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_ds,
          validation_data=val_ds,
          epochs=10)

loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)

The result is:

103552 train examples
25888 validation examples
32361 test examples
Epoch 1/10
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
3227/3236 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
3236/3236 [==============================] - 16s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/10
3236/3236 [==============================] - 16s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 4/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 5/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 6/10
3236/3236 [==============================] - 15s 4ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 7/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 8/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 9/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 10/10
3236/3236 [==============================] - 15s 5ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
1012/1012 [==============================] - 2s 2ms/step - loss: nan - accuracy: 0.0000e+00
Accuracy 0.0

I thought I had followed the tutorial and applied it correctly, but I can't see where I went wrong.

mklgxw1f #1

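There are two main problems:
1. The dataframe loaded from flights.csv contains 5282 NaN values. If the model's input is NaN, its output is NaN as well, so the loss becomes NaN. You can fill the NaN values with dataframe = dataframe.fillna(method='pad') (dataframe.ffill() is the equivalent call in recent pandas versions, where the method argument of fillna is deprecated).
2. Predicting air time is a regression problem, not a binary classification problem, so you should change the arguments passed to model.compile, e.g. loss=tf.keras.losses.MeanSquaredError() and metrics=['mae'].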
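The first point can be checked on a small scale before any training. The following is only a minimal sketch: the toy DataFrame and its values are made up for illustration and are not taken from flights.csv. It uses `DataFrame.ffill()`, which does the same forward fill as `fillna(method='pad')`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the flights data; the values are illustrative only.
toy = pd.DataFrame({
    "dep_delay": [5.0, np.nan, -2.0, 10.0],
    "air_time": [120.0, 95.0, np.nan, 200.0],
})

# Count the missing values before touching the data.
nan_before = int(toy.isna().sum().sum())

# Forward fill: each NaN takes the previous row's value in the same column.
filled = toy.ffill()
nan_after = int(filled.isna().sum().sum())

# Alternative: drop incomplete rows instead of inventing values for them.
dropped = toy.dropna()

print(nan_before, "NaN before fill")
print(nan_after, "NaN after fill")
print(len(dropped), "rows left after dropna")
```

Note that a forward fill cannot repair a NaN in the very first row, and that filling the label column (here `air_time`) gives some rows a target copied from an unrelated flight. Dropping those rows may be the safer choice, depending on the goal.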
The code I ran on Colab:

import numpy as np
import pandas as pd

import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import LabelEncoder

dataframe = pd.read_csv('https://raw.githubusercontent.com/ismayc/pnwflights14/master/data/flights.csv')
dataframe = dataframe[dataframe['tailnum'].notna()]
target = 'air_time'
print(dataframe.isnull().sum().sum(), 'NaN in dataframe')
# Forward fill; same effect as the deprecated fillna(method='pad')
dataframe = dataframe.ffill()
print(dataframe.isnull().sum().sum(), 'NaN after fill')

train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')

# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop(label_column)
    #labels = dataframe[label_column]

    ds = tf.data.Dataset.from_tensor_slices((dataframe.to_dict(orient='list'), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

feature_columns = []

# numeric cols
for header in ['dep_time','dep_delay',  'arr_time', 'arr_delay', 'distance']:
  feature_columns.append(feature_column.numeric_column(header))

# indicator_columns
categorical_columns = [ 'carrier', 'tailnum', 'origin', 'dest'] 
for col_name in categorical_columns:
  categorical_column = feature_column.categorical_column_with_vocabulary_list(
      col_name, dataframe[col_name].unique())
  indicator_column = feature_column.indicator_column(categorical_column)
  feature_columns.append(indicator_column)

# embedding columns
breed1 = feature_column.categorical_column_with_vocabulary_list(
      'flight', dataframe.flight.unique())
breed1_embedding = feature_column.embedding_column(breed1, dimension=8)
feature_columns.append(breed1_embedding)

feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

batch_size = 32
train_ds = df_to_dataset(train, label_column = target, batch_size=batch_size)
val_ds = df_to_dataset(val,label_column = target,  shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, label_column = target, shuffle=False, batch_size=batch_size)

model = tf.keras.Sequential([
  feature_layer,
  layers.Dense(128, activation='relu'),
  layers.Dense(128, activation='relu'),
  layers.Dropout(.1),
  layers.Dense(1, activation='relu')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['mae'])

model.fit(train_ds,
          validation_data=val_ds,
          epochs=3)

# With metrics=['mae'], the second value returned is the mean absolute error
loss, mae = model.evaluate(test_ds)
print("MeanAbsoluteError", mae)

The result:

5282 NaN in dataframe
0 NaN after fill
103552 train examples
25888 validation examples
32361 test examples
Epoch 1/3
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
3232/3236 [============================>.] - ETA: 0s - loss: 497.8120 - mae: 13.8204WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'year': <tf.Tensor 'ExpandDims_14:0' shape=(None, 1) dtype=int32>, 'month': <tf.Tensor 'ExpandDims_11:0' shape=(None, 1) dtype=int32>, 'day': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=int32>, 'dep_time': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>, 'dep_delay': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=float32>, 'arr_time': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, 'arr_delay': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, 'carrier': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'tailnum': <tf.Tensor 'ExpandDims_13:0' shape=(None, 1) dtype=string>, 'flight': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=int32>, 'origin': <tf.Tensor 'ExpandDims_12:0' shape=(None, 1) dtype=string>, 'dest': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=string>, 'distance': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>, 'hour': <tf.Tensor 'ExpandDims_9:0' shape=(None, 1) dtype=float32>, 'minute': <tf.Tensor 'ExpandDims_10:0' shape=(None, 1) dtype=float32>}
Consider rewriting this model with the Functional API.
3236/3236 [==============================] - 22s 6ms/step - loss: 497.4619 - mae: 13.8162 - val_loss: 99.0488 - val_mae: 6.2621
Epoch 2/3
3236/3236 [==============================] - 20s 6ms/step - loss: 197.7995 - mae: 9.6854 - val_loss: 80.7915 - val_mae: 5.3355
Epoch 3/3
3236/3236 [==============================] - 21s 6ms/step - loss: 179.8991 - mae: 9.1736 - val_loss: 86.6206 - val_mae: 5.6779
1012/1012 [==============================] - 2s 2ms/step - loss: 98.2659 - mae: 5.6766
MeanAbsoluteError 5.676607608795166
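As a rough illustration of the second point, here is why an accuracy-style metric is likely to stay at 0 for a continuous target even once the NaN values are gone. This is a sketch under stated assumptions, not Keras code: `thresholded_accuracy` is a hypothetical helper that mimics the idea of a binary accuracy (threshold the prediction, then test for equality with the label), and the air times and predictions are made-up numbers.

```python
import numpy as np

def thresholded_accuracy(y_true, y_pred, threshold=0.5):
    """Hypothetical helper: share of labels equal to the 0/1 thresholded prediction."""
    predicted_class = (y_pred > threshold).astype(y_true.dtype)
    return float(np.mean(y_true == predicted_class))

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between target and prediction."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up air times in minutes and made-up predictions, for illustration only.
air_time = np.array([120.0, 95.0, 200.0, 43.0])
predictions = np.array([115.0, 101.0, 190.0, 47.0])

acc = thresholded_accuracy(air_time, predictions)
mae = mean_absolute_error(air_time, predictions)

print("accuracy-style metric:", acc)
print("mean absolute error:", mae)
```

The predictions here are within a few minutes of the targets, yet the accuracy-style metric is 0 because a label such as 120.0 can never equal a class of 0 or 1. A regression metric such as MAE reports the error in the target's own units (minutes).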
