Combine separate time series into a TensorFlow dataset

Posted by qcuzuvrc on 2023-01-13 in Other


The bounty expires in 3 days. Answers to this question are eligible for a +50 reputation bounty. foam78 wants to draw more attention to this question.

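I have a list of time series Pandas datasets. I apply tf.keras.preprocessing.timeseries_dataset_from_array to each one, then join the results into a single dataset using tf.data.Dataset.from_tensor_slices and ds.interleave(lambda x: x, cycle_length=1, num_parallel_calls=tf.data.AUTOTUNE), which produces an object of type tensorflow.python.data.ops.dataset_ops.ParallelInterleaveDataset.
This seems to slow down my training, apparently because a map is applied each time the dataset is used. How can I combine these datasets in a better way and produce a tf.data.Dataset instance?
Specifically, I have separate n-dimensional time series data corresponding to different states, and I would like to produce as many non-overlapping examples as possible of the following pairs: m consecutive events as the input and the subsequent event as the prediction. The training inputs would then be all the m x n sequences, and the targets would be the corresponding 1 x n events.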

tensorflow

Source: https://stackoverflow.com/questions/74810897/combine-separate-timeseries-into-tensorflow-dataset

1 answer


bq3bfh9z1#

Following up on my comment above:

import numpy as np
import tensorflow as tf

# Create a list of Pandas DataFrames (df1, df2, ... are defined elsewhere)
df_list = [df1, df2, ...]

# Convert the DataFrames to NumPy arrays and concatenate them along the time axis.
# timeseries_dataset_from_array expects array-like input, not a tf.data.Dataset.
data = np.concatenate([df.to_numpy(dtype="float32") for df in df_list], axis=0)

# Apply the time series dataset function: m consecutive rows in, the next row as target
sequence_length = m
ds = tf.keras.utils.timeseries_dataset_from_array(
    data[:-1],
    targets=data[sequence_length:],
    sequence_length=sequence_length,
    sequence_stride=1,
    shuffle=True)

# Note: windows that straddle the boundary between two DataFrames mix rows from different series.

That is an example of converting the series and concatenating them before windowing.
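For the non-overlapping pairs the question asks for, one option is to cut the examples out with NumPy before any tf.data object is created. The sketch below assumes each series is an array of shape (T_i, n) and that "non-overlapping" means no row is shared between two examples. make_examples is a hypothetical helper name, not a TensorFlow function.

```python
import numpy as np


def make_examples(series_list, m):
    """Cut each (T_i, n) array into disjoint blocks of m input rows + 1 target row.

    Returns inputs of shape (N, m, n) and targets of shape (N, 1, n).
    Blocks are built per series, so no example spans two series.
    """
    block = m + 1
    inputs, targets = [], []
    for arr in series_list:
        arr = np.asarray(arr, dtype=np.float32)
        n_blocks = len(arr) // block
        if n_blocks == 0:
            continue  # series too short for a single (input, target) pair
        # Drop the trailing rows that do not fill a whole block, then reshape.
        blocks = arr[: n_blocks * block].reshape(n_blocks, block, arr.shape[1])
        inputs.append(blocks[:, :m, :])
        targets.append(blocks[:, m:, :])
    if not inputs:
        raise ValueError("no series is long enough for sequence length m")
    return np.concatenate(inputs), np.concatenate(targets)
```

The two arrays can then be passed to a single tf.data.Dataset.from_tensor_slices((x, y)) call, followed by shuffle, batch, and prefetch, which gives a plain dataset with no interleave step. This assumes all examples fit in memory.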
Here is another example that processes each dataset separately and then combines the resulting datasets:

import tensorflow as tf

# data1, data2, data3 and preprocess_fn are placeholders defined elsewhere.
ds1 = tf.data.Dataset.from_tensor_slices(data1)
ds2 = tf.data.Dataset.from_tensor_slices(data2)
ds3 = tf.data.Dataset.from_tensor_slices(data3)

# Process dataset1
# interleave(lambda x: x, ...) only works if preprocess_fn returns a tf.data.Dataset per element.
ds1 = ds1.map(preprocess_fn, num_parallel_calls=tf.data.AUTOTUNE)
ds1 = ds1.interleave(lambda x: x, cycle_length=1, num_parallel_calls=tf.data.AUTOTUNE)
ds1 = ds1.prefetch(buffer_size=tf.data.AUTOTUNE)

# Process dataset2
ds2 = ds2.map(preprocess_fn, num_parallel_calls=tf.data.AUTOTUNE)
ds2 = ds2.interleave(lambda x: x, cycle_length=1, num_parallel_calls=tf.data.AUTOTUNE)
ds2 = ds2.prefetch(buffer_size=tf.data.AUTOTUNE)

# Process dataset3
ds3 = ds3.map(preprocess_fn, num_parallel_calls=tf.data.AUTOTUNE)
ds3 = ds3.interleave(lambda x: x, cycle_length=1, num_parallel_calls=tf.data.AUTOTUNE)
ds3 = ds3.prefetch(buffer_size=tf.data.AUTOTUNE)

# Use the concatenate method to combine the datasets (ds2 is appended after ds1, then ds3)
combined_ds = ds1.concatenate(ds2).concatenate(ds3)
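For the windowing case in the question, the per-dataset step could be the Keras windowing utility itself. The following is a minimal sketch under these assumptions: each series is a (T_i, n) array, tf.keras.utils.timeseries_dataset_from_array is available and accepts batch_size=None (recent TensorFlow releases), and windows_per_series is a hypothetical helper name.

```python
import functools

import numpy as np
import tensorflow as tf


def windows_per_series(arrays, m, stride=1):
    """Window each (T_i, n) array on its own, then chain the results.

    Windowing before combining means no window spans two different series.
    """
    per_series = []
    for arr in arrays:
        arr = np.asarray(arr, dtype="float32")
        if len(arr) <= m:
            continue  # too short to yield one (input, target) pair
        ds = tf.keras.utils.timeseries_dataset_from_array(
            arr[:-1],          # inputs: every row except the last
            targets=arr[m:],   # target for the window starting at i is row i + m
            sequence_length=m,
            sequence_stride=stride,
            batch_size=None,   # emit single examples; batch after combining
        )
        per_series.append(ds)
    if not per_series:
        raise ValueError("no series is long enough for sequence length m")
    # Dataset.concatenate is an instance method that appends one dataset to another.
    return functools.reduce(lambda a, b: a.concatenate(b), per_series)
```

Passing stride=m + 1 makes consecutive (input, target) pairs share no rows. Shuffling, batching, and prefetching can be applied once to the combined dataset rather than to each part.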

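Another approach is tf.data.Dataset.zip((ds1, ds2, ds3)), which zips the datasets and returns a single dataset whose elements are tuples holding one element from each input. It has as many elements as the shortest input, so it pairs the datasets up rather than appending them.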
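The difference between zip and concatenate is easy to see on toy data. This small sketch uses tf.data.Dataset.range only for illustration; the variable names are arbitrary.

```python
import tensorflow as tf

ds1 = tf.data.Dataset.range(3)       # 0, 1, 2
ds2 = tf.data.Dataset.range(10, 15)  # 10, 11, 12, 13, 14

# zip pairs elements position by position and stops at the shortest input.
zipped = [
    (int(a), int(b))
    for a, b in tf.data.Dataset.zip((ds1, ds2)).as_numpy_iterator()
]
print(zipped)   # [(0, 10), (1, 11), (2, 12)]

# concatenate appends the second dataset after the first.
chained = [int(v) for v in ds1.concatenate(ds2).as_numpy_iterator()]
print(chained)  # [0, 1, 2, 10, 11, 12, 13, 14]
```

For pooling training examples from several series, concatenate (or building one array up front) is the behavior that matches the question; zip is for reading several datasets in lockstep.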
Hope this helps!

Answered 2023-01-13
