PyTorch: how to use random_split with a percentage split (Sum of input lengths does not equal the length of the input dataset)

Posted by z9smfwbn on 2022-11-09 in Other

I am trying to use torch.utils.data.random_split as follows:

import torch
from torch.utils.data import DataLoader, random_split
list_dataset = [1,2,3,4,5,6,7,8,9,10]
dataset = DataLoader(list_dataset, batch_size=1, shuffle=False)
random_split(dataset, [0.8, 0.1, 0.1], generator=torch.Generator().manual_seed(123))

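However, when I run this, I get the error raise ValueError("Sum of input lengths does not equal the length of the input dataset!")
I looked at the documentation, and it seems I should be able to pass fractions that sum to 1, but that clearly does not work.
I also searched for this error, and the closest result I found was this issue.
What am I doing wrong?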

pytorch

Source: https://stackoverflow.com/questions/74327447/how-to-use-random-split-with-percentage-split-sum-of-input-lengths-does-not-equ

1 Answer

zd287kbt1#

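You are probably using an older version of PyTorch, such as PyTorch 1.10, which does not have this feature.
To reproduce the functionality in an older version, you can simply copy the source code from the newer version: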

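import math
import warnings  # needed for the warnings.warn call on empty splits
from typing import List  # used in the subset_lengths annotation

from torch import default_generator, randperm
from torch._utils import _accumulate  # private helper that yields running sums
from torch.utils.data.dataset import Subset
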
def random_split(dataset, lengths,
                 generator=default_generator):
    r"""
    Randomly split a dataset into non-overlapping new datasets of given lengths.
    If a list of fractions that sum up to 1 is given,
    the lengths will be computed automatically as
    floor(frac * len(dataset)) for each fraction provided.
    After computing the lengths, if there are any remainders, 1 count will be
    distributed in round-robin fashion to the lengths
    until there are no remainders left.
    Optionally fix the generator for reproducible results, e.g.:
    >>> random_split(range(10), [3, 7], generator=torch.Generator().manual_seed(42))
    >>> random_split(range(30), [0.3, 0.3, 0.4], generator=torch.Generator(
    ...   ).manual_seed(42))
    Args:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths or fractions of splits to be produced
        generator (Generator): Generator used for the random permutation.
    """
    if math.isclose(sum(lengths), 1) and sum(lengths) <= 1:
        subset_lengths: List[int] = []
        for i, frac in enumerate(lengths):
            if frac < 0 or frac > 1:
                raise ValueError(f"Fraction at index {i} is not between 0 and 1")
            n_items_in_split = int(
                math.floor(len(dataset) * frac)  # type: ignore[arg-type]
            )
            subset_lengths.append(n_items_in_split)
        remainder = len(dataset) - sum(subset_lengths)  # type: ignore[arg-type]
        # add 1 to all the lengths in round-robin fashion until the remainder is 0
        for i in range(remainder):
            idx_to_add_at = i % len(subset_lengths)
            subset_lengths[idx_to_add_at] += 1
        lengths = subset_lengths
        for i, length in enumerate(lengths):
            if length == 0:
                warnings.warn(f"Length of split at index {i} is 0. "
                              f"This might result in an empty dataset.")
    # Cannot verify that dataset is Sized
    if sum(lengths) != len(dataset):    # type: ignore[arg-type]
        raise ValueError("Sum of input lengths does not equal the length of the input dataset!")
    indices = randperm(sum(lengths), generator=generator).tolist()  # type: ignore[call-overload]
    return [Subset(dataset, indices[offset - length : offset]) for offset, length in zip(_accumulate(lengths), lengths)]
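
If copying the whole function feels heavy, another option is to convert the fractions into integer lengths yourself, since integer lengths are accepted by older releases too. The following is only a minimal sketch of that idea: `fractions_to_lengths` is a hypothetical helper name, not part of PyTorch, and it simply mirrors the floor-then-round-robin logic of the source above.

```python
import math
from typing import List, Sequence


def fractions_to_lengths(n_items: int, fractions: Sequence[float]) -> List[int]:
    """Hypothetical helper: turn fractions that sum to 1 into integer split sizes."""
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("Fractions must sum to 1")
    # Floor each share, as the copied random_split source does.
    lengths = [int(math.floor(n_items * frac)) for frac in fractions]
    # Hand out the leftover items one at a time, round-robin.
    remainder = n_items - sum(lengths)
    for i in range(remainder):
        lengths[i % len(lengths)] += 1
    return lengths


lengths = fractions_to_lengths(10, [0.8, 0.1, 0.1])
print(lengths)  # [8, 1, 1]
```

The resulting list can then be passed to the built-in function, for example random_split(list_dataset, lengths, generator=torch.Generator().manual_seed(123)). Note that this sketch passes the underlying list rather than the DataLoader, on the assumption that the split should happen before batching.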

Posted 2022-11-09