python 从大量数据中高效查找重叠范围

kmbjn2e3 于 2023-08-02 发布在 Python

关注(0)|答案(1)|浏览(127)

是否有一个有效的算法来检查多个有界连续范围之间的重叠？
如果我们有一对任务，每个任务由开始时间和结束时间表示，则下面的函数可以检测冲突。

# tasks are of the form (start_time, end_time)

def has_clash(task_1: tuple[float, float], task_2: tuple[float, float]) -> bool:
    """Check if a clash exists between two tasks."""
    return (task_2[1] > task_1[0]) and (task_2[0] < task_1[1])

字符串
要检查多个任务，我们必须在任务集上成对运行上述测试，总复杂度为O（n^2），例如：

def has_any_clash(tasks: list[tuple[float, float]]) -> bool:
    """Check if any tasks clash, return True if no clashes, false otherwise."""
    if len(tasks) < 2:
        return False
    clash = any(
        has_clash(task, other_task)
        for i, (task) in enumerate(tasks[1:])
        for other_task in tasks[:i]
    )
    return clash

型
这种复杂性是否可以通过例如排序或者其他数学魔法
有没有一种算法可以更有效地做到这一点？

python

来源：https://stackoverflow.com/questions/76778628/find-overlapping-ranges-from-a-large-set-of-data-efficiently

1条答案

按热度按时间

q35jwt9p1#

按开始时间对列表进行排序，然后将连续条目的最后一个结束时间与下一个开始时间进行比较。这样做的复杂度是O（nlogn）排序和O（n）迭代对。

def has_any_clash(tasks: list[tuple[float, float]]) -> bool:
    """Check if any tasks clash, return True if no clashes, false otherwise."""
    sorted_tasks = sorted(tasks)
    return any(a2 > b1 for (a1, a2), (b1, b2) in zip(sorted_tasks, sorted_tasks[1:]))

print(has_any_clash([]))  # False
print(has_any_clash([(0, 4)]))  # False
print(has_any_clash([(1, 5), (0, 4)]))  # True
print(has_any_clash([(5, 8), (0, 4), (6, 7)]))  # True
print(has_any_clash([(9, 10), (0, 4), (6, 7)]))  # False

字符串
如果你想要一个列表 * 哪些 * 任务冲突（这些可能比连续的任务更多），这会涉及更多，但不会太多。除上述内容外：

使用heap跟踪“正在运行”的任务，按结束时间排序
在每次迭代中，迭代按开始时间排序的任务：
当结束时间小于当前任务的开始时间时，从运行堆中弹出任务
正在运行的堆中的所有剩余任务与当前任务冲突
将当前任务添加到运行堆

赞(0）回复(0）举报 2023-08-02

我来回答

python 从大量数据中高效查找重叠范围

1条答案

相关问题

热门标签

最新问答