scipy: Find the start position of the longest sequence of 1's

Posted by clj7thdc on 2022-11-10 in Other

I want to find the start position of the longest sequence of 1's in my array:

a1=[0,0,1,1,1,1,0,0,1,1]

# 2

I am following this answer to find the length of the longest sequence. However, I have not been able to determine its position.

scipy

Source: https://stackoverflow.com/questions/38161606/find-the-start-position-of-the-longest-sequence-of-1s

8 Answers

643ylb081#

Inspired by this solution, here is a vectorized approach to the problem -


import numpy as np

a1 = np.asarray(a1)  # the elementwise comparison below needs an array, not a list

# Get start, stop index pairs for islands/seq. of 1s
idx_pairs = np.where(np.diff(np.hstack(([False],a1==1,[False]))))[0].reshape(-1,2)

# Get the island lengths, whose argmax would give us the ID of longest island.
# Start index of that island would be the desired output
start_longest_seq = idx_pairs[np.diff(idx_pairs,axis=1).argmax(),0]

Sample run -

In [89]: a1 # Input array
Out[89]: array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])

In [90]: idx_pairs # Start, stop+1 index pairs
Out[90]: 
array([[ 2,  6],
       [ 8, 10]])

In [91]: np.diff(idx_pairs,axis=1) # Island lengths
Out[91]: 
array([[4],
       [2]])

In [92]: np.diff(idx_pairs,axis=1).argmax() # Longest island ID
Out[92]: 0

In [93]: idx_pairs[np.diff(idx_pairs,axis=1).argmax(),0] # Longest island start
Out[93]: 2
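
The steps in the sample run can be folded into one helper. The following is only a sketch: the function name `longest_run_start` and its `target` parameter are hypothetical, and it assumes that returning None is acceptable when the input contains no 1s (the original snippet does not handle that case).

```python
import numpy as np

def longest_run_start(values, target=1):
    """Return the start index of the longest run of `target`, or None if absent."""
    mask = np.asarray(values) == target
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    # Positions where the padded mask changes: starts and exclusive stops alternate.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    if edges.size == 0:
        return None
    runs = edges.reshape(-1, 2)
    lengths = runs[:, 1] - runs[:, 0]
    # argmax returns the first maximum, so ties go to the earliest run.
    return int(runs[lengths.argmax(), 0])

print(longest_run_start([0, 0, 1, 1, 1, 1, 0, 0, 1, 1]))  # 2
```

Comparing shifted slices of the padded mask plays the same role as np.diff on the boolean array in the answer; it is used here only to make the edge detection explicit.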

zvms9eto2#

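A more compact one-liner using groupby(). It applies enumerate() to the raw data so that the starting positions are carried through the pipeline, and ends up with the list of tuples [(2, 4), (8, 2)], each tuple holding the start position and length of a non-zero run: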

from itertools import groupby

L = [0,0,1,1,1,1,0,0,1,1]

print(max(((lambda y: (y[0][0], len(y)))(list(g)) for k, g in groupby(enumerate(L), lambda x: x[1]) if k), key=lambda z: z[1])[0])

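lambda x is the key function for groupby(), needed because we enumerated L
lambda y packages up the result we need, since g can only be evaluated once unless it is stored
lambda z is the key function for max(), which pulls out the lengths
Prints "2" as expected.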
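
For readers who find the one-liner dense, the same pipeline can be unrolled into named steps. This is a sketch rather than the answerer's code; the helper name `runs_of_ones` is made up for illustration.

```python
from itertools import groupby

def runs_of_ones(seq):
    """Return (start, length) pairs for each run of truthy values in seq."""
    runs = []
    # Group (index, value) pairs by value so each group remembers its positions.
    for is_one, group in groupby(enumerate(seq), key=lambda pair: pair[1]):
        if is_one:
            items = list(group)  # a group can only be consumed once
            runs.append((items[0][0], len(items)))
    return runs

L = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
runs = runs_of_ones(L)
print(runs)  # [(2, 4), (8, 2)]
# max() returns the first maximal element, so ties go to the earliest run.
print(max(runs, key=lambda run: run[1])[0])  # 2
```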

u5i3ibmn3#

This seems to work, using groupby from itertools, and it goes through the list only once:

from itertools import groupby

pos, max_len, cum_pos = 0, 0, 0

for k, g in groupby(a1):
    if k == 1:
        pat_size = len(list(g))
        pos, max_len = (pos, max_len) if pat_size < max_len else (cum_pos, pat_size)
        cum_pos += pat_size
    else:
        cum_pos += len(list(g))

pos

# 2

max_len

# 4
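
One detail of the loop above: because the comparison is `pat_size < max_len`, a later run of equal length replaces an earlier one. If the earliest run should win on ties (an assumption, since the question does not say), a variant along these lines might be used. The function name `first_longest_run` and the `(None, 0)` result for inputs without any 1s are illustrative choices.

```python
from itertools import groupby

def first_longest_run(seq, target=1):
    """Return (start, length) of the earliest longest run of target; (None, 0) if none."""
    best_start, best_len, offset = None, 0, 0
    for value, group in groupby(seq):
        size = sum(1 for _ in group)
        # Strict '>' keeps the earlier run when two runs have the same length.
        if value == target and size > best_len:
            best_start, best_len = offset, size
        offset += size
    return best_start, best_len

print(first_longest_run([0, 0, 1, 1, 1, 1, 0, 0, 1, 1]))  # (2, 4)
print(first_longest_run([1, 1, 0, 1, 1]))                # (0, 2)
```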

doinxwow4#

You could use a for loop and, at each position, check whether the next m items (where m is the maximum run length) are all 1s:


# Using your list and the answer from the post you referred to
from itertools import groupby

L = [0,0,1,1,1,1,0,0,1,1]
# Length of the longest run of 1s (only count groups whose key is 1)
m = max((sum(1 for i in g) for k, g in groupby(L) if k == 1), default=0)

# Here is the for loop
for i, s in enumerate(L):
    if len(L) - i < m:  # fewer than m items left, so no full run can start here
        break
    if s == 1 and 0 not in L[i:i+m]:
        print(i)
        break

For the list above, this prints 2.

2g32fytz5#

Another way of doing it in a single loop, but without resorting to itertools' groupby.

max_start = 0
max_reps = 0
start = 0
reps = 0
for (pos, val) in enumerate(a1):
    start = pos if reps == 0 else start
    reps = reps + 1 if val == 1 else 0
    max_reps = max(reps, max_reps)
    max_start = start if reps == max_reps else max_start

This can also be done as a one-liner using reduce (the lambda below relies on Python 2 tuple-parameter unpacking):

max_start = reduce(lambda (max_start, max_reps, start, reps), (pos, val): (start if reps == max(reps, max_reps) else max_start, max(reps, max_reps), pos if reps == 0 else start, reps + 1 if val == 1 else 0), enumerate(a1), (0, 0, 0, 0))[0]

In Python 3 you cannot unpack tuples in a lambda's parameter list, so it is better to define the function with def first:

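from functools import reduce  # reduce is no longer a builtin in Python 3

def func(acc, x):
    max_start, max_reps, start, reps = acc
    pos, val = x
    # Update in the same order as the loop above, so each value sees the new state
    start = pos if reps == 0 else start
    reps = reps + 1 if val == 1 else 0
    max_reps = max(reps, max_reps)
    max_start = start if reps == max_reps else max_start
    return max_start, max_reps, start, reps

max_start = reduce(func, enumerate(a1), (0, 0, 0, 0))[0]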

In any of the three cases, max_start gives your answer (i.e. 2).

nzrxty8p6#

Using more_itertools, a third-party library:

Given

import itertools as it

import more_itertools as mit

lst = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]

Code

# Keep only the runs of 1s, so a longer run of 0s cannot be picked by mistake
longest_contiguous = max([tuple(g) for k, g in it.groupby(lst) if k], key=len)
longest_contiguous

# (1, 1, 1, 1)

pred = lambda w: w == longest_contiguous
next(mit.locate(mit.windowed(lst, len(longest_contiguous)), pred=pred))

# 2

See also the more_itertools.locate docstring for details on how these tools work.

0ve6wy6x7#

Here is another solution that uses only NumPy; I think it should work in all cases. The most upvoted solution is probably faster, though.

tmp = np.cumsum(np.insert(np.array(a1) != 1, 0, False))  # value of tmp[i+1] was not incremented when a1[i] is 1

# [0, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4]

values, counts = np.unique(tmp, return_counts=True)

# [0, 1, 2, 3, 4], [1, 1, 5, 1, 3]

counts_idx = np.argmax(counts)
longest_sequence_length = counts[counts_idx] - 1

# 4

longest_sequence_idx = np.argmax(tmp == values[counts_idx])

# 2
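
To make the labelling idea easier to follow, here is a sketch that wraps the same steps in a function. The name `longest_run_via_cumsum` is hypothetical, and using `return_index=True` to read off the first position of each label is a substitution for the second argmax in the answer, not the answerer's own code.

```python
import numpy as np

def longest_run_via_cumsum(values):
    """Return (start, length) of the longest run of 1s via cumulative-sum labels."""
    not_one = np.asarray(values) != 1
    # Prepend False so a run starting at index 0 still has a slot that carries its label.
    labels = np.cumsum(np.insert(not_one, 0, False))
    # Each label occurs once for the slot that introduced it (a non-1 element or the
    # prepended slot), plus once for every 1 that directly follows it.
    uniq, first_idx, counts = np.unique(labels, return_index=True, return_counts=True)
    best = counts.argmax()
    # The label's first position in `labels` equals the run's start index in `values`.
    return int(first_idx[best]), int(counts[best] - 1)

print(longest_run_via_cumsum([0, 0, 1, 1, 1, 1, 0, 0, 1, 1]))  # (2, 4)
```

For an input without any 1s the reported length is 0 and the start index carries no meaning.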

mwyxok5s8#

I have implemented a run-searching function for numpy arrays in haggis.npy_util.mask2runs. You can use it like this:

runs, lengths = mask2runs(a1, return_lengths=True)
result = runs[lengths.argmax(), 0]
