scipy 使正交格线拟合噪波坐标

uyto3xhc  于 2022-11-10  发布在  其他
关注(0)|答案(3)|浏览(153)

问题
我有一个要形成网格的坐标列表。每个坐标都有一个随机误差分量,并且缺少一些坐标。Grid could be rotated (update).我想用一个正交网格来拟合数据点,并返回网格顶点的列表。例如:

应用程序

其目的是在扫描图像中找到一个网格。数据点来自OpenCV中轮廓或边缘检测的结果。一个例子是带有照片网格的图像。
目标
我写了一些Python代码,但我想找到一个使用SciPy、statmodels或其他模块的线性代数算法,它会更健壮,并处理网格的小旋转(小于10°)。

仅使用列表的Python代码


# Noisy [x, y] coordinates (origin is upper-left corner)

pts = [[103,101],
       [198,103],
       [300, 99],
       [ 97,205],
       [304,202],
       [102,295],
       [200,303],
       [104,405],
       [205,394],
       [298,401]]

def row_col_avgs(num_list, ratio):
    # Finds the average of each row and column. Coordinates are
    # assigned to a row and column by specifying an error ratio.
    last_num, sum_nums, count_nums, avgs = 0, 0, 0, []
    num_list.sort()
    for num in num_list:
        # Calculate average for last row or column and begin new row or column
        if num > (1+ratio)*last_num and count_nums != 0:
            avgs.append(int(round(sum_nums/count_nums,0)))
            sum_nums = num
            count_nums = 1
        # Or continue with current row or column
        else:
            sum_nums += num
            count_nums += 1
        last_num = num
    avgs.append(int(round(sum_nums/count_nums,0)))
    return avgs

# Split coordinates into two lists of x's and y's

xs, ys = map(list, zip(*pts))

# Find averages of each row and column of the grid

x_avgs = row_col_avgs(xs, 0.1)
y_avgs = row_col_avgs(ys, 0.1)

# Return vertices of completed averaged grid

avg_grid = []
for y_avg in y_avgs:
    avg_row = []
    for x_avg in x_avgs:
        avg_row.append([int(x_avg), int(y_avg)])
    avg_grid.append(avg_row)

print(avg_grid)

输出

[[[102, 101], [201, 101], [301, 101]], 
 [[102, 204], [201, 204], [301, 204]], 
 [[102, 299], [201, 299], [301, 299]], 
 [[102, 400], [201, 400], [301, 400]]]
wqsoz72f

wqsoz72f1#

下面是代码的一个简单实现。由于AvgGrid的大小是已知的,我预先分配了所需的内存(而不是追加)。这应该有速度优势,特别是当输出顶点的数量很大的时候。

import numpy as np

# Input of [x, y] coordinates of a sparse grid with errors

xys = np.array([[103,101],
       [198,103],
       [300, 99],
       [ 97,205],
       [304,202],
       [102,295],
       [200,303],
       [104,405],
       [205,394],
       [298,401]])

# Function to average

def ColAvgs(CoordinateList, CutoffRatio = 1.1):

    # Length of CoordinateList
    L = len(CoordinateList)

    # Sort input
    SortedList = np.sort(CoordinateList)

    # Determine indices to average
    RelativeIncrease = SortedList[-(L-1):]/SortedList[:(L-1)]
    CriticalIndices = np.flatnonzero(RelativeIncrease > CutoffRatio) + 1
    Indices = np.hstack((0,CriticalIndices))
    if (Indices[-1] != L):
        Indices = np.hstack((Indices,L))
    #print(Indices)     # Uncomment to show index construction

    # Compute averages
    Avgs = np.empty((len(Indices)-1)); Avgs[:] = np.NaN
    for iter in range(len(Avgs)):
        Avgs[iter] = int( round(np.mean(SortedList[Indices[iter]:Indices[(iter+1)]]) ) )

    # Return output
    return Avgs

# Compute x- and y-coordinates of vertices

AvgsXcoord = ColAvgs(xys[:,0])
AvgsYcoord = ColAvgs(xys[:,1])

# Return all vertices

AvgGrid = np.empty((len(AvgsXcoord)*len(AvgsYcoord),2)); AvgGrid[:] = np.NaN
iter = 0
for y in AvgsYcoord:
    for x in AvgsXcoord:
        AvgGrid[iter, :] = np.hstack((x,y))
        iter = iter+1
print(AvgGrid)
eeq64g8w

eeq64g8w2#

如果将所有点投影到垂直轴或水平轴上,问题就变成了一个等间距聚类的问题。

要执行这些聚类,可以考虑连续(排序)点之间的距离。它们将形成两个聚类:短距离对应于噪声,而长距离对应于网格大小。您可以使用大津方法解决双向聚类问题。

eqfvzcg8

eqfvzcg83#

平行斜率普通最小二乘(OLS)模型:

y = mx + grp + B,其中m=斜率,b= y轴截距,& grp=分类变量。
这是一种可以处理旋转栅格的替代算法。
OLS模型包括原始方向上的数据点和相同数据点的90°旋转。这是必要的,以便所有网格线平行并具有相同的斜率。

  • 算法:*

1.通过在第一行或最后一行中选择斜率最接近零的两个相邻点,找到一条参考网格线,与其余点进行比较。
1.计算此参考线与其余点之间的距离。
1.根据计算出的距离将点分段成组(每条网格线一组)。
1.对旋转90度的网格重复步骤1到3,并合并结果。
1.建立平行坡度OLS模型以确定格缐的缐性方程式。
1.将旋转后的网格线旋转回其原始方向。
1.计算交点。

  • 注:如果噪音、Angular 和/或丢失数据过多,则会失败。*

示例:

用于创建示例的Python代码

def create_random_example():
    # Requires import of numpy and random packages
    # Creates grid with random noise and missing points
    # Example will fail if std_dev, rotation, pct_removed too large

    # Parameters
    first_row, last_row = 100, 900
    first_col, last_col = 100, 600
    num_rows = 6
    num_cols = 4
    rotation = 3 # degrees that grid is rotated
    sd = 3 # percent std dev of avg x and avg y coordinates
    pct_remove = 30 # percent of points to randomly remove from data

    # Create grid
    x = np.linspace(first_col, last_col, num_cols)
    y = np.linspace(first_row, last_row, num_rows)
    xx, yy = np.meshgrid(x, y)

    # Add noise
    x = xx.flatten() + sd * np.mean(xx) * np.random.randn(xx.size) / 100
    y = yy.flatten() + sd * np.mean(yy) * np.random.randn(yy.size) / 100

    # Randomly remove points
    random_list = random.sample(range(0, num_cols*num_rows), 
                          int(pct_remove*num_cols*num_rows/100))
    x, y = np.delete(x, random_list), np.delete(y, random_list)

    pts = np.column_stack((x, y))

    # Rotate points
    radians = np.radians(rotation)
    rot_mat = np.array([[np.cos(radians),-np.sin(radians)],
                        [np.sin(radians), np.cos(radians)]])
    einsum = np.einsum('ji, mni -> jmn', rot_mat, [pts])
    pts = np.squeeze(einsum).T

    return np.rint(pts)

适合网格线的Python代码

import numpy as np
import pandas as pd
import itertools
import math
import random
from statsmodels.formula.api import ols
from scipy.spatial import KDTree
import matplotlib.pyplot as plt

def pt_line_dist(pt, ref_line):
    pt1, pt2 = [ref_line[:2], ref_line[2:]]
    # Distance from point to line defined by two other points
    return np.linalg.norm(np.cross(pt1 - pt2, [pt[0],pt[1]])) \
         / np.linalg.norm(pt1 - pt2)

def segment_pts(amts, grp_var, grp_label):
    # Segment on amounts (distances here) in last column of array
    # Note: need to label groups with string for OLS model
    amts = amts[amts[:, -1].argsort()]
    first_amt_in_grp = amts[0][-1]
    group, groups, grp = [], [], 0
    for amt in amts:
        if amt[-1] - first_amt_in_grp > grp_var:
            groups.append(group)
            first_amt_in_grp = amt[-1]
            group = []; grp += 1
        group.append(np.append(amt[:-1],[[grp_label + str(grp)]]))
    groups.append(group)
    return groups

def find_reference_line(pts):
    # Find point with minimum absolute slope relative both min y and max y
    y = np.hsplit(pts, 2)[1] # y column of array
    m = []
    for i, y_pt in enumerate([ pts[np.argmin(y)], pts[np.argmax(y)] ]):
        m.append(np.zeros((pts.shape[0]-1, 5))) # dtype default is float64
        m[i][:,2:4] = np.delete(pts, np.where((pts==y_pt).all(axis=1))[0], axis=0)
        m[i][:,4] = abs( (m[i][:,3]-y_pt[1]) / (m[i][:,2]-y_pt[0]) )
        m[i][:,:2] = y_pt
    m = np.vstack((m[0], m[1]))
    return m[np.argmin(m[:,4]), :4]

# Ignore division by zero (slopes of vertical lines)

np.seterr(divide='ignore')

# Create dataset and plot

pts = create_random_example()
plt.scatter(pts[:,0], pts[:,1], c='r') # plot now because pts array changes

# Average distance to the nearest neighbor of each point

tree = KDTree(pts)
nn_avg_dist = np.mean(tree.query(pts, 2)[0][:, 1])

# Find groups of points representing each gridline

groups = []
for orientation in ['o', 'r']: #  original and rotated orientations

    # Rotate points 90 degrees (note: this moves pts to 2nd quadrant)
    if orientation == 'r':
        pts[:,1] = -1 * pts[:,1]
        pts[:, [1, 0]] = pts[:, [0, 1]]

    # Find reference line to compare remaining points for grouping
    ref_line = find_reference_line(pts) # line is defined by two points

    # Distances between points and reference line
    pt_dists = np.zeros((pts.shape[0], 3))
    pt_dists[:,:2] = pts
    pt_dists[:,2] = np.apply_along_axis(pt_line_dist, 1, pts, ref_line).T

    # Segment pts into groups w.r.t. distances (one group per gridline)
    # Groups have range less than nn_avg_dist.
    groups += segment_pts(pt_dists, 0.7*nn_avg_dist, orientation)

# Create dataframe of groups (OLS model requires a dataframe)

df = pd.DataFrame(np.row_stack(groups), columns=['x', 'y', 'grp'])
df['x'] = pd.to_numeric(df['x'])
df['y'] = pd.to_numeric(df['y'])

# Parallel slopes OLS model

ols_model = ols("y ~ x + grp + 0", data=df).fit()

# OLS parameters

grid_lines = ols_model.params[:-1].to_frame() # panda series to dataframe
grid_lines = grid_lines.rename(columns = {0:'b'})
grid_lines['grp'] = grid_lines.index.str[4:6]
grid_lines['m'] = ols_model.params[-1] # slope

# Rotate the rotated lines back to their original orientation

grid_lines.loc[grid_lines['grp'].str[0] == 'r', 'b'] = grid_lines['b'] / grid_lines['m']
grid_lines.loc[grid_lines['grp'].str[0] == 'r', 'm'] = -1 / grid_lines['m']

# Find grid intersection points by combinations of gridlines

comb = list(itertools.combinations(grid_lines['grp'], 2))
comb = [i for i in comb if i[0][0] != 'r']
comb = [i for i in comb if i[1][0] != 'o']
df_comb = pd.DataFrame(comb, columns=['grp', 'r_grp'])

# Merge gridline parameters with grid points

grid_pts = df_comb.merge(grid_lines.drop_duplicates('grp'),how='left',on='grp')
grid_lines.rename(columns={'grp': 'r_grp'}, inplace=True)
grid_pts.rename(columns={'b':'o_b', 'm': 'o_m', 'grp':'o_grp'}, inplace=True)
grid_pts = grid_pts.merge(grid_lines.drop_duplicates('r_grp'),how='left',on='r_grp')
grid_pts.rename(columns={'b':'r_b', 'm': 'r_m'}, inplace=True)

# Calculate x, y coordinates of gridline interception points

grid_pts['x'] = (grid_pts['r_b']-grid_pts['o_b']) \
              / (grid_pts['o_m']-grid_pts['r_m'])
grid_pts['y'] = grid_pts['o_m'] * grid_pts['x'] + grid_pts['o_b']

# Results output

print(grid_lines)
print(grid_pts)

plt.scatter(grid_pts['x'], grid_pts['y'], s=8, c='b') # for setting axes

axes = plt.gca()
axes.invert_yaxis()
axes.xaxis.tick_top()
axes.set_aspect('equal')
axes.set_xlim(axes.get_xlim())
axes.set_ylim(axes.get_ylim())

x_vals = np.array(axes.get_xlim())
for idx in grid_lines.index:
    y_vals = grid_lines['b'][idx] + grid_lines['m'][idx] * x_vals
    plt.plot(x_vals, y_vals, c='gray')

plt.show()

相关问题