我正在简化MYSQL数据库(v8.0)中的邮政编码多边形,我正在减少每个多边形的坐标数量。
因此,我有一个名为zip_city
的表,其中包含一个名为boundary
的列,这是原始的多多边形列,我创建了另一个简化多边形boundary_simplified
。它们都有SRID 4326(我已经包含了is_point
列,因为它可能很重要):
+---------------------+--------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------------------------+------+-----+---------+----------------+
| boundary | multipolygon | NO | MUL | NULL | |
| is_point | tinyint unsigned | NO | MUL | 0 | |
| boundary_simplified | multipolygon | NO | MUL | NULL | |
+---------------------+--------------------------------+------+-----+---------+----------------+
运行一个显示索引,我有这个:
mysql> SHOW INDEXES FROM zip_city;
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| zip_city | 1 | idx_is_point | 1 | is_point | A | 2 | NULL | NULL | | BTREE | | | YES | NULL |
| zip_city | 1 | boundary | 1 | boundary | A | 34287 | 32 | NULL | | SPATIAL | | | YES | NULL |
| zip_city | 1 | boundary_simplified | 1 | boundary_simplified | A | 34287 | 32 | NULL | | SPATIAL | | | YES | NULL |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
这看起来完全相同,但是当我尝试使用st_contains
运行查询时,它对它们的工作方式并不相同,例如:
mysql> SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND is_point = 0 LIMIT 1;
+-------+
| zip |
+-------+
| 99901 |
+-------+
1 row in set (0.03 sec)
mysql> SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND
is_point = 0 LIMIT 1;
+-------+
| zip |
+-------+
| 99901 |
+-------+
1 row in set (4.84 sec)
当我解释这两个查询时,我看到使用boundary_simplified的查询没有使用索引:
mysql> EXPLAIN SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND
is_point = 0 LIMIT 1;
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
| 1 | SIMPLE | zip_city | NULL | range | idx_is_point,boundary | boundary | 34 | NULL | 1 | 50.00 | Using where |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
| 1 | SIMPLE | zip_city | NULL | ref | idx_is_point | idx_is_point | 1 | const | 17143 | 100.00 | Using where |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
有什么线索吗?我觉得我错过了一些简单的东西,但我找不到有关这方面的信息。另外,在创建索引时,boundary
列需要大约23.25秒,而boundary_simplified
只需要大约0.75秒(这很奇怪。坐标是否影响索引的效率?)
我试过删除两个索引并分别创建它们,我测试了索引的行为,当然,我试过在查询中使用FORCE INDEX或USE INDEX,这导致了相同/更糟糕的行为。
编辑:我修正了显示的索引,感谢user 1191247的观察。另外,我没有显示完整的表信息,因为它是无用的。
1条答案
按热度按时间z9smfwbn1#
感谢user 1191247的评论,我找了他问的信息,我找到了这个:
其中,正如您所看到的,
boundary_simplified
缺少SRID定义,这对于索引正常工作至关重要(对于SELECT DISTINCT ST_SRID(boundary_simplified) FROM zip_city;
,我获得了SRID 4326,所以我不认为这是问题所在,但它在列定义中缺失)。我通过运行这些查询解决了这个问题:(took约53秒)
(now它花了~24秒,这已经是好消息)
然后索引工作得很好:)