mysql 具有相同索引的两个类似SPATIAL列的工作方式不同

ui7jx7zq  于 2023-10-15  发布在  Mysql
关注(0)|答案(1)|浏览(96)

我正在简化MYSQL数据库(v8.0)中的邮政编码多边形,我正在减少每个多边形的坐标数量。
因此,我有一个名为zip_city的表,其中包含一个名为boundary的列,这是原始的多多边形列,我创建了另一个简化多边形boundary_simplified。它们都有SRID 4326(我已经包含了is_point列,因为它可能很重要):

+---------------------+--------------------------------+------+-----+---------+----------------+
| Field               | Type                           | Null | Key | Default | Extra          |
+---------------------+--------------------------------+------+-----+---------+----------------+

| boundary            | multipolygon                   | NO   | MUL | NULL    |                |
| is_point            | tinyint unsigned               | NO   | MUL | 0       |                |
| boundary_simplified | multipolygon                   | NO   | MUL | NULL    |                |
+---------------------+--------------------------------+------+-----+---------+----------------+

运行一个显示索引,我有这个:

mysql> SHOW INDEXES FROM zip_city;
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name                  | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| zip_city |          1 | idx_is_point              |            1 | is_point            | A         |           2 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| zip_city |          1 | boundary                  |            1 | boundary            | A         |       34287 |       32 |   NULL |      | SPATIAL    |         |               | YES     | NULL       |
| zip_city |          1 | boundary_simplified       |            1 | boundary_simplified | A         |       34287 |       32 |   NULL |      | SPATIAL    |         |               | YES     | NULL       |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

这看起来完全相同,但是当我尝试使用st_contains运行查询时,它对它们的工作方式并不相同,例如:

mysql> SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND          is_point = 0      LIMIT 1;
+-------+
| zip   |
+-------+
| 99901 |
+-------+
1 row in set (0.03 sec)
mysql> SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND
         is_point = 0      LIMIT 1;
+-------+
| zip   |
+-------+
| 99901 |
+-------+
1 row in set (4.84 sec)

当我解释这两个查询时,我看到使用boundary_simplified的查询没有使用索引:

mysql> EXPLAIN SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND
      is_point = 0      LIMIT 1;
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys         | key      | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | zip_city | NULL       | range | idx_is_point,boundary | boundary | 34      | NULL |    1 |    50.00 | Using where |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}')) 
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key          | key_len | ref   | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
|  1 | SIMPLE      | zip_city | NULL       | ref  | idx_is_point  | idx_is_point | 1       | const | 17143 |   100.00 | Using where |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

有什么线索吗?我觉得我错过了一些简单的东西,但我找不到有关这方面的信息。另外,在创建索引时,boundary列需要大约23.25秒,而boundary_simplified只需要大约0.75秒(这很奇怪。坐标是否影响索引的效率?)
我试过删除两个索引并分别创建它们,我测试了索引的行为,当然,我试过在查询中使用FORCE INDEX或USE INDEX,这导致了相同/更糟糕的行为。
编辑:我修正了显示的索引,感谢user 1191247的观察。另外,我没有显示完整的表信息,因为它是无用的。

z9smfwbn

z9smfwbn1#

感谢user 1191247的评论,我找了他问的信息,我找到了这个:

| zip_city | CREATE TABLE `zip_city` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `state_id` int unsigned NOT NULL,
  `zip` mediumint(5) unsigned zerofill NOT NULL,
  `city` varchar(64) NOT NULL,
  `slug` varchar(64) NOT NULL,
  `location` point NOT NULL /*!80003 SRID 4326 */,
  `boundary` multipolygon NOT NULL /*!80003 SRID 4326 */,
  `is_point` tinyint unsigned NOT NULL DEFAULT '0',
  `fit_market` tinyint unsigned NOT NULL DEFAULT '0',
  `boundary_simplified` multipolygon NOT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_zip_to_city_state1_idx` (`state_id`),
  KEY `idx_zip` (`zip`),
  KEY `idx_slug` (`slug`),
  KEY `idx_city` (`city`),
  SPATIAL KEY `idx_location` (`location`),
  SPATIAL KEY `boundary` (`boundary`),
  SPATIAL KEY `boundary_simplified` (`boundary_simplified`),
  CONSTRAINT `fk_zip_to_city_state1` FOREIGN KEY (`state_id`) REFERENCES `state` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=41381 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |

其中,正如您所看到的,boundary_simplified缺少SRID定义,这对于索引正常工作至关重要(对于SELECT DISTINCT ST_SRID(boundary_simplified) FROM zip_city;,我获得了SRID 4326,所以我不认为这是问题所在,但它在列定义中缺失)。我通过运行这些查询解决了这个问题:

DROP INDEX boundary_simplified ON zip_city;

ALTER TABLE zip_city MODIFY COLUMN boundary_simplified MULTIPOLYGON NOT NULL SRID 4326;

(took约53秒)

ALTER TABLE zip_city ADD SPATIAL INDEX idx_boundary_simplified (boundary_simplified);

(now它花了~24秒,这已经是好消息)
然后索引工作得很好:)

相关问题