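I am running OpenStack Queens, deployed with openstack-ansible.
Recently I noticed that several components, including nova and neutron, have started throwing the following error stack in their logs. Everything is working fine, but this error worries me. Does anyone know anything about it?
I have already checked the basics, such as the F5 load balancer, MySQL, and network connectivity, and they all look good.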
2018-08-13 10:36:23.552 17533 ERROR oslo_db.sqlalchemy.engines "MySQL server has gone away (%r)" % (e,))
2018-08-13 10:36:23.552 17533 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (error(104, 'Connection reset by peer'))") [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-13 10:36:34.997 17538 ERROR oslo_db.sqlalchemy.engines [req-c9a4ea4f-3577-42c7-aed3-e34416d93c1a 34205a21a4e4430b8be896c6a6b692cb 2b1447ec414b4751965f75785cab6468 - default default] Database connection was found disconnected; reconnecting: DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (error(104, 'Connection reset by peer'))") [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-13 10:36:34.997 17538 ERROR oslo_db.sqlalchemy.engines "MySQL server has gone away (%r)" % (e,))
2018-08-13 10:36:34.997 17538 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (error(104, 'Connection reset by peer'))") [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-13 10:38:23.231 17529 ERROR oslo_db.sqlalchemy.engines [req-ba38ec9d-ee4e-4974-933b-46c8133397c1 34205a21a4e4430b8be896c6a6b692cb 2b1447ec414b4751965f75785cab6468 - default default] Database connection was found disconnected; reconnecting: DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (error(104, 'Connection reset by peer'))") [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
You can see more logs here: http://paste.openstack.org/show/728277/
This is the configuration of my 3-node Galera cluster.
[client]
port = 3306
socket = "/var/lib/mysql/mysql.sock"
[mysqld_safe]
socket = "/var/lib/mysql/mysql.sock"
nice = 0
log_error = /var/log/mysql_logs/galera_server_error.log
[mysql]
default-character-set = utf8
[mysqld]
user = mysql
collation-server = utf8_general_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8
datadir = /var/lib/mysql
bind-address = ::
server-id = 200
log-queries-not-using-indexes = 0
slow-query-log = 0
slow-query-log-file = /var/log/mysql_logs/mysql-slow.log
log_error = /var/log/mysql_logs/galera_server_error.log
log-bin = /var/lib/mysql/mariadb-bin
log-bin-index = /var/lib/mysql/mariadb-bin.index
expire-logs-days = 7
log_slave_updates = 1
log_bin_trust_function_creators = 1
max-allowed-packet = 16M
max-connect-errors = 1000000
max_connections = 1600
wait_timeout = 3600
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0M
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 10240
innodb-flush-method = O_DIRECT
innodb-log-file-size = 1024M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = 4096M
innodb-read-io-threads = 4
innodb-write-io-threads = 4
innodb-doublewrite = 1
innodb-log-buffer-size = 128M
innodb-buffer-pool-instances = 8
innodb-log-files-in-group = 2
innodb-thread-concurrency = 64
innodb_stats_on_metadata = 0
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
!includedir /etc/mysql/conf.d/
1 Answer
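There are many reasons you can get the "MySQL server has gone away" message, too many to paste here. Please read through the following and see whether any of it applies to you: https://dev.mysql.com/doc/refman/5.6/en/gone-away.html
When I have run into this in the past, the cause was one of the following:
- A firewall between the application server and the database was closing TCP sessions
- The connection pool contained stale connections
- An administrator had killed the connection on the server, and the application did not report it until it tried to use that connection again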
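All three cases look the same from the client side: it holds a connection that something else has already closed, and it only finds out on the next query. Here is a minimal sketch of how a pool can catch that with a validation query before handing the connection out. It is only an illustration, not what oslo.db actually does: sqlite3 stands in for MySQL, closing the connection locally stands in for the firewall or admin killing the session, and ValidatingPool is a made-up name.

```python
import sqlite3


class ValidatingPool:
    """Toy one-connection pool that pings before every checkout (hypothetical)."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a new connection
        self._conn = connect()
        self.reconnects = 0

    def checkout(self):
        try:
            # Cheap validation query, the same idea as the SELECT 1 in the logs.
            self._conn.execute("SELECT 1").fetchone()
        except sqlite3.Error:
            # The connection is dead: discard it and open a fresh one.
            self.reconnects += 1
            self._conn = self._connect()
        return self._conn


pool = ValidatingPool(lambda: sqlite3.connect(":memory:"))
pool.checkout().close()  # stand-in for a firewall or admin killing the session
row = pool.checkout().execute("SELECT 42").fetchone()
print(row, pool.reconnects)
```

The second checkout notices the dead connection, reconnects once, and the real query then succeeds, so the caller never sees the failure. An error logged at the validation step is the pool recovering, not a query being lost.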
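If you use a connection pool, make sure the connection lifetime (or idle timeout, or whatever your pool calls it) is shorter than the configured wait_timeout (3600 seconds in your my.cnf). Since the failing query is SELECT 1, I assume your connection pool has connection validation enabled: it checks the connection with a trivial query and, if that fails, takes a fresh connection from the pool and retries. This looks like the normal handling of a lost connection.
Edit: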
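Since you found that the F5 has a very short timeout, that timeout can be increased to something more realistic for database connections. I have seen anywhere from 1 to 8 hours, depending on whether the connections come from an application server or from a desktop application.
To force your client to refresh its connections, with SQLAlchemy it looks like you want to add pool_recycle so that connections are recycled before the F5 timeout expires. Wherever the data source is defined in OpenStack, you would need to add more configuration options for SQLAlchemy: docs.sqlalchemy.org/en/latest/core/engines.html
However, I would simply raise the F5/haproxy timeout to 1 hour and see how often these errors still occur.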
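If you do go the pool_recycle route, the value has to stay below the shortest idle timeout anywhere on the path, not just below wait_timeout. A rough sketch of picking it follows. The 300-second load balancer timeout and the 60-second margin are assumptions for illustration, so substitute whatever your F5/haproxy is actually set to. recycle_seconds is a made-up helper; pool_recycle and pool_pre_ping are real SQLAlchemy create_engine() arguments.

```python
def recycle_seconds(idle_timeouts, margin=60):
    """Return a pool recycle age safely below the shortest idle timeout."""
    shortest = min(idle_timeouts)
    return max(1, shortest - margin)


MYSQL_WAIT_TIMEOUT = 3600  # wait_timeout from the posted my.cnf
LB_IDLE_TIMEOUT = 300      # ASSUMED F5/haproxy idle timeout; check your own value

engine_kwargs = {
    # Recycle connections before any hop on the path can drop them as idle.
    "pool_recycle": recycle_seconds([MYSQL_WAIT_TIMEOUT, LB_IDLE_TIMEOUT]),
    # Validate each connection on checkout.
    "pool_pre_ping": True,
}
print(engine_kwargs)
# With SQLAlchemy installed these would be passed as
# sqlalchemy.create_engine(db_url, **engine_kwargs); db_url is a placeholder.
```

In an OpenStack deployment you would normally not call create_engine() yourself. The equivalent setting should live in the [database] section of each service's config file. I believe the oslo.db option is connection_recycle_time (idle_timeout in older releases), but verify that against the docs for your release.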