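We have two systems: a 2-node QA system and a 6-node prod system.
The QA system starts up fine. It gave us a working setup, so we promoted it to production.
The prod system starts up and, after about 16 seconds, throws these errors, and none of the Ignite caches work.
2 of the nodes start; the other 4 fail to start.
On one of the nodes that did not start:
The Ignite startup banner is logged at: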
2020-11-24 18:30:52 INFO [] stdout:71 - [18:30:52] __________ ________________
2020-11-24 18:30:52 INFO [] stdout:71 - [18:30:52] / _/ ___/ |/ / _/_ __/ __/
2020-11-24 18:30:52 INFO [] stdout:71 - [18:30:52] _/ // (7 7 // / / / / _/
2020-11-24 18:30:52 INFO [] stdout:71 - [18:30:52] /___/\___/_/|_/___/ /_/ /___/
At 2020-11-24 18:42:09 we get the following errors (data scrubbed):
2020-11-24 18:42:09 INFO [] GridTcpRestProtocol:285 - Command protocol successfully stopped: TCP binary
2020-11-24 18:42:09 INFO [] GridDhtPartitionsExchangeFuture:285 - Finish exchange future [startVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], resVer=null, err=class org.apache.ignite.internal.NodeStoppingException: Node is stopping: null, rebalanced=false, wasRebalanced=false]
2020-11-24 18:42:09 INFO [] GridDhtPartitionsExchangeFuture:285 - Completed partition exchange [localNode=4a0b2901-adc1-4416-8345-82caa6a18cea, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode [id=7a62d367-a907-43c2-90b4-53d15ec30a91, consistentId=10.10.232.6,127.0.0.1,152.16.11.67:47500, addrs=ArrayList [10.10.232.6, 127.0.0.1, 152.16.11.67], sockAddrs=HashSet [/127.0.0.1:47500, mzitsme1-nick-p1.myarea.example.com/10.10.232.6:47500, itsme1-nick-p1.myarea.example.com/152.16.11.67:47500], discPort=47500, order=5, intOrder=5, lastExchangeTime=1606264503158, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true, newCrdFut=null], topVer=null]
2020-11-24 18:42:09 WARNING [] GridDhtAtomicCache:295 - <MY_CACHE> Failed to update key on backup (local node is stopping): KeyCacheObjectImpl [part=377, val=com.example.MyCache, hasValBytes=true]
2020-11-24 18:42:09 WARNING [] GridDhtAtomicCache:295 - <MY_CACHE> Failed to update key on backup (local node is stopping): KeyCacheObjectImpl [part=377, val=com.example.MyCache, hasValBytes=true]
2020-11-24 18:42:09 WARNING [] GridDhtAtomicCache:295 - <MY_CACHE> Failed to update key on backup (local node is stopping): KeyCacheObjectImpl [part=377, val=com.example.MyCache, hasValBytes=true]
2020-11-24 18:42:09 SEVERE [] GridDhtAtomicCache:310 - <MYCACHE2> Unexpected exception during cache update: class org.apache.ignite.IgniteException: Failed to resolve nodes topology [cacheGrp=CACHE_MY_CACHE, topVer=AffinityTopologyVersion [topVer=9, minorTopVer=0], history=[AffinityTopologyVersion [topVer=4, minorTopVer=0], AffinityTopologyVersion [topVer=5, minorTopVer=0], AffinityTopologyVersion [topVer=6, minorTopVer=0], AffinityTopologyVersion [topVer=7, minorTopVer=0], AffinityTopologyVersion [topVer=8, minorTopVer=0]], snap=Snapshot [topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0]], locNode=TcpDiscoveryNode [id=4a0b2901-adc1-4416-8345-82caa6a18cea, consistentId=10.10.232.14,127.0.0.1,152.16.11.75:47500, addrs=ArrayList [10.10.232.14, 127.0.0.1, 152.16.11.75], sockAddrs=HashSet [/127.0.0.1:47500, mzitsme4-nick.myarea.example.com/10.10.232.14:47500, itsme4-nick.myarea.example.com/152.16.11.75:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1606264871091, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=false]]
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.resolveDiscoCache(GridDiscoveryManager.java:1999)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.cacheGroupAffinityNodes(GridDiscoveryManager.java:1881)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.needRemap(GridDhtCacheAdapter.java:1297)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1850)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1719)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3306)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:141)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1$2$1.run(GridCacheIoManager.java:288)
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:565)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
Below is my cache configuration code. The default value given for each property is the value we are currently using.
@PostConstruct
public void init() {
    if (!CACHING_ENABLED) {
        LOGGER.warn("Caching is currently disabled because {} is not set to Y in the properties files!!!", Constants.PROPERTY_CACHING_ENABLED);
        return;
    }
    try {
        System.setProperty("IGNITE_UPDATE_NOTIFIER", "false");
        igniteConfiguration = new IgniteConfiguration();
        int failureDetectionTimeout = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_FAILURE_DETECTION_TIMEOUT", "60000"));
        igniteConfiguration.setFailureDetectionTimeout(failureDetectionTimeout);
        String igniteCacheStorageDirectory = getProperty("IGNITE_CACHE_STORAGE_DIRECTORY");
        if (StringUtils.isNotBlank(igniteCacheStorageDirectory)) {
            DataStorageConfiguration dsCfg = new DataStorageConfiguration();
            DataRegionConfiguration dfltDataRegConf = new DataRegionConfiguration();
            dfltDataRegConf.setPersistenceEnabled(true);
            dsCfg.setDefaultDataRegionConfiguration(dfltDataRegConf);
            dsCfg.setStoragePath(igniteCacheStorageDirectory);
            igniteConfiguration.setDataStorageConfiguration(dsCfg);
        }
        String igniteVmIps = getProperty("IGNITE_VM_IPS");
        List<String> addresses = Arrays.asList("127.0.0.1:47500");
        if (StringUtils.isNotBlank(igniteVmIps)) {
            addresses = Arrays.asList(igniteVmIps.split(","));
        }
        int networkTimeout = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_NETWORK_TIMEOUT", "60000"));
        boolean failureDetectionTimeoutEnabled = Boolean.parseBoolean(getProperty("IGNITE_TCP_DISCOVERY_FAILURE_DETECTION_TIMEOUT_ENABLED", "true"));
        int tcpDiscoveryLocalPort = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_LOCAL_PORT", "47500"));
        int tcpDiscoveryLocalPortRange = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_LOCAL_PORT_RANGE", "0"));
        TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
        tcpDiscoverySpi.setLocalPort(tcpDiscoveryLocalPort);
        tcpDiscoverySpi.setLocalPortRange(tcpDiscoveryLocalPortRange);
        tcpDiscoverySpi.setNetworkTimeout(networkTimeout);
        tcpDiscoverySpi.failureDetectionTimeoutEnabled(failureDetectionTimeoutEnabled);
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(addresses);
        tcpDiscoverySpi.setIpFinder(ipFinder);
        int messageQueueLimit = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_MESSAGE_QUEUE_LIMIT", "1000"));
        TcpCommunicationSpi tcpCommunicationSpi = new TcpCommunicationSpi();
        tcpCommunicationSpi.setMessageQueueLimit(messageQueueLimit);
        igniteConfiguration.setDiscoverySpi(tcpDiscoverySpi);
        igniteConfiguration.setCommunicationSpi(tcpCommunicationSpi);
        isInit = true;
    } catch (Exception e) {
        LOGGER.error("Could not initialize cache! Cache services will be unavailable!", e);
        isInit = false;
    }
}
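Unfortunately, I cannot share the full logs. Does anyone have suggestions or tips for making this error go away?
I have seen some mention of setting the ack timeout to a higher value. Beyond that, the forums have not offered many hints about what to do here.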
1 Answer
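OK, I think we solved this. Notice how the TCP discovery output above lists multiple NICs for each node. That is because my JBoss server has two network interfaces: one on my LAN (10.10.232.6) and another on the DMZ (152.16.11.67). However, the nodes in my cluster can only talk to each other over their LAN IPs. My fix was to call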
igniteConfiguration.setLocalHost(InetAddress.getLocalHost().getHostAddress());
Instead of binding to 0.0.0.0, the node now binds to the LAN IP 10.10.232.6. This stops Ignite discovery from trying to use the DMZ NIC.
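A caveat on the fix above: InetAddress.getLocalHost() resolves the machine's own host name, so which address it returns depends on the host's name resolution setup. On a multi-homed server it may be more predictable to pick the LAN address explicitly. The following is only a minimal sketch of that idea using the JDK alone. The class name LanAddressPicker, the helper pickByPrefix, and the default prefix "10.10.232." are assumptions made for illustration (the prefix is inferred from the addresses in the logs), and the Ignite call appears only as a comment.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Ignite: chooses which local address to bind to.
class LanAddressPicker {

    /** Returns the first candidate address that starts with the given prefix, if any. */
    static Optional<String> pickByPrefix(List<String> candidates, String prefix) {
        return candidates.stream().filter(a -> a.startsWith(prefix)).findFirst();
    }

    /** Collects the IPv4 addresses of all interfaces that are up and not loopback. */
    static List<String> localIpv4Addresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces reported on this machine
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    result.add(addr.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        // Assumed LAN prefix; the trailing dot keeps it from matching e.g. 10.10.2320.x.
        String lanPrefix = args.length > 0 ? args[0] : "10.10.232.";
        Optional<String> lan = pickByPrefix(localIpv4Addresses(), lanPrefix);
        System.out.println(lan.map(a -> "Would bind to " + a)
                              .orElse("No local address matches " + lanPrefix));
        // In the init() method from the question, the result could then be applied as:
        // lan.ifPresent(igniteConfiguration::setLocalHost);
    }
}
```

If no interface matches the prefix, the sketch leaves the configuration untouched, which falls back to Ignite's default binding; whether failing fast would be preferable depends on the deployment.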