Apache Ignite won't start as a 6-node cluster: Failed to resolve nodes topology

Posted by ffvjumwh on 2021-07-06 in Java

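We have two systems: a 2-node QA system and a 6-node prod system.
The QA system starts up fine. We had a working system, so we promoted it to production.
The prod system starts up and, after about 16 seconds, throws these errors, and none of the Ignite caches work.
2 of the nodes start; the other 4 fail to start.
On one of the nodes that did not start:
The Ignite startup banner is logged at: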

2020-11-24 18:30:52 INFO  [] stdout:71 - [18:30:52]    __________  ________________ 
2020-11-24 18:30:52 INFO  [] stdout:71 - [18:30:52]   /  _/ ___/ |/ /  _/_  __/ __/ 
2020-11-24 18:30:52 INFO  [] stdout:71 - [18:30:52]  _/ // (7 7    // /  / / / _/   
2020-11-24 18:30:52 INFO  [] stdout:71 - [18:30:52] /___/\___/_/|_/___/ /_/ /___/

At 2020-11-24 18:42:09 we get the following errors (data scrubbed):

2020-11-24 18:42:09 INFO  [] GridTcpRestProtocol:285 - Command protocol successfully stopped: TCP binary
2020-11-24 18:42:09 INFO  [] GridDhtPartitionsExchangeFuture:285 - Finish exchange future [startVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], resVer=null, err=class org.apache.ignite.internal.NodeStoppingException: Node is stopping: null, rebalanced=false, wasRebalanced=false]
2020-11-24 18:42:09 INFO  [] GridDhtPartitionsExchangeFuture:285 - Completed partition exchange [localNode=4a0b2901-adc1-4416-8345-82caa6a18cea, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode [id=7a62d367-a907-43c2-90b4-53d15ec30a91, consistentId=10.10.232.6,127.0.0.1,152.16.11.67:47500, addrs=ArrayList [10.10.232.6, 127.0.0.1, 152.16.11.67], sockAddrs=HashSet [/127.0.0.1:47500, mzitsme1-nick-p1.myarea.example.com/10.10.232.6:47500, itsme1-nick-p1.myarea.example.com/152.16.11.67:47500], discPort=47500, order=5, intOrder=5, lastExchangeTime=1606264503158, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], done=true, newCrdFut=null], topVer=null]
2020-11-24 18:42:09 WARNING [] GridDhtAtomicCache:295 - <MY_CACHE> Failed to update key on backup (local node is stopping): KeyCacheObjectImpl [part=377, val=com.example.MyCache, hasValBytes=true]
2020-11-24 18:42:09 WARNING [] GridDhtAtomicCache:295 - <MY_CACHE> Failed to update key on backup (local node is stopping): KeyCacheObjectImpl [part=377, val=com.example.MyCache, hasValBytes=true]
2020-11-24 18:42:09 WARNING [] GridDhtAtomicCache:295 - <MY_CACHE> Failed to update key on backup (local node is stopping): KeyCacheObjectImpl [part=377, val=com.example.MyCache, hasValBytes=true]
2020-11-24 18:42:09 SEVERE [] GridDhtAtomicCache:310 - <MYCACHE2> Unexpected exception during cache update: class org.apache.ignite.IgniteException: Failed to resolve nodes topology [cacheGrp=CACHE_MY_CACHE, topVer=AffinityTopologyVersion [topVer=9, minorTopVer=0], history=[AffinityTopologyVersion [topVer=4, minorTopVer=0], AffinityTopologyVersion [topVer=5, minorTopVer=0], AffinityTopologyVersion [topVer=6, minorTopVer=0], AffinityTopologyVersion [topVer=7, minorTopVer=0], AffinityTopologyVersion [topVer=8, minorTopVer=0]], snap=Snapshot [topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0]], locNode=TcpDiscoveryNode [id=4a0b2901-adc1-4416-8345-82caa6a18cea, consistentId=10.10.232.14,127.0.0.1,152.16.11.75:47500, addrs=ArrayList [10.10.232.14, 127.0.0.1, 152.16.11.75], sockAddrs=HashSet [/127.0.0.1:47500, mzitsme4-nick.myarea.example.com/10.10.232.14:47500, itsme4-nick.myarea.example.com/152.16.11.75:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1606264871091, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=false]]
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.resolveDiscoCache(GridDiscoveryManager.java:1999)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.cacheGroupAffinityNodes(GridDiscoveryManager.java:1881)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.needRemap(GridDhtCacheAdapter.java:1297)
    at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1850)
    at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1719)
    at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3306)
    at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:141)
    at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
    at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1$2$1.run(GridCacheIoManager.java:288)
    at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:565)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:748)

Below is my cache configuration code. The default value shown for each property is the value we are currently using.

    @PostConstruct
    public void init() {
        if (!CACHING_ENABLED) {
            LOGGER.warn("Caching is currently disabled because {} is not set to Y in the properties files!!!", Constants.PROPERTY_CACHING_ENABLED);
            return;
        }
        try {
            System.setProperty("IGNITE_UPDATE_NOTIFIER", "false");

            igniteConfiguration = new IgniteConfiguration();

            int failureDetectionTimeout = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_FAILURE_DETECTION_TIMEOUT", "60000"));

            igniteConfiguration.setFailureDetectionTimeout(failureDetectionTimeout);
            String igniteCacheStorageDirectory = getProperty("IGNITE_CACHE_STORAGE_DIRECTORY");
            if (StringUtils.isNotBlank(igniteCacheStorageDirectory)) {
                DataStorageConfiguration dsCfg = new DataStorageConfiguration();
                DataRegionConfiguration dfltDataRegConf = new DataRegionConfiguration();
                dfltDataRegConf.setPersistenceEnabled(true);
                dsCfg.setDefaultDataRegionConfiguration(dfltDataRegConf);
                dsCfg.setStoragePath(igniteCacheStorageDirectory);
                igniteConfiguration.setDataStorageConfiguration(dsCfg); 
            }

            String igniteVmIps = getProperty("IGNITE_VM_IPS");
            List<String> addresses = Arrays.asList("127.0.0.1:47500");
            if (StringUtils.isNotBlank(igniteVmIps)) {
                addresses = Arrays.asList(igniteVmIps.split(","));
            }

            int networkTimeout = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_NETWORK_TIMEOUT", "60000"));
            boolean failureDetectionTimeoutEnabled = Boolean.parseBoolean(getProperty("IGNITE_TCP_DISCOVERY_FAILURE_DETECTION_TIMEOUT_ENABLED", "true"));

            int tcpDiscoveryLocalPort = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_LOCAL_PORT", "47500"));
            int tcpDiscoveryLocalPortRange = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_LOCAL_PORT_RANGE", "0"));

            TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
            tcpDiscoverySpi.setLocalPort(tcpDiscoveryLocalPort);
            tcpDiscoverySpi.setLocalPortRange(tcpDiscoveryLocalPortRange);
            tcpDiscoverySpi.setNetworkTimeout(networkTimeout);
            tcpDiscoverySpi.failureDetectionTimeoutEnabled(failureDetectionTimeoutEnabled);
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
            ipFinder.setAddresses(addresses);
            tcpDiscoverySpi.setIpFinder(ipFinder);

            int messageQueueLimit = Integer.parseInt(getProperty("IGNITE_TCP_DISCOVERY_MESSAGE_QUEUE_LIMIT", "1000"));

            TcpCommunicationSpi tcpCommunicationSpi = new TcpCommunicationSpi();
            tcpCommunicationSpi.setMessageQueueLimit(messageQueueLimit);

            igniteConfiguration.setDiscoverySpi(tcpDiscoverySpi);
            igniteConfiguration.setCommunicationSpi(tcpCommunicationSpi);
            isInit = true;
        } catch (Exception e) {
            LOGGER.error("Could not initialize cache! Cache services will be unavailable!", e);
            isInit = false;
        }
    }

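Unfortunately, I can't share the full logs. Does anyone have suggestions or tips for making this error go away?
I have seen mentions of setting the ack timeout to a higher value. Other than that, the forums don't offer many hints on what to do here.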

Answer #1 (edqdpe6u)

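OK, I think we solved this. Notice in the logs above how TCP discovery finds multiple NICs. That is because my JBoss server has two network interfaces: one for my LAN, 10.10.232.6, and one for the DMZ, 152.16.11.67. The nodes in my cluster, however, can only talk to each other over their LAN IPs.
My fix was to call igniteConfiguration.setLocalHost(InetAddress.getLocalHost().getHostAddress()); so that, instead of binding to 0.0.0.0, the node binds to the LAN IP 10.10.232.6. This stops Ignite discovery from trying to use the DMZ NIC.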
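One caveat worth sketching: InetAddress.getLocalHost() returns whatever the machine's hostname resolves to, so on a multi-homed host it is not guaranteed to be the LAN address. A minimal sketch of a more explicit alternative is below, assuming the cluster LAN can be identified by an address prefix. The class LanAddressPicker, its method names, and the prefix "10.10.232." (taken from the logs above) are hypothetical and not part of Ignite; only the final commented-out setLocalHost call uses the Ignite API.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not part of Ignite): finds the local address on the cluster LAN. */
final class LanAddressPicker {

    /** Lists the non-loopback IPv4 addresses of every interface that is currently up. */
    static List<String> localIpv4Addresses() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // some JDKs return null when no interface exists
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                        result.add(addr.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /**
     * Returns the first candidate that starts with the given prefix.
     * Keep the trailing dot in the prefix ("10.10.232.") so that
     * "10.10.232" does not also match "10.10.2321.x"-style strings.
     */
    static Optional<String> pickByPrefix(List<String> candidates, String prefix) {
        return candidates.stream().filter(a -> a.startsWith(prefix)).findFirst();
    }

    // Assumed usage inside the init() method from the question:
    //
    //   String lanIp = LanAddressPicker
    //       .pickByPrefix(LanAddressPicker.localIpv4Addresses(), "10.10.232.")
    //       .orElseThrow(() -> new IllegalStateException("No LAN address found"));
    //   igniteConfiguration.setLocalHost(lanIp);
}
```

Failing fast when no matching address exists is a design choice here: it seems safer than silently falling back to binding on all interfaces, which is the situation the answer was trying to avoid.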
