A critical fix is available in Hazelcast 3.11.3 to address a resource leak that occurs in Hazelcast 3.11.1 and 3.11.2 during split-brain recovery. This resource leak is highly likely to exhaust the available heap space of the Java Virtual Machine (JVM), resulting in an OutOfMemoryError (OOME). We recommend upgrading from 3.11.1 or 3.11.2 to 3.11.3 as soon as possible to take advantage of this fix. The details below should help you assess your risk of experiencing this issue.
- All users of Hazelcast 3.11.1 and 3.11.2 are at risk of encountering this issue.
- This issue is triggered when a split-brain and subsequent recovery occur.
- Users who open and close client connections frequently are more susceptible to this resource leak resulting in an OOME.
- Users of the REST Client are especially susceptible, as each request creates and closes a new client connection.
Root Cause Analysis
As part of the split-brain recovery process, we reset various Hazelcast subsystems, including the Network I/O Manager. During the reset of the Network I/O Manager, we shut down an ExecutorService that runs the clean-up tasks for the listeners attached to Hazelcast Client connections to the member. This ExecutorService was not restarted when the Network I/O Manager was restarted, so the clean-up tasks that remove these listeners were never run. Each client connection opened and closed after the recovery therefore left its listeners behind on the heap.
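The failure pattern described above can be sketched in a few lines of plain Java. This is an illustration only, not Hazelcast's actual code: the `Manager` class, its method names, and the `recreateExecutorOnReset` flag are hypothetical, and the sketch assumes that a clean-up task submitted to a shut-down executor is rejected and silently dropped.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class ResetLeakSketch {

    /** Hypothetical stand-in for a network manager; not a Hazelcast class. */
    static class Manager {
        private final boolean recreateExecutorOnReset;
        // One entry per client connection whose listener is still registered.
        private final Set<String> listeners = ConcurrentHashMap.newKeySet();
        private volatile ExecutorService cleanupExecutor = Executors.newSingleThreadExecutor();

        Manager(boolean recreateExecutorOnReset) {
            this.recreateExecutorOnReset = recreateExecutorOnReset;
        }

        void clientConnected(String id) {
            listeners.add(id);
        }

        void clientDisconnected(String id) {
            try {
                // Listener removal is deferred to the clean-up executor.
                cleanupExecutor.execute(() -> listeners.remove(id));
            } catch (RejectedExecutionException e) {
                // Assumption: the rejection is swallowed, so the listener
                // stays registered and its memory is never released.
            }
        }

        /** Simulates the subsystem reset performed during split-brain recovery. */
        void reset() throws InterruptedException {
            cleanupExecutor.shutdown();
            cleanupExecutor.awaitTermination(5, TimeUnit.SECONDS);
            if (recreateExecutorOnReset) {
                // The fix in this sketch: start a fresh executor after the reset.
                cleanupExecutor = Executors.newSingleThreadExecutor();
            }
        }

        /** Waits for queued clean-up tasks, then reports the remaining listeners. */
        int registeredListeners() throws Exception {
            if (!cleanupExecutor.isShutdown()) {
                // A single-threaded executor runs tasks in submission order,
                // so this no-op completes only after all earlier clean-ups.
                cleanupExecutor.submit(() -> { }).get(5, TimeUnit.SECONDS);
            }
            return listeners.size();
        }

        void close() {
            cleanupExecutor.shutdownNow();
        }
    }

    /** Resets the manager once, then opens and closes the given number of clients. */
    static int simulate(boolean recreateExecutorOnReset, int clients) throws Exception {
        Manager manager = new Manager(recreateExecutorOnReset);
        try {
            manager.reset();
            for (int i = 0; i < clients; i++) {
                String id = "client-" + i;
                manager.clientConnected(id);
                manager.clientDisconnected(id);
            }
            return manager.registeredListeners();
        } finally {
            manager.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("executor not restarted: " + simulate(false, 1000) + " listeners left");
        System.out.println("executor restarted:     " + simulate(true, 1000) + " listeners left");
    }
}
```

In this sketch, every connection closed after the reset leaves one entry behind when the executor is not recreated, which mirrors why workloads that open and close client connections frequently are the most exposed.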
This issue was fixed in the Hazelcast 3.11 series in the 3.11.3 patch release on April 13th, 2019. We recommend that all users of 3.11.1 or 3.11.2 upgrade to the 3.11.3 patch release as soon as possible.