When you make an API call using Hazelcast IMDG, an operation is started on one of the Hazelcast cluster members. While the operation is pending, it periodically sends heartbeats to the invocation owner (the caller). If the invocation owner does not receive any heartbeat from the pending invocation for the configured timeout duration ("hazelcast.operation.call.timeout.millis"), it throws an OperationTimeoutException.
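As a minimal sketch of where this timeout comes from, the snippet below sets the property as a JVM system property. It assumes the property is set before the Hazelcast instance is created and that Hazelcast reads it from the system properties; the class and helper names are hypothetical, and 60000 is simply the callTimeoutMillis value that appears in the sample output in this article.

```java
public class CallTimeoutProperty {
    // Property name quoted in the article.
    public static final String CALL_TIMEOUT_PROPERTY = "hazelcast.operation.call.timeout.millis";

    // Hypothetical helper: stores the timeout as a JVM system property.
    // Assumption: this runs before any Hazelcast instance is started.
    public static void setCallTimeoutMillis(long millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("timeout must be positive: " + millis);
        }
        System.setProperty(CALL_TIMEOUT_PROPERTY, Long.toString(millis));
    }

    public static void main(String[] args) {
        setCallTimeoutMillis(60_000L);
        System.out.println(CALL_TIMEOUT_PROPERTY + "="
                + System.getProperty(CALL_TIMEOUT_PROPERTY));
    }
}
```

The same value can usually be passed on the command line as a -D argument instead of being set in code.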
Here is a sample OperationTimeoutException output:
com.hazelcast.core.OperationTimeoutException: SizeOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2017-02-20 03:02:09.079. Total elapsed time: 120595 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2017-02-20 03:02:04.307. Invocation{op=com.hazelcast.collection.impl.queue.operations.SizeOperation{serviceName='hz:impl:queueService', identityHash=1210897035, partitionId=198, replicaIndex=0, callId=0, invocationTime=1487559608483 (2017-02-20 03:00:08.483), waitTimeout=-1, callTimeout=60000}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1487559608484, firstInvocationTime='2017-02-20 03:00:08.484', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[192.168.110.11]:5703, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=6, /192.168.110.10:5703->/192.168.110.11:59297, endpoint=[192.168.110.11]:5703, alive=true, type=MEMBER]}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:150)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:98)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrow(InvocationFuture.java:74)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:158)
at com.hazelcast.collection.impl.queue.QueueProxySupport.invokeAndGet(QueueProxySupport.java:177)
at com.hazelcast.collection.impl.queue.QueueProxySupport.invokeAndGet(QueueProxySupport.java:170)
at com.hazelcast.collection.impl.queue.QueueProxySupport.size(QueueProxySupport.java:104)
at com.hazelcast.collection.impl.queue.QueueProxyImpl.size(QueueProxyImpl.java:40)
We'll break it down piece by piece:
- "SizeOperation invocation failed to complete due to operation-heartbeat-timeout."
- This means that the invocation owner (caller) made a SizeOperation call, and the call failed because of the operation-heartbeat-timeout.
- "Current time: 2017-02-20 03:02:09.079."
- The time at which the exception was thrown.
- "Total elapsed time: 120595 ms."
- The total elapsed time since the operation was started.
- "Last operation heartbeat: never."
- The last time a heartbeat was received from the operation itself. In this example, the operation never sent a heartbeat.
- "Last operation heartbeat from member: 2017-02-20 03:02:04.307"
- The last time a heartbeat was received from the member that is running the operation.
- "Invocation{op=com.hazelcast.collection.impl.queue.operations.SizeOperation{serviceName='hz:impl:queueService', identityHash=1210897035, partitionId=198, replicaIndex=0, callId=0, invocationTime=1487559608483 (2017-02-20 03:00:08.483), waitTimeout=-1, callTimeout=60000}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1487559608484, firstInvocationTime='2017-02-20 03:00:08.484', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[192.168.110.11]:5703, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=6, /192.168.110.10:5703->/192.168.110.11:59297, endpoint=[192.168.110.11]:5703, alive=true, type=MEMBER]}"
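- The remaining invocation details, such as the operation and its partition (partitionId=198), the call timeout (callTimeoutMillis=60000), the target member, and the connection used.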
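The breakdown above can also be done mechanically when you are scanning logs. The following is a rough sketch that pulls the four header fields out of the message with a regular expression. It assumes the message keeps the "Label: value. " layout of the sample; the class name is hypothetical, and the Invocation{...} part is shortened for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeoutMessageFields {
    // Each header field in the sample has the form "Label: value. ".
    // The longer "from member" label is listed before its shorter prefix.
    private static final Pattern FIELD = Pattern.compile(
            "(Current time|Total elapsed time|Last operation heartbeat from member"
            + "|Last operation heartbeat): (.*?)\\. ");

    // Returns the fields in the order they appear in the message.
    public static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(message);
        while (matcher.find()) {
            fields.put(matcher.group(1), matcher.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String message = "SizeOperation invocation failed to complete due to "
                + "operation-heartbeat-timeout. Current time: 2017-02-20 03:02:09.079. "
                + "Total elapsed time: 120595 ms. Last operation heartbeat: never. "
                + "Last operation heartbeat from member: 2017-02-20 03:02:04.307. "
                + "Invocation{}"; // invocation details shortened for brevity
        parse(message).forEach((label, value) -> System.out.println(label + " = " + value));
    }
}
```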
In this example, the operation never sent a heartbeat, but the member sent one just before the exception was thrown. This means that the member is alive but the operation has not executed. It may still be waiting in the operation thread queue, or the thread may be stalled for some other reason, such as a GC pause or loading data from a persistent store.
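To make the reasoning concrete, here is a small sketch that redoes the arithmetic on the sample timestamps and applies the same rule of thumb. The class and method names are hypothetical, and treating a member heartbeat newer than callTimeoutMillis as "recent" is an assumption made for illustration, not a threshold taken from Hazelcast.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class HeartbeatDiagnosis {
    // Timestamp layout used in the sample exception message.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Milliseconds from the earlier timestamp to the later one.
    public static long millisBetween(String earlier, String later) {
        return Duration.between(LocalDateTime.parse(earlier, FORMAT),
                LocalDateTime.parse(later, FORMAT)).toMillis();
    }

    // Assumed rule of thumb: the operation never sent a heartbeat, but the
    // member sent one less than callTimeoutMillis before the exception.
    public static boolean memberAliveButOperationNotExecuted(
            String lastOperationHeartbeat, String lastMemberHeartbeat,
            String currentTime, long callTimeoutMillis) {
        if (!"never".equals(lastOperationHeartbeat) || "never".equals(lastMemberHeartbeat)) {
            return false;
        }
        return millisBetween(lastMemberHeartbeat, currentTime) < callTimeoutMillis;
    }

    public static void main(String[] args) {
        String current = "2017-02-20 03:02:09.079";
        // firstInvocationTime to current time: prints 120595, the reported elapsed time
        System.out.println(millisBetween("2017-02-20 03:00:08.484", current));
        // Age of the last member heartbeat: prints 4772
        System.out.println(millisBetween("2017-02-20 03:02:04.307", current));
        // Prints true for the sample values
        System.out.println(memberAliveButOperationNotExecuted(
                "never", "2017-02-20 03:02:04.307", current, 60_000L));
    }
}
```

In the sample, the member heartbeat arrived under five seconds before the exception, while the invocation had been pending for about two minutes, which is what points at the operation rather than the member.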