-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MINOR] Improve consumer logging on LeaveGroup #5420
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment, otherwise lgtm.
@@ -1031,6 +1031,7 @@ public void run() { | |||
} else if (heartbeat.pollTimeoutExpired(now)) { | |||
// the poll timeout has expired, which means that the foreground thread has stalled | |||
// in between calls to poll(), so we explicitly leave the group. | |||
log.info("Poll timeout expired"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a bit misleading: how about Heartbeat poll timeout has expired; it means the caller thread has been stalled too long, will explicitly leave the group to trigger a rebalance
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "consumer poll timeout" would be clearer than "heartbeat poll timeout"? The problem is the delay between calls to Consumer.poll
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we consider a detailed log message like we do for commit failed exception? Also, is this just an info
or should it be a warn
?
"Commit cannot be completed since the group has already " +
"rebalanced and assigned the partitions to another member. This means that the time " +
"between subsequent calls to poll() was longer than the configured max.poll.interval.ms, " +
"which typically implies that the poll loop is spending too much time message processing. " +
"You can address this either by increasing the session timeout or by reducing the maximum " +
"size of batches returned in poll() with max.poll.records."
4a72e21
to
a4b7cd1
Compare
@@ -33,7 +33,7 @@ public CommitFailedException() { | |||
"rebalanced and assigned the partitions to another member. This means that the time " + | |||
"between subsequent calls to poll() was longer than the configured max.poll.interval.ms, " + | |||
"which typically implies that the poll loop is spending too much time message processing. " + | |||
"You can address this either by increasing the session timeout or by reducing the maximum " + | |||
"You can address this either by increasing max.poll.interval.ms or by reducing the maximum " + | |||
"size of batches returned in poll() with max.poll.records."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this description is completely accurate as the commit can fail when the coordinator has kicked us out of the group as well (for example, on session timeout). Though I am not sure if we have a great way of distinguishing between these cases based on the local consumer state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree. max.poll.interval.ms should be tuned more instead of session timeout given the heartbeat thread mechanism.
a4b7cd1
to
6dc9c0c
Compare
Very useful KIP! It is a shame we couldn't get this in 2.0.x - it is in Kafka 2.1. |
No description provided.