-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-10847: Remove internal config for enabling the fix #10941
KAFKA-10847: Remove internal config for enabling the fix #10941
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guozhangwang , find a typo in upgrade doc. Thanks.
docs/streams/upgrade-guide.html
Outdated
Records coming in after the grace period has elapsed will be dropped from those windows. | ||
With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users, | ||
e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness. | ||
Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're referring TimeWindows#ofSizeAndGrace
, so that user can set the grace value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also TimeDiffernce
, not Size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimeDiffernce
is only in JoinWindows
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the old API, yes, but KIP-633 change it from size
to timeDifference
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24+hour+default+of+grace+period+in+Streams)
docs/streams/upgrade-guide.html
Outdated
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API | |||
<p> | |||
We removed the default implementation of <code>RocksDBConfigSetter#close()</code>. | |||
</p> | |||
<p> | |||
We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins. | |||
This period determines how long after a window ends any late arrived records will still be processed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
late
-> out-of-order
Record are "late" by definition if they are dropped because they arrive after the grace period passed.
docs/streams/upgrade-guide.html
Outdated
This period determines how long after a window ends any late arrived records will still be processed. | ||
Records coming in after the grace period has elapsed will be dropped from those windows. | ||
With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users, | ||
e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lso
? Should it be also
?
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API | |||
<p> | |||
We removed the default implementation of <code>RocksDBConfigSetter#close()</code>. | |||
</p> | |||
<p> | |||
We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we include this in this PR? Seems it should be part of KIP-663 doc updates? \cc @izzyacademy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are piggy-backing the fix on KIP-663 now, I want to incorporate the change along with this PR.
docs/streams/upgrade-guide.html
Outdated
e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness. | ||
Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows; | ||
the other setters such as <code>TimeWindows#grace</code> are deprecated and will be removed in the future. | ||
Also when the new construct API are used for left/outer stream-stream joins, Kafka Streams would fix emitting spurious join results which may have an impact on the throughput. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would fix emitting spurious join results
-- not sure if users would understand this.
Maybe better:
In older versions, Kafka Streams emitted stream-stream left/outer join results eagerly.
This behavior may lead to spurious left/outer join result records.
In this release, we changed the behavior to avoid spurious results and
left/outer join result are only emitted after the join window is closed,
i.e., after the grace period elapsed. To maintain backward compatibility,
the old API <code>JoinWindows.of(size)</code> preserves the old eager-emit behavior
and only the new API `<code>JoinWindows.ofTimeDifferenceAndGrace()</code>
(and <code>JoinsWindows#ofTimeDifferenceNoGrace</code>) enable the new behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guozhangwang could you rebase with trunk and resubmit with the requested changes? It is currently in conflict and can't be merged as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments on docs, and rebased on trunk cc @mjsax @showuon @izzyacademy
@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API | |||
<p> | |||
We removed the default implementation of <code>RocksDBConfigSetter#close()</code>. | |||
</p> | |||
<p> | |||
We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are piggy-backing the fix on KIP-663 now, I want to incorporate the change along with this PR.
docs/streams/upgrade-guide.html
Outdated
Records coming in after the grace period has elapsed will be dropped from those windows. | ||
With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users, | ||
e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness. | ||
Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimeDiffernce
is only in JoinWindows
.
docs/streams/upgrade-guide.html
Outdated
e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness. | ||
Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows; | ||
the other setters such as <code>TimeWindows#grace</code> are deprecated and will be removed in the future. | ||
Also when the new construct API are used for left/outer stream-stream joins, Kafka Streams would fix emitting spurious join results which may have an impact on the throughput. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for updating the PR.
Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact. Reviewers: Luke Chen <[email protected]>, Matthias J. Sax <[email protected]>
Merged to trunk; cherry-picked to 3.0. |
Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact. Reviewers: Luke Chen <[email protected]>, Matthias J. Sax <[email protected]>
Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact.
Committer Checklist (excluded from commit message)