Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-10847: Remove internal config for enabling the fix #10941

Merged
merged 5 commits into from
Jul 15, 2021

Conversation

guozhangwang
Copy link
Contributor

Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@guozhangwang
Copy link
Contributor Author

ping @mjsax @spena .

Copy link
Contributor

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guozhangwang , find a typo in upgrade doc. Thanks.

Records coming in after the grace period has elapsed will be dropped from those windows.
With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're referring TimeWindows#ofSizeAndGrace, so that user can set the grace value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also TimeDiffernce, not Size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeDiffernce is only in JoinWindows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the old API, yes, but KIP-633 change it from size to timeDifference (https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24+hour+default+of+grace+period+in+Streams)

@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
<p>
We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
</p>
<p>
We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
This period determines how long after a window ends any late arrived records will still be processed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

late -> out-of-order

Record are "late" by definition if they are dropped because they arrive after the grace period passed.

This period determines how long after a window ends any late arrived records will still be processed.
Records coming in after the grace period has elapsed will be dropped from those windows.
With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lso ? Should it be also?

@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
<p>
We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
</p>
<p>
We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we include this in this PR? Seems it should be part of KIP-663 doc updates? \cc @izzyacademy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are piggy-backing the fix on KIP-663 now, I want to incorporate the change along with this PR.

e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;
the other setters such as <code>TimeWindows#grace</code> are deprecated and will be removed in the future.
Also when the new construct API are used for left/outer stream-stream joins, Kafka Streams would fix emitting spurious join results which may have an impact on the throughput.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would fix emitting spurious join results -- not sure if users would understand this.

Maybe better:

In older versions, Kafka Streams emitted stream-stream left/outer join results eagerly.
This behavior may lead to spurious left/outer join result records.
In this release, we changed the behavior to avoid spurious results and
left/outer join result are only emitted after the join window is closed,
i.e., after the grace period elapsed. To maintain backward compatibility,
the old API <code>JoinWindows.of(size)</code> preserves the old eager-emit behavior
and only the new API `<code>JoinWindows.ofTimeDifferenceAndGrace()</code>
(and <code>JoinsWindows#ofTimeDifferenceNoGrace</code>) enable the new behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack!

Copy link
Contributor

@izzyacademy izzyacademy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guozhangwang could you rebase with trunk and resubmit with the requested changes? It is currently in conflict and can't be merged as is.

Copy link
Contributor Author

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed comments on docs, and rebased on trunk cc @mjsax @showuon @izzyacademy

@@ -117,6 +117,21 @@ <h3><a id="streams_api_changes_300" href="#streams_api_changes_300">Streams API
<p>
We removed the default implementation of <code>RocksDBConfigSetter#close()</code>.
</p>
<p>
We dropped the default 24 hours grace period for windowed operations such as Window or Session aggregates, or stream-stream joins.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are piggy-backing the fix on KIP-663 now, I want to incorporate the change along with this PR.

Records coming in after the grace period has elapsed will be dropped from those windows.
With a long grace period, though Kafka Streams can handle out-of-order data up to that amount of time, it will also incur a high and confusing latency for users,
e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeDiffernce is only in JoinWindows.

e.g. suppression operators with the default won't emit results up for 24 hours, while lso in practice out-of-order data usually has a much smaller time-skewness.
Instead of abstracting this config from users with a long default value, we introduced new constructs such as <code>TimeWindows#ofSizeWithNoGrace</code> to let callers always set it upon constructing the windows;
the other setters such as <code>TimeWindows#grace</code> are deprecated and will be removed in the future.
Also when the new construct API are used for left/outer stream-stream joins, Kafka Streams would fix emitting spurious join results which may have an impact on the throughput.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack!

Copy link
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for updating the PR.

@guozhangwang guozhangwang merged commit 3e32647 into apache:trunk Jul 15, 2021
@guozhangwang guozhangwang deleted the K10847-remove-internal-config branch July 15, 2021 17:58
guozhangwang added a commit that referenced this pull request Jul 15, 2021
Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact.

Reviewers: Luke Chen <[email protected]>, Matthias J. Sax <[email protected]>
@guozhangwang
Copy link
Contributor Author

Merged to trunk; cherry-picked to 3.0.

xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact.

Reviewers: Luke Chen <[email protected]>, Matthias J. Sax <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants