-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-15239: Fix system tests using producer performance service #14092
Conversation
Signed-off-by: Federico Valeri <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fvaleri This moves the compatibility boundary between tools and the other jars, to between clients and the rest of the JARs. It looks like tools
depends on server-common
depends on clients
, so if we loaded a 1.0.0 server-common
class that depended on a class which was missing from DEV clients
, I would expect that to fail.
If possible, I think we should eliminate this long-range version mixing and compatibility boundaries, rather than move them around, because they're already seem to be in the least-impactful place and are still causing problems.
These changes are just for the ProducerPerformanceService, and we only need clients on dev release because ThroughputThrottler moved there. That said, I don't like this approach and I see the risk you mention, so I'm opened to proposals.
Can you explain better what you mean here? |
By "version mixing" I mean that artifacts from two different versions from different versions are present on the classpath at the same time. The "long-range" just means that the distance between the versions is very large, in this case between all previous versions (including 0.8.x) and trunk. When we mix artifacts of different versions, there's a "compatibility boundary" that separates them. Imports across this compatibility boundary are not guaranteed, because the code was not compiled together. There are places where we expect a compatibility boundary, and we commit to those in the public documentation. But when we put a compatibility boundary in an atypical place (between the tools jar and clients, or clients and the rest of AK) then otherwise reasonable changes to internal classes can break the build. I said that I thought the boundary was already in the least impactful place, because there is so little code in the tools jar, and so very little opportunity for imports across the compatibility boundary to break the system tests. The clients artifact is much larger and has much more active development, and I think it is at more risk of breaking these tests. So then, I think the best solutions to this problem are going to eliminate the cross-version artifact mixing, or if that isn't possible, reduce the range of versions that are mixed to allow the dev branch to move forward. In #13313 I explored eliminating the artifact mixing, but it turned out that the 0.8.x tools was missing functionality, and so I had to inject the 0.9.x tools jar. I think we can do something similar here, try to remove the artifact mixing, and if that fails, find an intermediate version that can be injected instead of trunk. @fvaleri Let me know if that is more clear, I'm eager to help get the fix merged, so thank you for working on it! |
Hi @gharris1727, thanks for the clarification and sorry for the delay. I've been busy with other stuff.
I see, but that's not the case here, we either LATEST_0_9 or DEV_BRANCH. We do not load different module versions at the same time on the classpath, so there is no compatibility boundary to consider. Am I missing something? |
These commands are being run in the context of a kafka installation with a particular version. The version used is the So for example, if the
When the same jar is present multiple times, the classpath is ordered such that the earlier injected After all, the ProducerPerformance class depends on the Producer from clients, and this code didn't inject the clients before. And if the original condition didn't fire, then the |
This shows my lack of knowledge on Kafka STs. I see it now. Thanks.
Yeah, I'll try to do that and give an update. |
Signed-off-by: Federico Valeri <[email protected]>
@gharris1727 I tried various things, as I'm also learning how this works. The best solution I found is to move the ThroughputThrottler class from clients to server-common, adding this extra implementation dependency to connect:runtime (last commit on this branch). Why server-common? AFAIU, server-common hosts all classes shared by different modules, tools and connect:runtime in this case. Unlike tools, I think it makes sense to have this dependency for connect:runtime, as it may be used for other classes in the future. That way, we won't move the compatibility boundary. System tests will continue to use ProducerPerformance from dev branch as before, without bringing in dev clients, but using the version under test. Let me know if you see any issue with this approach. Thanks. |
@fvaleri That does appear to work, because server-common is not being excluded from tools-dependant-libs like the clients is. I don't think this was intentional, I think it was accidentally left out when server-common was added as a dependency in #12469 . There is a compatibility boundary between server-common and clients in this case, but that was already there before your change. If we choose not to change this compatibility boundary from between server-common + clients to between tools + server-common, then I think the fix you propose is possible. I agree with your reasoning that server-common is a reasonable dependency for the connect-runtime, even if it's only present use is the ThroughputThrottler. I don't really mind whether it's in clients or server-common, as long as it's visible in the dependent modules. If someone later moves the compatibility boundary, and reduces the range of the version mixing (by changing the if condition to Can you revert the changes in producer_performance.py? they don't seem to have an effect anymore. |
Signed-off-by: Federico Valeri <[email protected]>
Exactly.
I think we can improve further as you say, but probably it's better to do it in a separate PR, starting from working code.
Done. Thanks again for the help, appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one comment.
I'll start running system tests for this change.
Signed-off-by: Federico Valeri <[email protected]>
System test failures appear to be due to reasons other than classloading/dependency issues. The producer performance tests appear to work normally.
|
Thank you @fvaleri for diagnosing this test regression and promptly fixing it! |
…che#14092) Reviewers: Greg Harris <[email protected]>
…che#14092) Reviewers: Greg Harris <[email protected]>
…che#14092) Reviewers: Greg Harris <[email protected]>
QuotaTest uses ProducerPerformanceService which is where the issue is located. This is the test instance that is failing.
If we look at ProducerPerformanceService stderr logs in ./results, we have the reported exception.
There are other system tests that use ProducerPerformanceService, and are failing for the same reason.