-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shuffle sniffer nodes to Improving stability and performance #47354
base: main
Are you sure you want to change the base?
Conversation
Hi @saiergong, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile? |
is it ok now? |
@nik9000 could you help to review this request? |
Pinging @elastic/es-core-features (:Core/Features/Java Low Level REST Client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, lets get the imports fixed and then ill run the tests
import java.util.Map; | ||
import java.util.Objects; | ||
import java.util.Set; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will not pass tests, we keep our imports explicit. You may have to change your IDE to not do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fixed.
@elasticmachine update branch |
@elasticmachine ok to test |
I got to thinking about this and I am not sure I like changing the default here. Let me have a day to think about how we can fix this so that you (and others) can still have something that is shuffled, and we dont change the default. |
I wonder why does build fail? Is it not allowed to call shuffle function at this place? Or do we implement a shuffle function here? |
I think it would make more sense now to alter the Then you can add it to the |
in regard to the test failing, we are using a method we cannot use, lets instead use the API that accepts the List and a Random, and add a |
ok,More flexible in this way. I'll do it |
if we use nodes.size() as the Random seed,you should not be able to solve the problem of shuffle。 Because if the seeds are the same on all spark node, the generated random number list will be the same. maybe we should use the timestamp as the Random seed? |
This is a good point. The seed needs to be something that is reproducible from a test scenario, as well as random enough that you dont hit the problem you mention. I think it might make more sense if we relaxed the
and you can define whatever random seed (if any) you want for your nodes. Then you just need to use it like this
Then we end up with the freedom that you can do whatever you want w the sniffer, and it does not change implementation details for other clients. What do you think of this proposal instead? |
i could implement ShufflingNodesSniffer in my code, but maybe other people will encounter this problem some day. so i think maybe the community should provide the shuffle ability? if we can use the API that accepts the List and a Random, may be we could use the ip address as the Random seed? |
We also want the random seed reproducible between machines, because a test should be reproducible given a seed for the testing framework. So using the IP as a random value will not make for a reproducible test if you take it to a different machine. This is the first ask for a shuffled list, so i think the best thing we can do is let the user shuffle it if need be. We can add it in proper if we find that more people want the node shuffle feature. |
If the test framework requires the results of two machines to be replicated, then this conflicts with the function we want to achieve... |
This is not true exactly. The test should be reproducible, given a seed, which will control the source of randomness. The seed should be something that would be the same on 2 test machines, but can be influenced by something at runtime. For example, if there was a session ID that could be used, the tests could always supply the same session ID and therefore be replicated on diff machines. |
The server and everything that lives inside it can share Randomness with the tests but we don't have access to that in the clients. We use it to make tests properly reproducible all the time. But clients don't have it. |
At present, I copyed |
This is true, but only for this version. I think a PR to relax the |
Pinging @elastic/clients-team (Team:Clients) |
In the spark streaming, use the rest client to write es. When the number of nodes in the spark is large, the job restart will cause the es node to oom. Because when the spark job restarts, the order of the es nodes detected by all the spark nodes using ElasticsearchNodesSniffer is basically the same, and the initial value of the member variable lastNodeIndex of the RestClient is 0, so the initial request of each spark node is sent to the same es node. , causing the es node oom.
At the same time, if the number of requests sent to es on each spark node is basically the same, it is very likely that the requests on each spark node are basically sent to the es node in the same order, resulting high load on some nodes in es, affecting write performance.
This commits shuffle the nodes after probe the es cluster nodes, avoding such problem.