Shuffle sniffer nodes to improve stability and performance #47354

saiergong · 2019-10-01T11:36:25Z

We use the REST client in Spark Streaming to write to ES. When the Spark cluster has many nodes, restarting the job causes an ES node to run out of memory (OOM). This happens because, when the Spark job restarts, every Spark node discovers the ES nodes with ElasticsearchNodesSniffer and gets them in essentially the same order, and the RestClient member variable lastNodeIndex starts at 0. As a result, the first request from every Spark node goes to the same ES node, which then runs out of memory.

Likewise, if each Spark node sends roughly the same number of requests to ES, the Spark nodes are very likely to send their requests to the ES nodes in the same order, resulting in high load on some ES nodes and hurting write performance.

This commit shuffles the nodes after sniffing the ES cluster nodes, which avoids this problem.
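To illustrate the failure mode described above, here is a small JDK-only Java simulation. It is a simplified model, not RestClient's actual node selection (which also handles dead nodes and node selectors): each hypothetical client copies the same sniffed list and picks nodes.get(step % size) with a counter that starts at 0, standing in for lastNodeIndex. The class LockstepSimulation and its parameters are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Simplified model, not RestClient's real selection code: every client (e.g. one per
// Spark executor) receives the same sniffed node list and starts round-robin at index 0.
class LockstepSimulation {

    // For one request number (step), counts how many clients send that request to each node.
    static int[] countsAtStep(List<String> sniffed, int clients, int step, boolean shuffle, long baseSeed) {
        int[] counts = new int[sniffed.size()];
        for (int c = 0; c < clients; c++) {
            List<String> nodes = new ArrayList<>(sniffed);
            if (shuffle) {
                // A different seed per client gives each client its own node order.
                Collections.shuffle(nodes, new Random(baseSeed + c));
            }
            // Round-robin whose counter starts at 0 in every freshly started client.
            String target = nodes.get(step % nodes.size());
            counts[sniffed.indexOf(target)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sniffed = Arrays.asList("es-1", "es-2", "es-3", "es-4");
        // Without shuffling, all 100 clients send their first request to es-1.
        System.out.println(Arrays.toString(countsAtStep(sniffed, 100, 0, false, 42L))); // [100, 0, 0, 0]
        // With a per-client shuffle, the first requests are spread across the nodes.
        System.out.println(Arrays.toString(countsAtStep(sniffed, 100, 0, true, 42L)));
    }
}
```

Without shuffling, request number step of every client lands on the same node, so the clients stay in lockstep; with a per-client shuffle the same request is spread over the cluster.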

elasticcla · 2019-10-01T11:45:39Z

Hi @saiergong, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

saiergong · 2019-10-01T11:52:50Z

> Hi @saiergong, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

Is it OK now?

saiergong · 2019-10-10T03:10:51Z

@nik9000 could you help review this pull request?

elasticmachine · 2019-10-15T19:25:18Z

Pinging @elastic/es-core-features (:Core/Features/Java Low Level REST Client)

hub-cap

Good stuff, let's get the imports fixed and then I'll run the tests.

hub-cap · 2019-10-17T14:20:17Z

client/sniffer/src/main/java/org/elasticsearch/client/sniff/ElasticsearchNodesSniffer.java

```diff
-import java.util.Map;
-import java.util.Objects;
-import java.util.Set;
+import java.util.*;
```


These will not pass the tests; we keep our imports explicit. You may have to configure your IDE not to do this.

hub-cap · 2019-10-22T14:32:58Z

@elasticmachine update branch

hub-cap · 2019-10-22T14:33:10Z

@elasticmachine ok to test

hub-cap · 2019-10-23T16:41:14Z

I got to thinking about this and I am not sure I like changing the default here. Let me have a day to think about how we can fix this so that you (and others) can still have something that is shuffled, and we don't change the default.

saiergong · 2019-10-23T18:27:35Z

> I got to thinking about this and I am not sure I like changing the default here. Let me have a day to think about how we can fix this so that you (and others) can still have something that is shuffled, and we don't change the default.

I wonder why the build fails. Is it not allowed to call the shuffle function here? Or should we implement our own shuffle function?

hub-cap · 2019-10-24T15:12:08Z

I think it would make more sense now to alter the Sniffer code so that a new parameter (maybe boolean shuffleNodes) is introduced into the constructor, and then in Sniffer#sniff() check whether this parameter is true and, if so, do the shuffle there instead. This also allows custom sniffer implementations to benefit from the shuffling if they want to.

Then you can add it to the SnifferBuilder too, so it's set properly in the constructor of Sniffer, and please default it to false if it's not set in the builder.
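A minimal sketch of the flag proposed above, assuming it lives on the sniffer and is set through its builder. SketchSniffer, HostsSource, setShuffleNodes and setRandom are hypothetical names used only here; they are not part of the real Sniffer or SnifferBuilder API, and String stands in for Node so the sketch compiles with only the JDK.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for NodesSniffer#sniff(), with String in place of Node.
interface HostsSource {
    List<String> sniff() throws IOException;
}

// Hypothetical sniffer that honors a shuffleNodes flag passed in through its constructor.
class SketchSniffer {
    private final HostsSource source;
    private final boolean shuffleNodes;
    private final Random random;

    SketchSniffer(HostsSource source, boolean shuffleNodes, Random random) {
        this.source = source;
        this.shuffleNodes = shuffleNodes;
        this.random = random;
    }

    // The flag is checked here, so any HostsSource implementation gets the shuffle.
    List<String> sniff() throws IOException {
        List<String> nodes = new ArrayList<>(source.sniff());
        if (shuffleNodes) {
            Collections.shuffle(nodes, random);
        }
        return nodes;
    }

    static Builder builder(HostsSource source) {
        return new Builder(source);
    }

    static class Builder {
        private final HostsSource source;
        private boolean shuffleNodes = false; // default: keep today's unshuffled order
        private Random random = new Random();

        private Builder(HostsSource source) {
            this.source = source;
        }

        Builder setShuffleNodes(boolean shuffleNodes) {
            this.shuffleNodes = shuffleNodes;
            return this;
        }

        // Lets a test inject a seeded Random for a reproducible order.
        Builder setRandom(Random random) {
            this.random = random;
            return this;
        }

        SketchSniffer build() {
            return new SketchSniffer(source, shuffleNodes, random);
        }
    }

    public static void main(String[] args) throws IOException {
        HostsSource source = () -> Arrays.asList("es-1", "es-2", "es-3", "es-4");
        System.out.println(SketchSniffer.builder(source).build().sniff()); // [es-1, es-2, es-3, es-4]
        System.out.println(SketchSniffer.builder(source).setShuffleNodes(true).build().sniff());
    }
}
```

Keeping the flag false by default means existing users see no change in node order, and accepting a Random leaves room for a seeded source in tests.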

hub-cap · 2019-10-24T19:05:50Z

In regard to the test failing: we are using a method we cannot use, so let's instead use the API that accepts the List and a Random, and add a new Random(nodes.size()) that is based on the number of nodes returned in the sniffed list.

saiergong · 2019-10-28T08:56:27Z

ok，More flexible in this way. I'll do it

saiergong · 2019-10-28T10:08:15Z

> in regard to the test failing, we are using a method we cannot use, lets instead use the API that accepts the List and a Random, and add a new Random(nodes.size()) that is based off the number of nodes returned in the sniffed list.

If we use nodes.size() as the Random seed, it will not solve the shuffle problem: if the seed is the same on every Spark node, the generated sequence of random numbers will be the same too. Maybe we should use a timestamp as the Random seed?
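The concern can be checked directly. Every client sniffs the same cluster, so nodes.size() is the same everywhere, and java.util.Random is specified to return identical sequences for identical seeds; every client therefore gets the same "shuffled" order. SizeSeedDemo is a hypothetical class; the timestamp variant reflects the suggestion above and, as the following comments discuss, cannot be reproduced in a test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical demo: each "client" shuffles the same sniffed list on its own.
class SizeSeedDemo {

    // Seed proposed in review: the number of sniffed nodes, identical on every client.
    static List<String> shuffleWithSizeSeed(List<String> sniffed) {
        List<String> nodes = new ArrayList<>(sniffed);
        Collections.shuffle(nodes, new Random(nodes.size()));
        return nodes;
    }

    // Timestamp seed suggested in reply: usually differs between clients,
    // but a test cannot reproduce it.
    static List<String> shuffleWithTimestampSeed(List<String> sniffed) {
        List<String> nodes = new ArrayList<>(sniffed);
        Collections.shuffle(nodes, new Random(System.currentTimeMillis()));
        return nodes;
    }

    public static void main(String[] args) {
        List<String> sniffed = Arrays.asList("es-1", "es-2", "es-3", "es-4", "es-5");
        // Two clients, same cluster, same seed (5): identical "shuffled" order.
        List<String> clientA = shuffleWithSizeSeed(sniffed);
        List<String> clientB = shuffleWithSizeSeed(sniffed);
        System.out.println(clientA.equals(clientB)); // true
    }
}
```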

hub-cap · 2019-10-28T14:42:14Z

This is a good point. The seed needs to be something that is reproducible in a test scenario, as well as random enough that you don't hit the problem you mention. I think it might make more sense if we relaxed the final modifier on ElasticsearchNodesSniffer; you should then be able to just override the sniff() method like the following:

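```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.elasticsearch.client.Node;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.ElasticsearchNodesSniffer;

// Compiles only once the final modifier on ElasticsearchNodesSniffer is relaxed.
public class ShufflingNodesSniffer extends ElasticsearchNodesSniffer {
    public ShufflingNodesSniffer(RestClient restClient) {
        super(restClient);
    }

    @Override
    public List<Node> sniff() throws IOException {
        // Copy so the shuffle does not depend on the returned list being mutable.
        List<Node> nodes = new ArrayList<>(super.sniff());
        Collections.shuffle(nodes);
        return nodes;
    }
}
```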

and you can define whatever random seed (if any) you want for your nodes. Then you just need to use it like this:

```java
ShufflingNodesSniffer nodesSniffer = new ShufflingNodesSniffer(restClient);
Sniffer sniffer = Sniffer.builder(restClient).setNodesSniffer(nodesSniffer).build();
```

Then we end up with the freedom to do whatever you want with the sniffer, and it does not change implementation details for other clients. What do you think of this proposal instead?
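For comparison, a decorator gets the same result without subclassing, so it would work even if ElasticsearchNodesSniffer stays final. This sketch declares a stand-in interface NodeSource with the same shape as the client's NodesSniffer (List<Node> sniff() throws IOException) but a generic node type, so it compiles with only the JDK; NodeSource and ShufflingNodeSource are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Stand-in with the same shape as NodesSniffer, so this compiles without the client jar.
interface NodeSource<N> {
    List<N> sniff() throws IOException;
}

// Decorator: wraps any sniffer instead of extending it, so the wrapped class may stay final.
class ShufflingNodeSource<N> implements NodeSource<N> {
    private final NodeSource<N> delegate;
    private final Random random;

    ShufflingNodeSource(NodeSource<N> delegate, Random random) {
        this.delegate = delegate;
        this.random = random;
    }

    @Override
    public List<N> sniff() throws IOException {
        // Copy first: the delegate's list may be immutable or shared.
        List<N> nodes = new ArrayList<>(delegate.sniff());
        Collections.shuffle(nodes, random);
        return nodes;
    }

    public static void main(String[] args) throws IOException {
        NodeSource<String> base = () -> Arrays.asList("es-1", "es-2", "es-3", "es-4");
        System.out.println(new ShufflingNodeSource<>(base, new Random()).sniff());
    }
}
```

With the real client, the same class would implement NodesSniffer, wrap new ElasticsearchNodesSniffer(restClient), and be passed to Sniffer.builder(restClient).setNodesSniffer(...).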

saiergong · 2019-10-29T02:32:50Z

I could implement ShufflingNodesSniffer in my own code, but other people may run into this problem some day, so maybe the community should provide the shuffle ability?

If we can use the API that accepts the List and a Random, maybe we could use the IP address as the Random seed?

hub-cap · 2019-10-30T00:16:17Z

We also want the random seed to be reproducible between machines, because a test should be reproducible given a seed for the testing framework. Using the IP as the random value would not make the test reproducible if you take it to a different machine.

This is the first request for a shuffled list, so I think the best thing we can do is let the user shuffle it if need be. We can add it properly if we find that more people want the node shuffle feature.

saiergong · 2019-10-30T02:22:59Z

If the test framework requires the results on two machines to be identical, then this conflicts with the functionality we want to achieve...

hub-cap · 2019-10-30T17:19:13Z

> If the test framework requires the results of two machines to be replicated, then this conflicts with the function we want to achieve...

This is not exactly true. The test should be reproducible given a seed, which controls the source of randomness. The seed should be something that would be the same on two test machines but can be influenced by something at runtime. For example, if there were a session ID that could be used, the tests could always supply the same session ID and therefore be replicated on different machines.
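One way to read the session-ID suggestion, as a minimal sketch: derive the seed from a hypothetical sessionId string. String.hashCode() and the algorithm of java.util.Random are both specified by the JDK, so a fixed ID should give the same order on any standard JVM (reproducible tests), while distinct IDs at runtime give clients different orders. SessionSeededShuffle and the example IDs are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical: the shuffle seed comes from a session ID. At runtime each client passes
// its own ID; a test passes a fixed ID and gets the same order on every machine.
class SessionSeededShuffle {

    static List<String> shuffle(List<String> sniffed, String sessionId) {
        // String.hashCode() is defined by the JDK, so the seed does not depend on the machine.
        long seed = sessionId.hashCode();
        List<String> nodes = new ArrayList<>(sniffed);
        Collections.shuffle(nodes, new Random(seed));
        return nodes;
    }

    public static void main(String[] args) {
        List<String> sniffed = Arrays.asList("es-1", "es-2", "es-3", "es-4", "es-5");
        // Reproducible: the same session ID always yields the same order.
        System.out.println(shuffle(sniffed, "test-session").equals(shuffle(sniffed, "test-session"))); // true
        // Different IDs (e.g. one per Spark executor) generally yield different orders.
        System.out.println(shuffle(sniffed, "executor-1") + " vs " + shuffle(sniffed, "executor-2"));
    }
}
```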

nik9000 · 2019-10-30T17:27:44Z

The server and everything that lives inside it can share Randomness with the tests but we don't have access to that in the clients. We use it to make tests properly reproducible all the time. But clients don't have it.

saiergong · 2019-10-31T04:17:34Z

For now, I have copied ElasticsearchNodesSniffer into my project's package and added the shuffle step, which has solved my problem.

hub-cap · 2019-10-31T13:52:02Z

> which has solved my problem.

This is true, but only for this version. I think a PR to relax the final on the sniff() method I mentioned would be best, and it would allow you to stop copying the class wholesale into your project. Otherwise, if we introduce new code or refactor things, your copy will not pick up the newest code.

elasticsearchmachine · 2022-07-27T15:00:17Z

Pinging @elastic/clients-team (Team:Clients)

saiergong added 2 commits October 1, 2019 17:43

shuffle nodes when sniffer nodes

41afcd8

Delete unnecessary files

f464dd0

polyfractal added the :Clients/Java Low Level REST Client label (Minimal dependencies Java Client for Elasticsearch) Oct 15, 2019

jakelandis requested a review from hub-cap October 17, 2019 14:13

hub-cap suggested changes Oct 17, 2019


fix import

99d180a

Merge branch 'master' into master

378fa80

hub-cap added >enhancement v7.6.0 v8.0.0 labels Oct 22, 2019

elasticsearchmachine changed the base branch from master to main July 22, 2022 23:14

mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022

elasticsearchmachine added the Team:Clients label (Meta label for clients team) and removed the Team:Data Management label (Meta label for data/management team) Jul 27, 2022

csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022

kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022

rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023

gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023

pugnascotia added v8.10.0 and removed v8.9.0 labels Jun 22, 2023

quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023

mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023

brianseeders added v8.13.0 and removed v8.12.0 labels Dec 6, 2023

elasticsearchmachine added v8.14.0 and removed v8.13.0 labels Feb 14, 2024

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

elasticsearchmachine added v8.16.0 and removed v8.15.0 labels Jul 4, 2024

mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024

