
MINOR: Make ByteUtilsBenchmark deterministic #13307

Closed

Conversation

divijvaidya (Contributor)

Motivation

The current implementation of the benchmark may produce a different set of integers for each benchmark method and hence may not provide an apples-to-apples comparison.

Changes

  1. With this change, we initialize the random number generator with a fixed seed at the start of a benchmark. This ensures that the generator produces the same sequence of random values for every benchmark, so the input data is consistent across benchmarks, providing a reliable apples-to-apples comparison.
  2. We ensure that a new set of random numbers is generated per iteration, so that the benchmark calculation is performed over a diverse range of values (see the sketch after this list).
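
As an illustration, here is a minimal JMH sketch of the two changes above. It is not the actual PR diff: the class name, seed, and input size are hypothetical, and sizeOfUnsignedVarint is a self-contained stand-in for the org.apache.kafka.common.utils.ByteUtils method of the same name.

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class ByteUtilsBenchmarkSketch {
    // Hypothetical values; the real benchmark may use different sizes.
    private static final long FIXED_SEED = 42;
    private static final int INPUT_SIZE = 1024;

    private Random random;
    private int[] input;

    @Setup(Level.Trial)
    public void setUpTrial() {
        // Change 1: a fixed seed makes the generator emit the same
        // sequence in every benchmark method and every run.
        random = new Random(FIXED_SEED);
    }

    @Setup(Level.Iteration)
    public void setUpIteration() {
        // Change 2: fresh values per iteration give a diverse sample set,
        // while the shared seeded Random keeps the sequence reproducible.
        input = new int[INPUT_SIZE];
        for (int i = 0; i < INPUT_SIZE; i++) {
            input[i] = random.nextInt();
        }
    }

    @Benchmark
    public int testSizeOfUnsignedVarint() {
        // Returning the accumulated result prevents dead-code elimination.
        int total = 0;
        for (int value : input) {
            total += sizeOfUnsignedVarint(value);
        }
        return total;
    }

    // Stand-in so the sketch compiles on its own; mirrors the standard
    // unsigned-varint size computation (7 payload bits per byte).
    private static int sizeOfUnsignedVarint(int value) {
        int bytes = 1;
        while ((value & 0xffffff80) != 0) {
            bytes += 1;
            value >>>= 7;
        }
        return bytes;
    }
}

Because the state is re-seeded at Level.Trial, every benchmark method and every run consumes the identical sequence of integers, yet each measurement iteration within a trial still draws a new batch.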

Sample result of a benchmark run:

Benchmark                                           Mode  Cnt     Score   Error  Units
ByteUtilsBenchmark.testSizeOfUnsignedVarint        thrpt   30  1302.193 ± 0.396  ops/s
ByteUtilsBenchmark.testSizeOfUnsignedVarintSimple  thrpt   30   328.678 ± 0.269  ops/s
ByteUtilsBenchmark.testSizeOfVarlong               thrpt   30   880.113 ± 0.676  ops/s
ByteUtilsBenchmark.testSizeOfVarlongSimple         thrpt   30   109.592 ± 0.071  ops/s
JMH benchmarks done

@showuon showuon (Contributor) left a comment

Thanks for the improvement! The first point makes sense to me. But for the second point:

We ensure that a new set of random numbers is generated per iteration, so that the benchmark calculation is performed over a diverse range of values

I didn't understand why we need this change. Could you elaborate on it? From what I can see, it should make no difference whether we use the same values for each iteration or not.

Thank you.

@divijvaidya (Contributor, Author)

I didn't understand why we need this change. Could you elaborate on it? From what I can see, it should make no difference whether we use the same values for each iteration or not.

Having different values for each iteration gives us the ability to benchmark over a more diverse sample set. This increases the probability that the algorithm we are testing is optimal over a larger range of values.
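
To make that concrete, here is a tiny sketch (not from the PR, with an arbitrary seed of 42). Two identically seeded generators stand in for two benchmark runs: each iteration draws a fresh batch of values (diversity), but both runs draw the same batches (determinism).

import java.util.Arrays;
import java.util.Random;

public class SeededDiversityDemo {
    public static void main(String[] args) {
        Random runA = new Random(42); // stands in for benchmark run A
        Random runB = new Random(42); // stands in for benchmark run B
        for (int iteration = 0; iteration < 3; iteration++) {
            int[] batchA = runA.ints(4).toArray();
            int[] batchB = runB.ints(4).toArray();
            // Different values each iteration, identical across runs.
            System.out.println("iteration " + iteration + ": "
                    + Arrays.toString(batchA)
                    + " equal across runs: " + Arrays.equals(batchA, batchB));
        }
    }
}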

@divijvaidya (Contributor, Author)

@showuon I am discarding this PR as I have included its changes in #13312
