Overall doc improvements & reduction docs
Some overall proofreading, linking and small fixes.

The biggest chunk is adding the reduction counting docs and
dividing the docs up by the measurements we take.
PragTob committed Mar 7, 2022
1 parent 1ae33b5 commit 6d55522
Showing 3 changed files with 165 additions and 5 deletions.
76 changes: 71 additions & 5 deletions README.md
@@ -98,14 +98,18 @@ The aforementioned [plugins](#plugins) like [benchee_html](https://github.com/be

* first runs the functions for a given warmup time without recording the results, to simulate a _"warm"/running_ system
* [measures memory usage](#measuring-memory-consumption)
* provides you with lots of statistics - check the next list
* provides you with lots of [statistics](#statistics)
* plugin/extensible friendly architecture so you can use different formatters to display benchmarking results as [HTML, markdown, JSON and more](#plugins)
* measures not only [time](#measuring-time), but also [memory](#measuring-memory-consumption) and [reductions](#measuring-reductions)
* as precise as it can get, measure with up to nanosecond precision (Operating System dependent)
* nicely formatted console output with units scaled appropriately (nanoseconds to minutes)
* (optionally) measures the overhead of function calls so that the measured/reported times really are the execution time of _your code_ without that overhead
* [hooks](#hooks-setup-teardown-etc) to execute something before/after a benchmarking invocation
* [hooks](#hooks-setup-teardown-etc) to execute something before/after a benchmarking invocation, without it impacting the measured time
* execute benchmark jobs in parallel to gather more results in the same time, or simulate a system under load
* well documented & well tested

### Statistics

Provides you with the following **statistical data**:

* **average** - average execution time/memory usage (the lower the better)
@@ -119,7 +123,7 @@ In addition, you can optionally output an extended set of statistics:
* **minimum** - the smallest value measured for the job (fastest/least consumption)
* **maximum** - the biggest run time measured for the job (slowest/most consumption)
* **sample size** - the number of measurements taken
* **mode** - the measured values that occur the most. Often one value, but can be multiple values if they occur the same amount of times. If no value occurs at least twice, this value will be nil.
* **mode** - the measured values that occur the most. Often one value, but can be multiple values if they occur equally often. If no value occurs at least twice, this value will be `nil`.
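The mode's tie and `nil` handling can be sketched like this (a hypothetical illustration, not Benchee's actual implementation):

```elixir
# Sketch of a mode calculation: group samples by frequency, keep every
# value that shares the highest count, return nil if nothing repeats.
mode = fn samples ->
  frequencies = Enum.frequencies(samples)
  {_value, max_count} = Enum.max_by(frequencies, fn {_value, count} -> count end)

  if max_count < 2 do
    # no value occurred at least twice
    nil
  else
    # all values tied for the highest count
    for {value, ^max_count} <- frequencies, do: value
  end
end

mode.([1, 2, 2, 3, 3]) |> Enum.sort() |> IO.inspect()
# => [2, 3]
mode.([1, 2, 3]) |> IO.inspect()
# => nil
```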

## Installation

@@ -205,6 +209,7 @@ The available options are the following (also documented in [hexdocs](https://he
* `warmup` - the time in seconds for which a benchmarking job should be run without measuring anything before "real" measurements start. This simulates a _"warm"/running_ system. Defaults to 2.
* `time` - the time in seconds for how long each individual scenario (benchmarking job x input) should be run for measuring the execution times (run time performance). Defaults to 5.
* `memory_time` - the time in seconds for how long [memory measurements](#measuring-memory-consumption) should be conducted. Defaults to 0 (turned off).
* `reduction_time` - the time in seconds for how long [reduction measurements](#measuring-reductions) should be conducted. Defaults to 0 (turned off).
* `inputs` - a map or list of two element tuples. If a map, the keys are descriptive input names and values are the actual input values. If a list of tuples, the first element in each tuple is the input name, and the second element in each tuple is the actual input value. Your benchmarking jobs will then be run with each of these inputs. For this to work your benchmarking function gets the current input passed in as an argument into the function. Defaults to `nil`, aka no input specified and functions are called without an argument. See [Inputs](#inputs).
* `formatters` - list of formatters either as a module implementing the formatter behaviour, a tuple of said module and options it should take or formatter functions. They are run when using `Benchee.run/2` or you can invoke them through `Benchee.Formatter.output/1`. Functions need to accept one argument (which is the benchmarking suite with all data) and then use that to produce output. Used for plugins & configuration. Also allows the configuration of the console formatter to print extended statistics. Defaults to the builtin console formatter `Benchee.Formatters.Console`. See [Formatters](#formatters).
* `measure_function_call_overhead` - Measure how long an empty function call takes and deduct this from each measured run time. This overhead should be negligible for all but the most micro benchmarks. Defaults to false.
@@ -238,7 +243,39 @@ The available options are the following (also documented in [hexdocs](https://he
[`:eprof`](https://hexdocs.pm/mix/Mix.Tasks.Profile.Eprof.html) and
[`:fprof`](https://hexdocs.pm/mix/Mix.Tasks.Profile.Fprof.html).

### Measuring memory consumption
### Metrics to measure

Benchee can measure not only [execution time](#measuring-time), but also [memory consumption](#measuring-memory-consumption) and [reductions](#measuring-reductions)!

You can measure just one of these metrics or all of them at the same time - the choice is up to you. Warmup only occurs once, though; the time spent measuring each metric is governed by the `time`, `memory_time` and `reduction_time` configuration values respectively.

By default only execution time is measured; memory and reductions need to be opted in by specifying a non-zero time amount.

#### Measuring time

This is the default, which you'll most likely want to use as you want to measure how fast your system processes something or responds to a request. Benchee does its best to measure time as accurately and as scoped to your function as possible.

```elixir
Benchee.run(
%{
"something_great" => fn -> cool_stuff end
},
warmup: 1,
time: 5,
memory_time: 2,
reduction_time: 2
)
```

##### A note on time measurement accuracy

From system to system the resolution of the clock [can vary](https://www.erlang.org/doc/apps/erts/time_correction.html).

Generally speaking we have seen accuracies down to 1 nanosecond on Linux and ~1 microsecond on both macOS and Windows. We have also seen accuracy as coarse as 10 ms on Windows in a CI environment.
These numbers are not a limitation of Benchee, but of the Operating System (or at the very least of how Erlang makes it available).

If your benchmark takes hundreds of microseconds this likely has little to no impact, but **if you want to do extreme nano benchmarks we recommend doing them on Linux**.

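To see what your own system offers, the monotonic clock source can be inspected directly; a minimal sketch using plain Erlang introspection (not a Benchee API):

```elixir
# :erlang.system_info(:os_monotonic_time_source) returns a keyword list
# describing the OS clock Benchee's measurements ultimately rely on.
# :resolution is in ticks per second, so 1_000_000_000 means the
# monotonic clock ticks in nanoseconds.
resolution =
  :erlang.system_info(:os_monotonic_time_source)
  |> Keyword.get(:resolution)

IO.puts("OS monotonic clock resolution: #{resolution} ticks/second")
```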
#### Measuring Memory Consumption

Starting with version 0.13, users can get measurements of how much memory their benchmarked scenarios use. The measurement is **limited to the process that Benchee executes your provided code in** - i.e. other processes (like worker pools) and the BEAM as a whole are not taken into account.

@@ -248,7 +285,7 @@ This measurement of memory does not affect the measurement of run times.

In cases where all measurements of memory consumption are identical, which happens very frequently, the full statistics will be omitted from the standard console formatter. If your function is deterministic, this should always be the case. Only in functions with some amount of randomness will there be variation in memory usage.

Memory measurement is disabled by default, and you can choose to enable it by passing `memory_time: your_seconds` option to `Benchee.run/2`:
Memory measurement is disabled by default; you can enable it by passing the `memory_time: your_seconds` option to `Benchee.run/2`:

```elixir
Benchee.run(
@@ -264,6 +301,35 @@ Memory time can be specified separately as it will often be constant - so it mig
A full example, including an example of the console output, can be found
[here](samples/measure_memory.exs).

#### Measuring Reductions

Starting with version 1.1.0, Benchee can measure reductions - but what are reductions?

In short, a reduction is not very well defined, but it is a "unit of work". The BEAM uses reductions to keep track of how long a process has run. [The BEAM Book](https://blog.stenmans.org/theBeamBook/#_scheduling_non_preemptive_reduction_counting) puts it as follows:

>BEAM solves this by keeping track of how long a process has been running. This is done by counting reductions. The term originally comes from the mathematical term beta-reduction used in lambda calculus.
>
>The definition of a reduction in BEAM is not very specific, but we can see it as a small piece of work, which shouldn’t take too long. Each function call is counted as a reduction. BEAM does a test upon entry to each function to check whether the process has used up all its reductions or not. If there are reductions left the function is executed otherwise the process is suspended.

Now, why would you want to measure this? Well, apart from BIFs & NIFs, which are not accurately tracked this way, it gives an impression of how much work the BEAM is doing. The great thing is that this is independent of the load the system is under as well as of the hardware. Hence, it gives you a way to check performance that is less volatile and thus suitable for CI, for instance.

It can differ slightly between Elixir & Erlang versions, though.
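To get a feel for what is being counted, you can read the BEAM's per-process reduction counter directly; a small sketch using plain Erlang introspection (not how Benchee itself measures):

```elixir
# :erlang.process_info/2 exposes the reduction counter of a process.
# Reading it before and after some work shows the "units of work" spent.
{:reductions, before} = :erlang.process_info(self(), :reductions)

# do some work: 1_000 function calls at minimum
Enum.map(1..1_000, fn i -> i * i end)

{:reductions, afterwards} = :erlang.process_info(self(), :reductions)
IO.puts("reductions spent: #{afterwards - before}")
```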

**Like memory measurements, this only tracks reductions directly in the function given to Benchee, not in processes spawned by it or other processes it uses.**

Reduction measurement is disabled by default; you can enable it by passing the `reduction_time: your_seconds` option to `Benchee.run/2`:

```elixir
Benchee.run(
%{
"something_great" => fn -> cool_stuff end
},
  reduction_time: 2
)
```

Also like memory measurements, reduction measurements will often be constant unless something changes about the execution of the benchmarking function.

### Inputs

`:inputs` is a very useful configuration that allows you to run the same benchmarking jobs with different inputs. We call this combination a _"scenario"_. You specify the inputs as either a map from name (String or atom) to the actual input value or a list of tuples where the first element in each tuple is the name and the second element in the tuple is the value.
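The two accepted shapes can be sketched like this (the input names and values here are hypothetical):

```elixir
# :inputs as a map from descriptive name to actual input value ...
inputs_as_map = %{
  "Small" => Enum.to_list(1..1_000),
  "Large" => Enum.to_list(1..100_000)
}

# ... or as a list of {name, value} tuples, which additionally fixes
# the order in which the inputs are run
inputs_as_tuples = [
  {"Small", Enum.to_list(1..1_000)},
  {"Large", Enum.to_list(1..100_000)}
]
```

Either value can then be passed as the `:inputs` option to `Benchee.run/2`.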
37 changes: 37 additions & 0 deletions samples/reduction_run.exs
@@ -0,0 +1,37 @@
list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.run(
%{
"flat_map" => fn -> Enum.flat_map(list, map_fun) end,
"map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten() end
},
time: 0,
reduction_time: 2
)

# tobi@qiqi:~/github/benchee(docs++)$ mix run samples/reduction_run.exs
# Operating System: Linux
# CPU Information: AMD Ryzen 9 5900X 12-Core Processor
# Number of Available Cores: 24
# Available memory: 31.27 GB
# Elixir 1.13.3
# Erlang 24.2.1

# Benchmark suite executing with the following configuration:
# warmup: 2 s
# time: 0 ns
# memory time: 0 ns
# reduction time: 2 s
# parallel: 1
# inputs: none specified
# Estimated total run time: 8 s

# Benchmarking flat_map ...
# Benchmarking map.flatten ...

# Reduction count statistics:

# Name Reduction count
# flat_map 65.01 K
# map.flatten 124.52 K - 1.92x reduction count +59.51 K
57 changes: 57 additions & 0 deletions samples/run_all.exs
@@ -0,0 +1,57 @@
list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.run(
%{
"flat_map" => fn -> Enum.flat_map(list, map_fun) end,
"map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten() end
},
warmup: 1,
time: 5,
memory_time: 2,
reduction_time: 2
)

# tobi@qiqi:~/github/benchee(docs++)$ mix run samples/run_all.exs
# Operating System: Linux
# CPU Information: AMD Ryzen 9 5900X 12-Core Processor
# Number of Available Cores: 24
# Available memory: 31.27 GB
# Elixir 1.13.3
# Erlang 24.2.1

# Benchmark suite executing with the following configuration:
# warmup: 1 s
# time: 5 s
# memory time: 2 s
# reduction time: 2 s
# parallel: 1
# inputs: none specified
# Estimated total run time: 20 s

# Benchmarking flat_map ...
# Benchmarking map.flatten ...

# Name ips average deviation median 99th %
# flat_map 3.61 K 276.99 μs ±10.39% 273.61 μs 490.68 μs
# map.flatten 2.25 K 444.22 μs ±21.30% 410.09 μs 703.06 μs

# Comparison:
# flat_map 3.61 K
# map.flatten 2.25 K - 1.60x slower +167.22 μs

# Memory usage statistics:

# Name Memory usage
# flat_map 625 KB
# map.flatten 781.25 KB - 1.25x memory usage +156.25 KB

# **All measurements for memory usage were the same**

# Reduction count statistics:

# Name Reduction count
# flat_map 65.01 K
# map.flatten 124.52 K - 1.92x reduction count +59.51 K

# **All measurements for reduction count were the same**
