
Adding Benchee.CollectionData struct #264

Merged
merged 3 commits into from
Feb 11, 2019

Conversation

@devonestes (Collaborator) commented Jan 29, 2019

There are two parts to this commit - adding a new Benchee.CollectionData struct, and renaming the previous measurers to collectors. Now when we have a new thing that we're benchmarking, we keep the raw samples and the calculated statistics together for that thing.
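A minimal sketch of the shape this describes, with field names assumed from the discussion in this PR (the real module lives in `lib/benchee/collection_data.ex` and may differ):

```elixir
defmodule Benchee.CollectionData do
  # Keeps the raw samples and the statistics computed from them
  # together for one kind of collected data (run time, memory, ...).
  # `statistics` stays nil here until the statistics step has run.
  defstruct samples: [], statistics: nil
end

# A scenario would then carry one CollectionData per measurement type:
scenario = %{
  run_time_data: %Benchee.CollectionData{samples: [100, 110, 105]},
  memory_usage_data: %Benchee.CollectionData{samples: [616, 616]}
}
```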

Resolves #259

@PragTob (Member) left a comment

👋

Heyho, thanks for tackling this! 🎉

Getting it in now would delay a release that we've already delayed for too long (hell, I looked it up: I implemented nanosecond measurements in May last year and we still haven't released them). On the other hand, this touches just about everything, so if we can't merge it soon it'd be a bad experience for us.

Once again, I'll see if I can get 0.14 released this weekend, and then we can decide whether or not to do it.

As for the PR, I left lots of comments but they're mainly the same and mostly minor:

  • Should the main keys in Scenario be called `#{type}_data`, `#{type}s`, or just `#{type}`?
  • I think whenever we set up test data, we should use the correct struct, aka CollectionData, and not just a map - I know we don't reliably do this with other stuff everywhere, but I think it'd be the right call? What do you think? (Also, this sounds bothersome, but I think a search and replace should fix it rather easily.)
  • I think when matching on it we don't always have to specify the struct, although it can be nice in places to provide more context/name the concepts 🤷‍♀️


Review threads (resolved):
lib/benchee/benchmark/scenario.ex
lib/benchee/collection_data.ex (outdated)
lib/benchee/benchmark/runner.ex
lib/benchee/conversion.ex (×2)
test/benchee/formatters/console/memory_test.exs (outdated)
```elixir
maximum: 333.3,
mode: 201.2,
sample_size: 50_000
memory_usage_data: %{
```
Yeah, I think everywhere in this file it should be CollectionData here in the test setup :)

Further resolved threads:
test/benchee/formatters/console/memory_test.exs (outdated, ×2)
test/benchee/formatters/console/run_time_test.exs (outdated)
PragTob added a commit that referenced this pull request Feb 9, 2019
Formatters often have to check whether run time or memory data
was processed - so if data was collected at all and if statistics
have already been generated so that they can be rendered.

This should simplify this/not tie all of them to the internal
data structure.

The decision was made to check directly on statistics to see if
values actually made it into them, rather than just checking the
configuration for whether we intended to measure, so that we're
sure the values are there.

It might sound like a check on `Suite` that all scenarios have
all data would be a good thing; however, formatters mostly split
things by input, so the check will more often run on a subset of
scenarios or on individual scenarios.

This PR doesn't use the function in our formatters yet, to avoid
merge conflicts with #264 :) #264 will have to adjust its
implementation though, but that should be very easy and
straightforward.
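A rough sketch of what such a `data_processed?` check could look like, under the assumption that statistics are `nil` until generated (the names and shape here are assumed for illustration, not taken from the actual commit):

```elixir
defmodule Scenario do
  defstruct run_time_data: %{statistics: nil},
            memory_usage_data: %{statistics: nil}

  # Checks the statistics themselves rather than the configuration,
  # so callers know values are actually there.
  def data_processed?(scenario, :run_time),
    do: scenario.run_time_data.statistics != nil

  def data_processed?(scenario, :memory),
    do: scenario.memory_usage_data.statistics != nil
end
```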
PragTob added a commit that referenced this pull request Feb 10, 2019
* Add Scenario.data_processed? to simplify checks in formatters

@devonestes (Collaborator, Author) commented

I keep going back and forth about where to use `CollectionData`, actually. My thinking is this:

In pattern matches, it’s best to pattern match on the “simplest” possible pattern to do what we need. Basically, if I’m seeing a pattern match on one struct type in one place, then I would expect to see a similar match on a different type right next to it.

Same with test data. I sort of like the idea of using the simplest test data that will work. If a bare map will do, then there’s no reason to over-specify in the tests the data that our function accepts. Keeping the tests as general as possible makes it (theoretically) easier to change the implementation later on, and tests that aren’t coupled to the current implementation make it easier to refactor.

The trade-off there, I guess, is in explicitness and self-documentation.

I do think that in doctests I should probably specify the struct, since those tests serve a specific documentation purpose.

That said, those are just theoretical benefits/drawbacks and my personal taste. I don’t think it will make a big difference to the actual maintenance later on.

One thing I feel pretty strongly about is the “memory_usage_data” and “run_time_data” keys. Just “run_time” or “run_times” is, I think, much too general for those keys. Otherwise you need to know the type of the data stored at that key to know what that “run_time” refers to, and I think it should be the other way around - the key tells you (or at least gives you a hint about) what’s stored at that key.

So, I think I’m going to go back and add the struct type, but I also think we should keep those key names.
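To illustrate the naming point with a made-up formatter clause (module and function names here are hypothetical, not from the PR):

```elixir
defmodule Benchee.CollectionData do
  defstruct samples: [], statistics: nil
end

defmodule FormatterSketch do
  # The key name "run_time_data" already hints that a CollectionData
  # lives there; a bare "run_time" key wouldn't tell you whether the
  # value is a list of samples, a statistics map, or something else.
  def average(%{run_time_data: %Benchee.CollectionData{statistics: %{average: avg}}}) do
    avg
  end
end
```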

@PragTob (Member) commented Feb 10, 2019

👋

Didn't expect you back here yet :D

If you feel strongly about the key names, I'm happy to keep them! Was more a hunch with me 👍

Regarding pattern matches, I agree - simplest thing that works seems good 👍

As for test data, I used to think the same - aka just maps with the needed keys in tests and we're good. However, now I think we should use the right struct more because:

  • we have typespecs, so technically the tests then violate the typespecs 🤷‍♂️
  • I think it's still kinda the simplest thing that works, because the only difference between the map and the struct is specifying the struct type - we still supply only the keys relevant to the test in question, so I think that property still kinda holds
  • if we'd ever pattern match on the struct in function heads or whatnot, the tests would break as we only supplied a map - so supplying the struct seems to make it more robust to me :)
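The last point can be sketched like this (module and function names are made up for the example):

```elixir
defmodule CollectionData do
  defstruct samples: [], statistics: nil
end

defmodule Demo do
  # A function head that matches on the struct type accepts the
  # struct, but a bare map with the same keys will not match.
  def sample_size(%CollectionData{samples: samples}), do: length(samples)
end

Demo.sample_size(%CollectionData{samples: [1, 2, 3]})
# => 3

# A plain map raises FunctionClauseError despite having the right key:
try do
  Demo.sample_size(%{samples: [1, 2, 3]})
rescue
  FunctionClauseError -> :no_match
end
# => :no_match
```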

Cheerio!


This just changes some names of files and functions from `measurer` to
`collector`, and `measure` to `collect`.
Now when we have a thing we're measuring, we keep all the collected
samples and the calculated statistics together in a single data
structure.
@PragTob (Member) left a comment

🚀

@PragTob PragTob merged commit bdfeb2e into master Feb 11, 2019
@PragTob PragTob deleted the Issue-259 branch February 11, 2019 19:08