Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support comparing two sets of pystats #98816

Merged
merged 3 commits into from
Nov 4, 2022

Conversation

mdboom
Copy link
Contributor

@mdboom mdboom commented Oct 28, 2022

This adds support for comparing pystats collected from two different builds.

  • The --json-output can be used to load in a set of raw stats and output a JSON file.
  • Two of these JSON files can be provided on the next run, and then comparative results between the two are output.

The refactor required is basically to:

  • Separate out the building of table contents from emitting the table
  • Call these new functions from functions designed for either single results or comparative results

Part of the work for: faster-cpython/tools#115

See mdboom#3 for a prototype of where this is possibly headed in a Github Action.

This adds support for comparing pystats collected from two different builds.

- The `--json-output` can be used to load in a set of raw stats and output a
  JSON file.
- Two of these JSON files can be provided on the next run, and then comparative
  results between the two are output.

The refactor required is basically to:

- Separate out the building of table contents from emitting the table
- Call these new functions from functions designed for either single results or
  comparative results

Part of the work for: faster-cpython/tools#115
Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that you've checked that this produces the same output.

return []

if len(a_rows):
a_ncols = list(set(len(x) for x in a_rows))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the list(set(...)), wouldn't set(...) be sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further down, I want to get the single value out of the set. (a_ncols[0])

ncols = b_ncols[0]

default = [""] * (ncols - 1)
a_data = dict((x[0], x[1:]) for x in a_rows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a_data = { x[0]: x[1:] for x in x in a_rows}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.


default = [""] * (ncols - 1)
a_data = dict((x[0], x[1:]) for x in a_rows)
b_data = dict((x[0], x[1:]) for x in b_rows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding a check for duplicate keys? len(a_data) == len(a_rows)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

@@ -377,8 +552,7 @@ def emit_pair_counts(opcode_stats, total):
succ_rows
)

def main():
stats = gather_stats()
def output_single_stats(stats):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using the "emit_" and "output_" prefixes interchangeably, or is there a difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's keeping the naming from the original code (which is @markshannon's), which has output_stats as a top-level function (which I split into three). I guess the difference is that output_ is these top-level functions, whereas each of the emit_ functions emits a single section. But we certainly could use emit_ everywhere.

@mdboom
Copy link
Contributor Author

mdboom commented Nov 3, 2022

I assume that you've checked that this produces the same output.

Yep. And you can see the comparative output, and the single output on my prototype PR.

@markshannon markshannon merged commit 2844aa6 into python:main Nov 4, 2022
@mdboom mdboom deleted the comparative-pystats branch December 22, 2022 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants