Support comparing two sets of pystats #98816

mdboom · 2022-10-28T15:45:50Z

This adds support for comparing pystats collected from two different builds.

The `--json-output` option loads a set of raw stats and writes them out as a JSON file.
Two of these JSON files can then be provided on a later run, which outputs comparative results between the two.
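To illustrate the save-then-compare workflow described above, here is a minimal sketch. It assumes the stats are a flat mapping from counter name to integer count, which may not match the real schema written by `summarize_stats.py`; the helper names `save_stats`, `load_stats`, and `compare_counts` are hypothetical and are not part of the script.

```python
import json
import os
import tempfile


def save_stats(stats, path):
    """Write a stats mapping to a JSON file (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(stats, f, indent=2, sort_keys=True)


def load_stats(path):
    """Read a stats mapping back from a JSON file (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def compare_counts(base, head):
    """Pair up the counts from two runs; a counter missing from one run counts as 0."""
    return {
        key: (base.get(key, 0), head.get(key, 0))
        for key in sorted(base.keys() | head.keys())
    }


# Assumed example data: a flat name -> count mapping per build.
base_stats = {"LOAD_FAST": 10, "STORE_FAST": 4}
head_stats = {"LOAD_FAST": 12}

with tempfile.TemporaryDirectory() as tmp:
    base_path = os.path.join(tmp, "base.json")
    head_path = os.path.join(tmp, "head.json")
    save_stats(base_stats, base_path)
    save_stats(head_stats, head_path)
    comparison = compare_counts(load_stats(base_path), load_stats(head_path))

for name, (base_count, head_count) in comparison.items():
    print(f"{name}: {base_count} -> {head_count}")
```

Storing each run as JSON means the two builds do not need to be measured in the same session; the comparison only needs the two files.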

The refactor required is basically to:

- Separate out the building of table contents from emitting the table
- Call these new functions from functions designed for either single results or comparative results
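The two refactoring steps above can be sketched roughly as follows. This is an illustration of the pattern only, not the code in the PR: the function names (`calculate_count_rows`, `format_table`, `emit_table`, `show_single`, `show_comparison`) and the table layout are assumptions made for the example.

```python
def calculate_count_rows(counts):
    """Build table rows as plain data, largest count first. No printing here."""
    total = sum(counts.values())
    rows = []
    for name, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        share = f"{100 * count / total:0.1f}%" if total else ""
        rows.append((name, count, share))
    return rows


def format_table(header, rows):
    """Turn a header and rows into Markdown-style table lines."""
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return lines


def emit_table(header, rows):
    """Print a table; the only place in this sketch that writes output."""
    print("\n".join(format_table(header, rows)))


def show_single(counts):
    """Caller for a single set of results."""
    emit_table(("Name", "Count", "Share"), calculate_count_rows(counts))


def show_comparison(base_counts, head_counts):
    """Caller for two sets of results; reuses the same row-building function."""
    emit_table(("Name", "Count", "Share"), calculate_count_rows(base_counts))
    emit_table(("Name", "Count", "Share"), calculate_count_rows(head_counts))


show_single({"LOAD_FAST": 3, "STORE_FAST": 1})
```

Because the row-building function returns data instead of printing, the same rows can feed either a single-result table or a comparative one.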

Part of the work for: faster-cpython/tools#115

See mdboom#3 for a prototype of where this is possibly headed in a GitHub Action.

markshannon

I assume that you've checked that this produces the same output.

markshannon · 2022-11-03T14:57:41Z

Tools/scripts/summarize_stats.py

+        return []
+
+    if len(a_rows):
+        a_ncols = list(set(len(x) for x in a_rows))


Why the list(set(...))? Wouldn't set(...) be sufficient?

Further down, I want to get the single value out of the set. (a_ncols[0])
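For reference, a single value can also be taken out of a set without converting it to a list, by unpacking it. The sketch below is one possible alternative, not what the PR does; `common_column_count` is a hypothetical helper name.

```python
def common_column_count(rows):
    """Return the column count shared by every row, or raise if rows disagree."""
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError(f"expected one column count, found {sorted(widths)}")
    # Tuple unpacking extracts the only element of a one-element set.
    (ncols,) = widths
    return ncols


print(common_column_count([("a", 1, 2), ("b", 3, 4)]))
```

Unpacking also fails loudly if the set ever holds more than one value, whereas indexing a list with `[0]` would silently pick one of them.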

markshannon · 2022-11-03T15:03:07Z

Tools/scripts/summarize_stats.py

+        ncols = b_ncols[0]
+
+    default = [""] * (ncols - 1)
+    a_data = dict((x[0], x[1:]) for x in a_rows)


a_data = {x[0]: x[1:] for x in a_rows}

Makes sense.

markshannon · 2022-11-03T15:04:26Z

Tools/scripts/summarize_stats.py

+
+    default = [""] * (ncols - 1)
+    a_data = dict((x[0], x[1:]) for x in a_rows)
+    b_data = dict((x[0], x[1:]) for x in b_rows)


Is it worth adding a check for duplicate keys? len(a_data) == len(a_rows)
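A minimal sketch of how the suggested duplicate-key check could sit alongside the dictionary-based join is shown here. The function name `join_rows`, the key ordering, and the error message are assumptions for illustration; the merged code may differ.

```python
def join_rows(a_rows, b_rows):
    """Join two tables on their first column, padding missing rows with ""."""
    if not a_rows and not b_rows:
        return []

    # Assumes every row has the same width; take it from whichever table has rows.
    ncols = len(a_rows[0]) if a_rows else len(b_rows[0])
    default = [""] * (ncols - 1)

    a_data = {x[0]: x[1:] for x in a_rows}
    b_data = {x[0]: x[1:] for x in b_rows}

    # A dict silently keeps only the last row for a repeated key,
    # so a length mismatch means some key appeared more than once.
    if len(a_data) != len(a_rows) or len(b_data) != len(b_rows):
        raise ValueError("Duplicate keys")

    # Keys from the first table come first, then keys only in the second.
    keys = list(a_data) + [k for k in b_data if k not in a_data]
    return [(k, *a_data.get(k, default), *b_data.get(k, default)) for k in keys]


print(join_rows([("x", 1), ("y", 2)], [("y", 3), ("z", 4)]))
```

Without the length comparison, a repeated key in either input would drop a row from the comparison without any warning.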

iritkatriel · 2022-11-03T15:41:26Z

Tools/scripts/summarize_stats.py

@@ -377,8 +552,7 @@ def emit_pair_counts(opcode_stats, total):
                    succ_rows
                )

-def main():
-    stats = gather_stats()
+def output_single_stats(stats):


Are you using the "emit_" and "output_" prefixes interchangeably, or is there a difference?

It's keeping the naming from the original code (which is @markshannon's), which has output_stats as a top-level function (which I split into three). I guess the difference is that output_ is these top-level functions, whereas each of the emit_ functions emits a single section. But we certainly could use emit_ everywhere.

mdboom · 2022-11-03T16:39:45Z

> I assume that you've checked that this produces the same output.

Yep. And you can see the comparative output, and the single output on my prototype PR.

bedevere-bot added the awaiting review label Oct 28, 2022

mdboom requested a review from markshannon October 28, 2022 15:45

mdboom added skip news skip issue labels Oct 28, 2022

mdboom mentioned this pull request Oct 28, 2022

Tooling for automated stats gathering faster-cpython/tools#115

Closed

13 tasks

mdboom requested a review from brandtbucher October 31, 2022 20:09

mdboom added 2 commits November 2, 2022 18:24

Use dictionary comprehensions

0907b0f

Check for duplicate keys

d08fd3e

markshannon reviewed Nov 3, 2022

mdboom requested a review from markshannon November 3, 2022 15:27

iritkatriel reviewed Nov 3, 2022

markshannon merged commit 2844aa6 into python:main Nov 4, 2022

bedevere-bot removed the awaiting review label Nov 4, 2022

mdboom deleted the comparative-pystats branch December 22, 2022 15:50
