
gh-109329: Support for basic pystats for Tier 2 #109913

Merged (17 commits into python:main on Oct 4, 2023)

Conversation

mdboom (Contributor) commented on Sep 26, 2023:

This implements a set of the Tier 2 pystats mentioned in #109329:

  • Total micro-ops executed
  • Total number of traces started
  • Total number of traces created
  • Optimization attempts
  • Per-uop execution counts, like we have for Tier 1 instructions
  • A histogram of uops executed per trace
  • Trace too long
  • Unsupported opcode (maybe even with counters for each offender)
  • Inner loop found
  • Too many frame pushes (currently capped at a depth of 5)
  • Too many frame pops (if we return from the original frame)
  • Recursive function call

It also fixes a "bug" where specialization "hits" in the running Tier 2 interpreter would count against the Tier 1 stats.

Not implemented (since it will require reworking the DEOPT_IF calls):

  • Exit reason counts: polymorphism vs branch mis-prediction
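
For context: all of these counters follow the existing pystats pattern, compiled in only for --enable-pystats builds and guarded by a runtime _Py_stats pointer, so regular builds pay nothing. A minimal sketch of that pattern applied to the OPT_STAT_INC macro quoted in the review below (the Py_STATS guard is assumed from the surrounding CPython conventions, not shown in this diff):

#ifdef Py_STATS
/* Stats build: bump a named Tier 2 counter when collection is active. */
#define OPT_STAT_INC(name) do { if (_Py_stats) _Py_stats->optimization_stats.name++; } while (0)
#else
/* Regular build: compiles away entirely. */
#define OPT_STAT_INC(name) ((void)0)
#endif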

Example output (for the nbody benchmark):

Optimization (Tier 2) stats

Overall stats

Count Ratio
Optimization attempts 5
Traces created 5 100.0%
Traces executed 1,600,070
Uops executed 65,202,621 40
Trace stack overflow 0
Trace stack underflow 0
Trace too long 5
Inner loop found 0
Recursive call 0

Trace length histogram

Range Count Ratio
<= 4 0 0.0%
<= 8 0 0.0%
<= 16 0 0.0%
<= 32 0 0.0%
<= 64 5 100.0%

Optimized trace length histogram

Range Count Ratio
<= 4 0 0.0%
<= 8 0 0.0%
<= 16 0 0.0%
<= 32 0 0.0%
<= 64 5 100.0%

Trace run length histogram

Range Count Ratio
<= 4 0 0.0%
<= 8 0 0.0%
<= 16 0 0.0%
<= 32 0 0.0%
<= 64 1,600,070 100.0%

Uop stats

Uop Count Self Cumulative
STORE_FAST 14,900,598 22.9% 22.9%
_SET_IP 14,100,583 21.6% 44.5%
LOAD_FAST 10,100,513 15.5% 60.0%
_GUARD_BOTH_FLOAT 3,700,274 5.7% 65.6%
_BINARY_OP_SUBTRACT_FLOAT 2,900,138 4.4% 70.1%
UNPACK_SEQUENCE_TUPLE 2,400,092 3.7% 73.8%
UNPACK_SEQUENCE_LIST 2,400,092 3.7% 77.5%
_POP_JUMP_IF_TRUE 1,700,049 2.6% 80.1%
_EXIT_TRACE 1,600,070 2.5% 82.5%
_ITER_CHECK_LIST 1,600,065 2.5% 85.0%
_IS_ITER_EXHAUSTED_LIST 1,600,065 2.5% 87.4%
COPY 1,599,948 2.5% 89.9%
_ITER_NEXT_LIST 1,400,053 2.1% 92.0%
UNPACK_SEQUENCE_TWO_TUPLE 1,000,039 1.5% 93.6%
LOAD_CONST 800,001 1.2% 94.8%
SWAP 799,974 1.2% 96.0%
BINARY_SUBSCR_LIST_INT 799,974 1.2% 97.2%
_BINARY_OP_MULTIPLY_FLOAT 400,095 0.6% 97.9%
_BINARY_OP_ADD_FLOAT 400,041 0.6% 98.5%
STORE_SUBSCR_LIST_INT 399,987 0.6% 99.1%
POP_TOP 200,017 0.3% 99.4%
_ITER_CHECK_RANGE 99,984 0.2% 99.5%
_IS_ITER_EXHAUSTED_RANGE 99,984 0.2% 99.7%
_ITER_NEXT_RANGE 99,979 0.2% 99.8%
GET_ITER 99,979 0.2% 100.0%
BINARY_OP 27 0.0% 100.0%

Unsupported opcodes

Opcode Count

brandtbucher (Member) left a comment:

This is so great to have, thanks! We can tweak the collection and output further if need be, but this is already going to be so much more insightful!

Just a couple of suggestions, but otherwise I'll plan on merging tomorrow.

/* Bump a named counter in the Tier 2 optimization stats. */
#define OPT_STAT_INC(name) do { if (_Py_stats) _Py_stats->optimization_stats.name++; } while (0)
/* Count one execution of the given micro-op. */
#define UOP_EXE_INC(opname) do { if (_Py_stats) _Py_stats->optimization_stats.opcode[opname].execution_count++; } while (0)
/* Count an opcode the trace translator could not handle. */
#define OPT_UNSUPPORTED_OPCODE(opname) do { if (_Py_stats) _Py_stats->optimization_stats.unsupported_opcode[opname]++; } while (0)
/* Record `length` into histogram `name` with power-of-two buckets. */
#define OPT_HIST(length, name) do { if (_Py_stats) _Py_stats->optimization_stats.name[_Py_bit_length((length - 1) >> 2)]++; } while (0)
brandtbucher (Member) commented:

Why ((length - 1) >> 2)?

mdboom (Contributor, Author) replied:

If I understand correctly, there is a minimum trace size of 2, so the first two power-of-two buckets (for 0 and 1) would always be empty; we might as well not support them (that's the >> 2). I also think it's easier to reason about buckets whose inclusive maximum is a round number (that's the - 1).
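
To make the bucketing concrete (a worked example, not from the thread): a trace of length 4 maps to _Py_bit_length((4 - 1) >> 2) = _Py_bit_length(0) = 0, the "<= 4" bucket; a length of 64 maps to _Py_bit_length(63 >> 2) = _Py_bit_length(15) = 4, the "<= 64" bucket; and a length of 65 would map to bucket index 5, one past the last bucket in the tables above.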

uint64_t execution_count;
} UOpStats;

#define _Py_UOP_HIST_SIZE 5
brandtbucher (Member) commented:

We'll probably want more buckets than this (since execution lengths for closed loops will get pretty big), but it's probably fine for now. We can just bump it whenever.

mdboom (Contributor, Author) replied:

I thought the max trace length was 64 (_Py_UOP_MAX_TRACE_LENGTH). Can they be longer? (It's quite possible I don't understand the code.) In any event, I should probably add a comment here noting that this should be log2 of whatever the expected maximum length is.

brandtbucher (Member) replied on Sep 28, 2023:

Can they be longer? (It's quite possible I don't understand the code).

Not statically, but dynamically they can be quite long (imagine executing a trace for a closed loop hundreds of times).

So we should probably modify the code that adds to the histogram to clamp the bucket index, e.g. Py_MIN(_Py_UOP_HIST_SIZE - 1, _Py_bit_length((length - 1) >> _Py_UOP_HIST_BIAS)), so that anything overflowing the histogram ends up in the largest bucket instead of writing past the end of the array.

mdboom (Contributor, Author) replied:

Yeah, that's a good idea to add the range-checking.

Also, this made me realize I was counting the wrong thing: I was recording the "instruction pointer at exit" as the execution length. But (after discussing in person) it seems what we want is the number of uops executed per execution of the trace. I've updated this to report that instead.
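
A minimal sketch of what the clamped recording macro could look like with the bias constant pulled out (an illustration of the suggestion above, not necessarily the exact code that landed; Py_MIN and _Py_bit_length are the existing CPython helpers):

#define _Py_UOP_HIST_SIZE 5
#define _Py_UOP_HIST_BIAS 2  /* skip the always-empty buckets for lengths 0 and 1 */

/* Record `length` in histogram `name`, clamping the bucket index so that
   oversized values land in the largest bucket instead of past the array. */
#define OPT_HIST(length, name) \
    do { \
        if (_Py_stats) { \
            int _bucket = _Py_bit_length(((length) - 1) >> _Py_UOP_HIST_BIAS); \
            _bucket = Py_MIN(_bucket, _Py_UOP_HIST_SIZE - 1); \
            _Py_stats->optimization_stats.name[_bucket]++; \
        } \
    } while (0)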

print_histogram(FILE *out, const char *name, uint64_t hist[_Py_UOP_HIST_SIZE])
{
for (int i = 0; i < _Py_UOP_HIST_SIZE; i++) {
fprintf(out, "%s[%d]: %" PRIu64 "\n", name, (1 << (i + 2)), hist[i]);
brandtbucher (Member) commented:

Why i + 2? Is it to correct for the >> 2 when recording the stats?

Oh, I think I see. We're starting with the "third" bucket, since tiny traces are basically nonexistent. Maybe add another constant next to _Py_UOP_HIST_SIZE (like #define _Py_UOP_HIST_BIAS 2 or something) so it's clearer why we're doing this and where.

mdboom (Contributor, Author) replied:

Yes, that's right. Adding the #define for bias makes sense.
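
Applying that, the printing side might look like this sketch (same caveat: illustrative, with the bias constant assumed as defined above):

static void
print_histogram(FILE *out, const char *name, uint64_t hist[_Py_UOP_HIST_SIZE])
{
    for (int i = 0; i < _Py_UOP_HIST_SIZE; i++) {
        /* Bucket i holds lengths up to 1 << (i + _Py_UOP_HIST_BIAS). */
        fprintf(out, "%s[%d]: %" PRIu64 "\n", name,
                (1 << (i + _Py_UOP_HIST_BIAS)), hist[i]);
    }
}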

  continue
- name, _, rest = key[7:].partition("]")
+ name, _, rest = key[len(prefix) + 2:].partition("]")
brandtbucher (Member) commented:

Is this correct? The old value was 7, but len("opcode") + 2 is 8.

mdboom (Contributor, Author) replied:

Good catch. It seems not to matter, but we might as well revert it just in case.

def _emit_comparative_execution_counts(base_rows, head_rows):
base_data = dict((x[0], x[1:]) for x in base_rows)
head_data = dict((x[0], x[1:]) for x in head_rows)
opcodes = set(base_data.keys()) | set(head_data.keys())
brandtbucher (Member) commented:

Dictionary keys support set operations:

Suggested change:
- opcodes = set(base_data.keys()) | set(head_data.keys())
+ opcodes = base_data.keys() | head_data.keys()

mdboom (Contributor, Author) replied:

TIL!


rows = []
default = [0, "0.0%", "0.0%", 0]
for opcode in opcodes:
brandtbucher (Member) commented:

Maybe give these a deterministic (and somewhat meaningful) order?

Suggested change:
- for opcode in opcodes:
+ for opcode in sorted(opcodes):

mdboom (Contributor, Author) replied:

We could, but we end up sorting by the counts a few lines below anyway...

(opcode, base_entry[0], head_entry[0],
f"{100*change:0.1f}%"))

rows.sort(key=lambda x: -abs(percentage_to_float(x[-1])))
brandtbucher (Member) commented:

Suggested change:
- rows.sort(key=lambda x: -abs(percentage_to_float(x[-1])))
+ rows.sort(key=lambda x: abs(percentage_to_float(x[-1])), reverse=True)

brandtbucher (Member) commented:

(Also, it looks like CI caught some trailing whitespace.)

brandtbucher (Member) left a comment:

Looks great, thanks!

@brandtbucher merged commit e561e98 into python:main on Oct 4, 2023 (21 checks passed).

Glyphack pushed a commit to Glyphack/cpython that referenced this pull request on Sep 2, 2024.