
GH-98686: Quicken everything #98687

Merged (12 commits) on Nov 2, 2022

Conversation

@brandtbucher (Member) commented Oct 25, 2022

@brandtbucher added the performance (Performance or resource usage) and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 25, 2022
@brandtbucher self-assigned this Oct 25, 2022
@brandtbucher (Member, Author) commented:

1% faster:

Slower (10):
- xml_etree_iterparse: 102 ms +- 2 ms -> 106 ms +- 2 ms: 1.04x slower
- python_startup_no_site: 6.09 ms +- 0.01 ms -> 6.29 ms +- 0.01 ms: 1.03x slower
- spectral_norm: 94.8 ms +- 3.9 ms -> 96.7 ms +- 2.0 ms: 1.02x slower
- python_startup: 8.47 ms +- 0.01 ms -> 8.63 ms +- 0.01 ms: 1.02x slower
- chameleon: 6.54 ms +- 0.06 ms -> 6.63 ms +- 0.07 ms: 1.01x slower
- nqueens: 79.9 ms +- 1.1 ms -> 81.0 ms +- 0.9 ms: 1.01x slower
- pickle: 10.3 us +- 0.1 us -> 10.4 us +- 0.1 us: 1.01x slower
- json: 4.66 ms +- 0.10 ms -> 4.71 ms +- 0.10 ms: 1.01x slower
- regex_dna: 204 ms +- 1 ms -> 205 ms +- 1 ms: 1.01x slower
- regex_compile: 129 ms +- 1 ms -> 130 ms +- 0 ms: 1.00x slower

Faster (42):
- richards: 45.2 ms +- 0.7 ms -> 42.2 ms +- 0.8 ms: 1.07x faster
- regex_v8: 22.5 ms +- 0.1 ms -> 21.3 ms +- 0.1 ms: 1.06x faster
- pidigits: 198 ms +- 0 ms -> 189 ms +- 0 ms: 1.05x faster
- go: 141 ms +- 2 ms -> 135 ms +- 1 ms: 1.05x faster
- scimark_sor: 108 ms +- 2 ms -> 104 ms +- 1 ms: 1.04x faster
- mdp: 2.55 sec +- 0.01 sec -> 2.48 sec +- 0.01 sec: 1.03x faster
- pyflate: 410 ms +- 3 ms -> 398 ms +- 3 ms: 1.03x faster
- deepcopy_reduce: 2.95 us +- 0.05 us -> 2.86 us +- 0.04 us: 1.03x faster
- scimark_lu: 111 ms +- 2 ms -> 108 ms +- 2 ms: 1.03x faster
- pickle_list: 4.21 us +- 0.05 us -> 4.11 us +- 0.04 us: 1.03x faster
- deltablue: 3.40 ms +- 0.04 ms -> 3.32 ms +- 0.03 ms: 1.02x faster
- scimark_fft: 322 ms +- 2 ms -> 315 ms +- 4 ms: 1.02x faster
- genshi_text: 21.6 ms +- 0.2 ms -> 21.2 ms +- 0.2 ms: 1.02x faster
- coroutines: 25.0 ms +- 0.3 ms -> 24.5 ms +- 0.3 ms: 1.02x faster
- xml_etree_process: 54.2 ms +- 0.8 ms -> 53.2 ms +- 0.6 ms: 1.02x faster
- xml_etree_generate: 77.0 ms +- 1.2 ms -> 75.6 ms +- 0.7 ms: 1.02x faster
- deepcopy: 334 us +- 6 us -> 328 us +- 3 us: 1.02x faster
- django_template: 33.1 ms +- 0.3 ms -> 32.5 ms +- 0.3 ms: 1.02x faster
- tornado_http: 94.2 ms +- 1.4 ms -> 92.6 ms +- 1.4 ms: 1.02x faster
- pycparser: 1.10 sec +- 0.02 sec -> 1.08 sec +- 0.02 sec: 1.02x faster
- thrift: 756 us +- 14 us -> 745 us +- 9 us: 1.02x faster
- pickle_dict: 31.3 us +- 0.2 us -> 30.9 us +- 0.2 us: 1.01x faster
- fannkuch: 381 ms +- 4 ms -> 376 ms +- 6 ms: 1.01x faster
- sqlalchemy_declarative: 134 ms +- 3 ms -> 133 ms +- 3 ms: 1.01x faster
- async_tree_cpu_io_mixed: 743 ms +- 20 ms -> 734 ms +- 16 ms: 1.01x faster
- pickle_pure_python: 288 us +- 2 us -> 284 us +- 4 us: 1.01x faster
- logging_simple: 5.85 us +- 0.09 us -> 5.79 us +- 0.09 us: 1.01x faster
- logging_format: 6.49 us +- 0.07 us -> 6.42 us +- 0.08 us: 1.01x faster
- sympy_str: 283 ms +- 3 ms -> 280 ms +- 3 ms: 1.01x faster
- sqlglot_normalize: 105 ms +- 1 ms -> 104 ms +- 1 ms: 1.01x faster
- 2to3: 251 ms +- 1 ms -> 248 ms +- 1 ms: 1.01x faster
- mako: 9.76 ms +- 0.05 ms -> 9.67 ms +- 0.04 ms: 1.01x faster
- pathlib: 17.7 ms +- 0.3 ms -> 17.5 ms +- 0.2 ms: 1.01x faster
- gunicorn: 1.08 ms +- 0.01 ms -> 1.07 ms +- 0.01 ms: 1.01x faster
- sqlite_synth: 2.62 us +- 0.06 us -> 2.60 us +- 0.04 us: 1.01x faster
- sympy_expand: 457 ms +- 3 ms -> 454 ms +- 7 ms: 1.01x faster
- regex_effbot: 3.52 ms +- 0.01 ms -> 3.50 ms +- 0.01 ms: 1.01x faster
- dulwich_log: 61.2 ms +- 0.3 ms -> 60.9 ms +- 0.3 ms: 1.01x faster
- sqlglot_optimize: 51.5 ms +- 0.4 ms -> 51.2 ms +- 0.3 ms: 1.01x faster
- nbody: 95.8 ms +- 0.8 ms -> 95.4 ms +- 0.9 ms: 1.00x faster
- generators: 72.7 ms +- 0.2 ms -> 72.4 ms +- 0.3 ms: 1.00x faster
- sympy_integrate: 20.4 ms +- 0.1 ms -> 20.4 ms +- 0.1 ms: 1.00x faster

Benchmark hidden because not significant (33): aiohttp, async_tree_none, async_tree_io, async_tree_memoization, chaos, coverage, crypto_pyaes, deepcopy_memo, float, genshi_xml, hexiom, html5lib, json_dumps, json_loads, logging_silent, meteor_contest, mypy, pprint_safe_repr, pprint_pformat, pylint, raytrace, scimark_monte_carlo, scimark_sparse_mat_mult, sqlalchemy_imperative, sqlglot_parse, sqlglot_transpile, sympy_sum, telco, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse

Geometric mean: 1.01x faster

@markshannon (Member) commented:

I notice that startup (both with and without -S) is slower. Presumably that is the overhead of calling _PyCode_Quicken() all the time.
Presumably all that overhead will go away once we move quickening into the bytecode compiler.

@markshannon (Member) left a review:

The original design made a distinction between the initial adaptive counter (which was zero, since there was already a delay for quickening) and the first value for the backoff counter (15).

This PR loses that distinction.

Maybe we don't need that distinction, since we already have to hit 53 misses before falling back to the adaptive form? Maybe we should retry specialization almost immediately and rely on the exponential backoff in case of repeated failures?

How about setting the initial value to 3 (or 5 or 7) to avoid specializing too aggressively at startup?
We can reuse the same value for both, as it makes little difference once misses are taken into account (53 + 7 is near enough to 53 + 15).

One final point (not a fault of this PR, but relevant): in adaptive_counter_start(), the counter is set to 2**backoff - 1, but there is no reason it has to be. We could set the counter to 3 or 5 and the backoff to 4.

@brandtbucher thoughts?

```diff
-/* The initial counter value is 31 == 2**ADAPTIVE_BACKOFF_START - 1 */
-#define ADAPTIVE_BACKOFF_START 5
+/* The initial counter value is 1 == 2**ADAPTIVE_BACKOFF_START - 1 */
+#define ADAPTIVE_BACKOFF_START 1
```
@markshannon (Member) commented:

This is for backoff, after failure. Do you want to change this, or just lower the initial counter value?

```c
uint8_t adaptive_opcode = _PyOpcode_Adaptive[opcode];
if (adaptive_opcode) {
    _Py_SET_OPCODE(instructions[i], adaptive_opcode);
    // Make sure the adaptive counter is zero:
    assert(instructions[i + 1] == 0);
    instructions[i + 1] = adaptive_counter_start();
```
@markshannon (Member) commented:

I don't think this is what you want. adaptive_counter_start() is supposed to be used when resetting to the adaptive form after repeated misses.

Maybe rename adaptive_counter_start to adaptive_initial_backoff_value() to avoid future confusion?

What values did you try for the initial counter value?
1 seems low, as we might spend too much effort specializing startup code.

@bedevere-bot commented:

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@brandtbucher (Member, Author) commented:

Recapping our discussion on Monday:

An initial value of 1 seems to work best (1% faster) out of all the values I tried (0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, and 4095). With a value of 1, we specialize the second time each adaptive instruction is run. This makes sense: running twice is probably a much better indicator of "hotness" than running once, but additional warmup delays only prevent specialization. Most types stabilize by the second loop iteration, too.

It's not strictly necessary to use the same value for the initial warmup delay and the initial backoff value, but 1 seems to work well for both. We can always tweak/rename one of them later if we want.

@brandtbucher added the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Nov 1, 2022
@bedevere-bot commented:

🤖 New build scheduled with the buildbot fleet by @brandtbucher for commit 17292a3 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot removed the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Nov 1, 2022
@markshannon self-requested a review November 2, 2022 15:03