-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Superoptimize sequences of instructions without observable side-effects in the bytecode compiler. #102869
Comments
See faster-cpython/ideas#567 for the original motivation for this. |
@markshannon why not hardcode these sequences into the peepholer? something like this: CCLDArjun@22b6aff Are there enough cases of these that calculating the stack effect and building a lookup table would be worthwhile? I would imagine finding the stack effect would involve calculating how the stack is before and after all of these operations, right? |
Just curious... how is this different from the existing The main idea there is that optimizing runs of |
3.10: >>> def f():
... a, b, c = b, c, a
...
>>> dis.dis(f)
2 0 LOAD_FAST 0 (b)
2 LOAD_FAST 1 (c)
4 LOAD_FAST 2 (a)
6 ROT_THREE
8 ROT_TWO
10 STORE_FAST 2 (a)
12 STORE_FAST 0 (b)
14 STORE_FAST 1 (c)
16 LOAD_CONST 0 (None)
18 RETURN_VALUE 3.11: >>> def f():
... a, b, c = b, c, a
...
>>> dis.dis(f)
1 0 RESUME 0
2 2 LOAD_FAST 0 (b)
4 LOAD_FAST 1 (c)
6 LOAD_FAST 2 (a)
8 STORE_FAST 1 (c)
10 STORE_FAST 0 (b)
12 STORE_FAST 2 (a)
14 LOAD_CONST 0 (None)
16 RETURN_VALUE |
I think if Because |
Discussion in faster-cpython/ideas#585 suggests that we can't actually move |
Sequences of
LOAD_FAST
,COPY
,LOAD_CONST
,SWAP
,POP_TOP
,NOP
(and other?) instructions have no observable side-effects as the evaluation stack cannot be observed.Therefore we can replace any sequence of these instructions with a shorter sequence that has the same effect.
By using a superoptimizer we replace sequences with the optimal equivalent sequence.
In practice we will want to limit the size of lookup tables we generate, but we can generate complete tables for sequences up to 4 or 5.
There is a slight complication of line numbers, but as long the number of line numbers does not exceed the optimal sequence length, we can just allocate instructions line numbers without regard to the original mapping of instructions to line numbers.
Generating and using tables.
During build (or offline), we can generate a table mapping sequences of instructions to the stack, recording the optimum sequence for each stack.
E.g. The sequences
LOAD_FAST x; LOAD_FAST y; SWAP 2
andLOAD_FAST y; LOAD_FAST x
generate the stack[ y, x ]
.So the lookup table maps
[ y, x ]
toLOAD_FAST y; LOAD_FAST x
.Whenever we see a sequence of instructions without observable side effects, we compute the stack, and look it up. If we find a match, emit the optimal sequence.
Mimimizing the size of tables
We will need to use some sort of numbering scheme for the indexes of
LOAD_FAST
andLOAD_CONST
, and treatLOAD_CONST
the same asLOAD_FAST
.So that
LOAD_FAST a; LOAD_CONST 1.0
becomesLOAD 1; LOAD 2
.In order to keep the table size down, we should limit the number of distinct locals and constants to 4, and the depth of the stack to something like 7. That way there are only a few thousand possible stacks, and the optimal sequence for each cannot be more than 4 instructions. The entire table should then fit into tens of kbytes of
const
data.Handling line numbers.
Since the instructions have no visible side-effects, we can place the line events anywhere in the instruction sequence.
Taking the above sequence with line numbers:
We can generate:
which is valid as it produces the same sequence of line events and the observable state of the VM is the same in both cases.
Linked PRs
The text was updated successfully, but these errors were encountered: