Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mypyc] Add match statement support #13953

Merged
merged 102 commits into from
Dec 2, 2022
Merged

[mypyc] Add match statement support #13953

merged 102 commits into from
Dec 2, 2022

Conversation

dosisod
Copy link
Contributor

@dosisod dosisod commented Oct 27, 2022

Closes mypyc/mypyc#911

Like the title says, this PR adds support for compiling match statements in mypyc. Most of the work has been done, but there are some things which are still a WIP, though I do not know how to fix them (see below).

A todo list of what has been done, and the (small) number of things that need to be worked out:

  • Or patterns: 1 | 2 | 3
  • Value patterns: 123, x.y.z, etc.
  • Singleton patterns: True, False, and None
  • Sequence patterns:
    • Fixed length patterns [1, 2, 3]
    • Starred patterns [*prev, 4, 5, 6], [1, 2, 3, *rest], etc:
      • [*rest] is currently not working, but should be an easy fix
    • Support any object which supports the Sequence Protocol (need help with this)
  • Mapping Pattern ({"key": value}):
    • General support
    • Starred patterns: {"key": value, **rest}
    • Support any object which supports the Mapping Protocol (need help with this)
  • Class patterns:
    • Basic class isinstance() check
    • Positional args: Class(1, 2, 3)
    • Keyword args: Class(x=1, y=2, z=3)
    • Shortcut for built-in datatypes: int(x) -> int() as x
  • Capture patterns:
    • Wildcard pattern: _
    • As pattern: 123 as num
    • Capture pattern: x

Some features which I am unsure how to implement are:

  • Fix *rest and **rest star patterns name collisions. Basically, you cannot use rest (or any other name) twice in the same match statement if rest is a different type (ie, dict vs list). If it was defined as object instead of dict/list everything would be fine.
  • PEP-634 says that a TypeError should be thrown if __match_args__ is not a tuple, or is otherwise invalid (there are other instances where TypeError should be thrown as well). In general though, should we follow the spec and move these errors to run-time? Or, should we keep it as it is, which is an assert statement at compile time?
  • As for the Sequence/Mapping protocol support, I tried using PySequence_Check and PyMapping_Check, though the Mapping check function returned true for str types, which is not ideal. If anyone has a better (more reliable) way of checking whether a type supports the Sequence/Mapping protocols, let me know.

Also, sorry for the sheer number of commits! I was going to rebase this before I submitted it, but decided that would cut out too much of my thought process.

mypyc/build.py Outdated Show resolved Hide resolved
@sobolevn
Copy link
Member

@brandtbucher can you please take a look? :)
You are the original author.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 31, 2022

Looking at Python/ceval.c in CPython 3.11.0, checking for a sequence is implemented like this:

            int match = Py_TYPE(subject)->tp_flags & Py_TPFLAGS_SEQUENCE;
            PyObject *res = match ? Py_True : Py_False;

The mapping check is similar:

            int match = Py_TYPE(subject)->tp_flags & Py_TPFLAGS_MAPPING;
            PyObject *res = match ? Py_True : Py_False;

PySequence_Check uses a different implementation and I suspect that it doesn't match the above in every case. Matching CPython behavior seems like the safest bet.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 31, 2022

PEP-634 says that a TypeError should be thrown if __match_args__ is not a tuple, or is otherwise invalid (there are other instances where TypeError should be thrown as well). In general though, should we follow the spec and move these errors to run-time? Or, should we keep it as it is, which is an assert statement at compile time?

We shouldn't assert in the compiler, if it's possible the assertion to fail. Instead, we can generate report an error from mypyc, or guard against the error in mypy. It mypy already generates an error if __match_args__ isn't a tuple, then the assertion is fine. (You can assume that nobody uses # type: ignore for mypy errors -- we'll need to disallow most of these at some point.)

In general, it's preferable to move checks that happen at runtime in CPython to happen during compilation, since it's usually more efficient, and we'll get more compile-time error reporting, which is also nice.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 31, 2022

Fix *rest and **rest star patterns name collisions. Basically, you cannot use rest (or any other name) twice in the same match statement if rest is a different type (ie, dict vs list). If it was defined as object instead of dict/list everything would be fine.

Hmm mypy should already infer a union type in this case:

def f(x: object) -> None:
    match x:
        case [*rest]:
            y = 1
        case {**rest}:
            y = 2
    reveal_type(rest)  # Union[builtins.list[builtins.object], builtins.dict[builtins.object, builtins.object]]

What kinds of problems are you having with this?

These lines will be moved to a separate PR so that they can be reviewed
separately from the `match` stuff going on here.
@dosisod
Copy link
Contributor Author

dosisod commented Oct 31, 2022

Ok I will start changing the assertions into error messages then. Indeed, having actual error messages will be a lot better then a traceback.

About the rest issue: What is happening is that the first rest is created as a dict type, and the second rest reuses the existing one, which also happens to be a dict.

The following test:

[case testing_python3_10]
def f(x):
    match x:
        case {**rest}:
            print(rest, type(rest))

        case [*rest]:
            print(rest, type(rest))
[file driver.py]
from native import f

f({"hello", "world"})
f([1, 2, 3])

Fails with the following error:

Expected:
Actual:
  Traceback (most recent call last):            (diff)
    File "driver.py", line 4, in <module>       (diff)
      f([1, 2, 3])                              (diff)
    File "native.py", line -1, in f             (diff)
  TypeError: dict object expected; got list     (diff)

I don't know if there is a way to express union types in the IR, but if you could just use object I think everything would work out. I couldn't figure out how to explicitly type the assignment as an object instead of a dict.

Since the `Py_TPFLAGS_MAPPING` and `Py_TPFLAGS_SEQUENCE` are specific to the
new pattern matching feature, they are not defined for Python 3.9 and below.
Now they are defined if they don't exist, which should fix the issue.
dosisod added a commit to dosisod/mypy that referenced this pull request Nov 1, 2022
@dosisod
Copy link
Contributor Author

dosisod commented Nov 2, 2022

@JukkaL , as far as I can tell, Mypy already checks that the class is able to be matched, and that __match_args__ is valid. Running the following through mypy:

class A:
    __match_args__ = None

class B:
    num: int

    __match_args__ = (123,)

something: str

class C:
    num: int

    __match_args__ = (something,)

class D:
    x: int

def f(x):
    match x:
        case D(123):
            pass

        case len(123):
            pass

Produces the following:

file.py:2: note: __match_args__ must be a tuple containing string literals for checking of match statements to work
file.py:7: note: __match_args__ must be a tuple containing string literals for checking of match statements to work
file.py:14: note: __match_args__ must be a tuple containing string literals for checking of match statements to work
file.py:21: error: Class "x.D" doesn't define "__match_args__"  [misc]
file.py:24: error: Expected type in class pattern; found "len"  [misc]

Meaning we never even get to the mypyc stage. With that in mind, I think that it is perfectly fine to leave the assertions as-is, given that they don't seem to be able to be hit.

JukkaL pushed a commit that referenced this pull request Nov 3, 2022
It would appear that 9125155 is now causing issues with the IR tests.
Reverting that commit seems to fix it, but I cannot seem to remember why I
added it in the first place, and the build artifacts have long since expired.
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left some comments -- after you've addressed them (possibly by adding TODOs) this is ready to merge.

And sorry for the slow review! I've been focused on the mypy 1.0 release and was traveling recently.

mypyc/test-data/run-match.test Outdated Show resolved Hide resolved
mypyc/test-data/run-match.test Outdated Show resolved Hide resolved
mypyc/test-data/irbuild-match.test Show resolved Hide resolved
slow_isinstance_op,
[self.subject, self.builder.accept(pattern.class_ref)],
pattern.line,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we could possibly use a more efficient isinstance op if we are dealing with native classes or primitive types. Can you at least add a TODO comment about this (this can be improved in a follow-up PR)?

mypyc/irbuild/match.py Show resolved Hide resolved
mypyc/irbuild/match.py Show resolved Hide resolved
* Use faster "isinstance" op if the expression is a built-in type

* Add type annotations to run tests

* Add type annotated IR example

* Add todo comments
@dosisod
Copy link
Contributor Author

dosisod commented Dec 1, 2022

Thank you for taking the time to review this! I know you guys have been very busy with the v1 release, so I completely understand the delay.

The optimizations for the isinstance and get_attr op codes are harder then I anticipated. I did update the isinstance check to apply to built-in types though, which should be a good starting point.

Also, the issue where captured expressions with the same name (ie, *rest and **rest) causing type errors has not been fixed yet (see above comment). Will this still be able to be merged with that not being fixed? I've poked around, and I can't seem to figure it out on my own.

Thanks again!

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This good to merge now! Thanks once again for working on this -- this is an important feature.

I may have time to look at the remaining TODOs, or perhaps another contributor can work on them. This is already a mostly complete implementation.

@JukkaL JukkaL merged commit d5e96e3 into python:master Dec 2, 2022
@dosisod dosisod deleted the mypyc-match branch December 5, 2022 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compiling code with match statements fails with an exception
5 participants