-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
re.sub confusion between count and flags args #56166
Comments
re.sub don't substitute not ASCII characters: Python 2.7.1 (r271:86832, Apr 15 2011, 12:11:58) Arch Linux
|
The 4th parameter to re.sub() is a count, not flags. |
Since this has been reported already several times (see e.g. bpo-11947), and it's a fairly common mistake, I think we should do something to avoid it. A few possibilities are:
|
I like the idea of an internal REflag class with __new__, __or__, and __repr__==str. Str(re.A|re.L) might print as |
Something like "<re.Flag ASCII | IGNORE>" may be more Pythonic. |
Agreed, if we go that route. |
I’d favor 1) or 2) over 3). Ints are short and very commonly used for flags. |
See also bpo-12888 for an error in the stdlib caused by this. |
I like option #2, and I was thinking of working on it today, poke me if anyone has a problem with this. |
There is no sane way to issue a warning without changing the signature and we don't want to change the signature without issuing a deprecation warning for the function, so sadly option 3 is the only way for this to work, (Im going to not touch this till ENUMS are merged in.) |
Can't you use *args and **kwargs and then raise a deprecation warning if count and/or flags are in args? |
We could do that but we would be changing the signature before adding the warning |
The change would still be backwards compatible (even though inspect.signature and similar functions might return something different). Note that I'm not saying that's the best option, but it should be doable. |
Please see my patch, I have changed flags to be instances of IntEnum and added a check to re.sub, re.subn and re.split. The patch contains some tests. This solution also allowed me to discover several bugs in the standard library, and I am going to create tickets for them shortly. |
New changeset 767fd62b59a9 by Victor Stinner in branch 'default': |
I suggest to make the 2 last parameters of re.sub(), re.subn() and re.split() parameters as keyword-only. It will break applications using count and maxsplit parameters as index parameters, but it's easy to fix these applications if they want to support also Python 3.5. I checked Python 2.6: the name of the maxsplit and count parameters didn't change. So it's possible to write code working on Python 2.6-3.5 if the parameter name is explicitly used:
The flags parameter was added to re.sub(), re.subn() and re.split() functions in Python 2.7:
See my attached re_keyword_only.patch:
|
Thank you for your patch Valentina. But it makes flags combinations not pickleable. >>> import re, pickle
>>> pickle.dumps(re.I|re.S, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <enum 'SubFlag'>: attribute lookup SubFlag on sre_constants failed
>>> pickle.dumps(re.I|re.S, 4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <enum 'SubFlag'>: attribute lookup BaseFlags.__or__.<locals>.SubFlag on sre_constants failed And I'm afraid that creating new class in the "|" operator can affect performance. |
As for 767fd62b59a9, I doubt that changing positional arguments to keyword argumennts in tests is justified. This can hide a bug. |
And again about patch_11957. I afraid that testing isinstance(count, sre_constants.BaseFlags) on every re.sub() call will hit performance too. |
I agree about 767fd62b59a9, there should be tests for args passed both by position and as keyword args. Serhiy, do you think the enum solution is worth pursuing, or is it better to just turn those args to keyword-only (after a proper deprecation process)? |
I think that the enum solution is worth pursuing, and that we need general |
New changeset 216e8b809e4e by Serhiy Storchaka in branch '3.5': New changeset b39b09290718 by Serhiy Storchaka in branch '3.6': New changeset da2c96cf2ce6 by Serhiy Storchaka in branch 'default': |
Here are two alternative patches. The first patch checks if count or maxsplit arguments are re.RegexFlag and raise TypeError if it is true. This makes misusing flags fail fast. The second patch deprecates passing count and maxsplit arguments as positional arguments. This imposes your to change your code (even if it is valid now) and makes hard misusing flags. Unfortunately both ways slow down calling functions. $ ./python -m perf timeit -s "import re" -- 're.split(":", ":a:b::c", 2)' unpatched: Median +- std dev: 2.73 us +- 0.09 us $ ./python -m perf timeit -s "import re" -- 're.split(":", ":a:b::c", maxsplit=2)' unpatched: Median +- std dev: 2.78 us +- 0.07 us |
My issue bpo-30072 has been marked as a duplicate of this one. Copy of my msg291650: The re API seems commonly misused. Example passing a re flag to re.sub(): >>> re.sub("A", "B", "ahah", re.I)
'ahah' No error, no warning, but it doesn't work. Oh, sub has 5 paramters, no 4... I suggest to convert count and flags to keyword-only parameters. To not break the world, especially legit code passing the count parameter as a position argument, an option is to have a deprecation period if these two parameters are passed a positional-only parameter. -- Another option would be to rely on the fact that re flags are now enums instead of raw integers, and so add basic type check... Is there are risk of applications using re flags serialized by pickle from Pyhon < 3.6 and so getting integers? Maybe the check should only be done if flags are passing as positional-only argument... but the implementation of such check seems may be overkill for such simple and performance-critical function, no? See issue bpo-30067 for a recent bug in the Python stdlib! |
Victor, I borrowed Guido's time machine and wrote patches implementing both your suggestions a half year ago. |
+ raise TypeError("sub() takes from 2 to 4 positional arguments " It's actually 3 to 5 for sub() and subn(). |
I'd like to propose that we merge Serhiy's second patch, which deprecates passing count and maxsplit arguments as positional arguments.
If necessary, I'd be happy to write or organize a PR. |
Thanks for bumping this, I'd forgotten about it. As you can tell by my previous comments, I think this is something we need to do. Just fixed another one of these a couple weeks ago. I'll open a PR with Serhiy's patch |
…e functions Deprecate passing optional arguments maxsplit, count and flags in module-level functions re.split(), re.sub() and re.subn() as positional. They should only be passed by keyword.
It is an old patch and it required polishing. |
…e functions Deprecate passing optional arguments maxsplit, count and flags in module-level functions re.split(), re.sub() and re.subn() as positional. They should only be passed by keyword.
…tions (#107778) Deprecate passing optional arguments maxsplit, count and flags in module-level functions re.split(), re.sub() and re.subn() as positional. They should only be passed by keyword.
since Python 3.13, passing count to `re.sub()` as positional argument has been deprecated. and when runnint `test.py` with Python 3.13, we have following warning: ``` /home/kefu/dev/scylladb/./test.py:1477: DeprecationWarning: 'count' is passed as positional argument args.tests = set(re.sub(r'.* List configured unit tests\n(.*)\n', r'\1', out, 1, re.DOTALL).split("\n")) ``` see also python/cpython#56166 in order to silence this distracting warning, let's pass `count` using kwarg. Signed-off-by: Kefu Chai <[email protected]>
since Python 3.13, passing count to `re.sub()` as positional argument has been deprecated. and when runnint `test.py` with Python 3.13, we have following warning: ``` /home/kefu/dev/scylladb/./test.py:1477: DeprecationWarning: 'count' is passed as positional argument args.tests = set(re.sub(r'.* List configured unit tests\n(.*)\n', r'\1', out, 1, re.DOTALL).split("\n")) ``` see also python/cpython#56166 in order to silence this distracting warning, let's pass `count` using kwarg. Signed-off-by: Kefu Chai <[email protected]> Closes #20859
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: