-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[openload] Generic IP v4 & v6 stream URL parsing (fixes #16137) #16250
Conversation
|
b3a081b
to
bbcfa1d
Compare
My bad, there are no warning. I was using a slightly deprecated version of flake8 (W504 was added 11 days ago, and is ignored by default on latest flake8).
I updated the regex a bit. While I understand you don't want to break the old behavior, please note that the new regexp is a super-set of the two others: new = r">\s*([\w~-]+?~(?:(?:[a-f0-9:]+)|(?:[0-9\.]+))~[\w~-]+)\s*<"
old = r">\s*([\w-]+~\d{10,}~\d+\.\d+\.0\.0~[\w-]+)\s*<"
# [\w-]+~ is included in [\w~-]+
# \d{10,} is included in [\w~-]+
# \d+\.\d+\.0\.0 is a 16-bit IPv4 network, included in [0-9\.]+
# [\w-]+ is included in [\w~-]+
# the rest is the same
old = r">\s*([\w~-]+~\d+\.\d+\.\d+\.\d+~[\w~-]+)"
# \d+\.\d+\.\d+\.\d+ is an IPv4 address, included in [0-9\.]+
# the rest is the same @dstftw, I hope I made a point. I understand your position as a maintainer, but I would advocate against accumulating knowledge debt: these regexps are clearly not needed and confusing for futur contributors. |
Again: these regexes are intentional, in decreasing degree of strictness. Do not touch them. |
bbcfa1d
to
fa69ec8
Compare
Fine, updated. This must be the dumbest patch I ever contributed to some open-source project. |
youtube_dl/extractor/openload.py
Outdated
'stream URL')) | ||
r'>\s*([\w~-]+~\d+\.\d+\.\d+\.\d+~[\w~-]+)', | ||
# someId~someIPv4orIPv6~someId | ||
r'>\s*([\w~-]+?~(?:(?:[a-f0-9:]+)|(?:[0-9\.]+))~[\w~-]+)\s*<'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v4 is already matched by the previous regex, this regex must only match v6.
fa69ec8
to
66029ad
Compare
Okay, so now we've got three regexes doing the exact same thing, with both very specific patterns for the middle part and very generic ones ( |
Nah. You did not even grasp the point. |
Apart from grasping some point, does the latest patch still require changes to meet your requirements? Now we can also have a constructive discussion about these regexps, but you might want to express your view better than telling contributors they are not capable of understanding whatever point you are trying to make. |
This is a fix for #16137. The previous regexps were doing painful parsing of what is actually an IP network. Thing is, the original author (and Travis CI!) has no IPv6 connectivity, hence forgot to include IPv6 patterns in regexps (hex and
:
eg.2a01:cb15::
). This breaks URL parsing for people on a IPv6-enabled network.This PR replaces the two similar regexps with a more generic one, that includes both IPv4 and IPv6.
flake8 reports warnings around my fix, dunno if I am supposed to fix these in the PR, as it is unrelated. Please tell me if that's a requirement.