-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[generic] prefer enclosures over following links in rss feeds #16189
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test.
youtube_dl/extractor/generic.py
Outdated
break | ||
|
||
if not enclosure_nodes: | ||
next_url = xpath_text(it, 'link', fatal=False) | ||
|
||
if not next_url: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential reference before assignment error.
youtube_dl/extractor/generic.py
Outdated
if next_url: | ||
break | ||
|
||
if not enclosure_nodes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not mean next_url
is obtained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check was changed and an assignment was added, so that the error can't occur anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing changed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a change needed here? If not enclosure_nodes
is True
it is expected that next_url
wasn't obtained yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what? If not enclosure_nodes
is False
it does not mean next_url
was obtained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some thought, I think I get what you are saying. If the rss feed has enclosure_nodes
, but the url is empty it skips even though it shouldn't. This is fixed now.
youtube_dl/extractor/generic.py
Outdated
if not enclosure_nodes: | ||
next_url = xpath_text(it, 'link', fatal=False) | ||
|
||
if next_url is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't skip empty strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i changed it back to not next_url
. This should do the job in conjunction with the next_url = None
from above.
All open points were addressed (fixed). Is there anything needed from my side before this is merged? |
|
The test was added. This is what it look like with the current youtube-dl version:
Here is what it says with changes introduced in this pull request:
|
Please follow the guide below
x
into all the boxes [ ] relevant to your pull request (like that [x])Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
When downloading from rss feeds, the generic extractor follows the rss links first. This leads to "ERROR: Unsupported URL" messages in a lot of cases. Even though there are enclosures present.
With this pull request downloading enclosures is preferred over following the rss links.