[motherless] Add support for groups #15124
youtube_dl/extractor/motherless.py
Outdated
class MotherlessGroupIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)/?$'
`/?$` is senseless. It does not match http://motherless.com/gv/sex_must_be_funny?foo=bar.
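To illustrate the point, here is a minimal sketch using only Python's `re` module. It shows that the pattern from the diff rejects a URL carrying a query string because of the trailing `/?$`. The `RELAXED` pattern is hypothetical and shown only for comparison; it is not necessarily the pattern the PR settled on.

```python
import re

# Pattern as submitted in the diff under review.
ORIGINAL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)/?$'
# Hypothetical variant without the end anchor (illustration only).
RELAXED = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)'

url = 'http://motherless.com/gv/sex_must_be_funny?foo=bar'

original_match = re.match(ORIGINAL, url)
relaxed_match = re.match(RELAXED, url)

# The anchored pattern fails: '?foo=bar' follows the ID, so '$' cannot match.
print(original_match)
# The unanchored pattern still extracts the group ID.
print(relaxed_match.group('id'))
```

Dropping the anchor alone is not a complete fix, since the looser pattern also accepts URLs with further path components after the group name.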
It may not match http://motherless.com/g/cosplay/633979F
and I'm a bit at a loss on how to accomplish that.
youtube_dl/extractor/motherless.py
Outdated
webpage = self._download_webpage(
    'http://motherless.com/gv/%s' % group_id, group_id)
title = self._search_regex(
    r'<title>([\w\s]+)', webpage, 'title', fatal=True).strip()
Playlist title should not be fatal.
youtube_dl/extractor/motherless.py
Outdated
    r'<meta name="description" content="([^"]+)">',
    webpage, 'description', fatal=False)
page_count = self._int(self._search_regex(
    r'(\d+)</(a|span)><(a|span)[^>]+>\s*NEXT',
Don't capture groups you don't use.
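As a small illustration of this comment, the sketch below compares the capturing alternation from the diff with a non-capturing `(?:...)` version. The `html` snippet is made up for the example and is only an assumption about what the pagination markup might look like.

```python
import re

# Hypothetical pagination markup, for illustration only.
html = '<a href="?page=11">11</a><a href="?page=2" class="pop">NEXT'

capturing = r'(\d+)</(a|span)><(a|span)[^>]+>\s*NEXT'
non_capturing = r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT'

# With (a|span), the tag names become extra groups nobody reads.
cap_groups = re.search(capturing, html).groups()
# With (?:a|span), only the page count is captured.
noncap_groups = re.search(non_capturing, html).groups()

print(cap_groups)
print(noncap_groups)
```

Both patterns match the same text; the non-capturing form simply keeps group 1 as the only group, which is what the calling code reads.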
youtube_dl/extractor/motherless.py
Outdated
)
webpage = self._download_webpage(
    page_url, group_id,
    note="Downloding page %d/%d" % (idx + 1, page_count)
Single quotes.
youtube_dl/extractor/motherless.py
Outdated
'info_dict': {
    'id': 'movie_scenes',
    'title': 'Movie Scenes',
    'description': 'Hot and sexy scenes from "regular" '
Not unescaped.
Issues addressed.
youtube_dl/extractor/motherless.py
Outdated
webpage = self._download_webpage(
    'http://motherless.com/gv/%s' % group_id, group_id)
title = self._search_regex(
    r'<title>([\w\s]+)', webpage, 'title', fatal=False).strip()
Breaks on `None` title.
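A minimal sketch of the failure mode: once the search is non-fatal it can return `None`, and calling `.strip()` on `None` raises `AttributeError`. The helper `search_regex` below is a hypothetical stand-in written for this example, not the extractor's own `_search_regex`.

```python
import re


def search_regex(pattern, text, default=None):
    """Hypothetical non-fatal search: returns `default` when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else default


# A page without a <title> element.
webpage = '<html><head></head></html>'

title = search_regex(r'<title>([\w\s]+)', webpage)
# Calling title.strip() at this point would raise AttributeError,
# because title is None.

# One possible guard: strip only when a value was found.
safe_title = title.strip() if title is not None else None

# When the title is present, stripping works as intended.
found_title = search_regex(r'<title>([\w\s]+)', '<title> Movie Scenes - x</title>')
clean_title = found_title.strip()
```

Another option is to make the regex itself exclude surrounding whitespace, so no post-processing of a possibly missing value is needed.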
youtube_dl/extractor/motherless.py
Outdated
class MotherlessGroupIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)(?:$|/[^A-F0-9]|/?\?)'
Overload `suitable` instead.
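The suggestion can be sketched as follows: keep the group pattern loose and let a `suitable` classmethod decline URLs that another extractor already claims. The classes `BaseIE`, `VideoIE`, and `GroupIE` and the video pattern are simplified stand-ins invented for this example; they are assumptions, not the actual youtube-dl classes.

```python
import re


class BaseIE(object):
    """Simplified stand-in for an extractor base class (assumption)."""
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None


class VideoIE(BaseIE):
    # Hypothetical pattern for single-media URLs ending in an uppercase hex ID.
    _VALID_URL = (r'https?://(?:www\.)?motherless\.com/'
                  r'(?:g/[a-z0-9_]+/)?(?P<id>[A-F0-9]+)$')


class GroupIE(BaseIE):
    # Deliberately loose pattern; exclusions are handled in suitable().
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)'

    @classmethod
    def suitable(cls, url):
        # Decline anything the single-video extractor already claims.
        if VideoIE.suitable(url):
            return False
        return super(GroupIE, cls).suitable(url)


video_in_group = GroupIE.suitable('http://motherless.com/g/cosplay/633979F')
group_with_query = GroupIE.suitable(
    'http://motherless.com/gv/sex_must_be_funny?foo=bar')
```

This keeps the exclusion logic in code rather than in an increasingly complicated regular expression.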
youtube_dl/extractor/motherless.py
Outdated
def _real_extract(self, url):
    group_id = self._match_id(url)
    webpage = self._download_webpage(
        'http://motherless.com/gv/%s' % group_id, group_id)
Use the original scheme and host.
youtube_dl/extractor/motherless.py
Outdated
def _extract_entries(self, webpage):
    return [
        self.url_result(
            'http://www.motherless.com/%s' % video_url,
Use the original scheme and host.
youtube_dl/extractor/motherless.py
Outdated
page_size = 80

def _get_page(idx):
    page_url = 'http://motherless.com/gv/%s?page=%d' % (
Pass the query string via `query`.
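The idea behind this comment is to keep the path and the query parameters separate instead of formatting `?page=%d` into the URL string. A standard-library sketch of the same principle follows; `build_page_url` is a hypothetical helper written for this example and is not part of youtube-dl.

```python
from urllib.parse import urlencode


def build_page_url(base_url, query):
    """Hypothetical helper: append encoded parameters to a URL without a query."""
    return '%s?%s' % (base_url, urlencode(query))


# Parameters are passed as a dict and encoded in one place.
page_url = build_page_url('http://motherless.com/gv/movie_scenes', {'page': 3})
print(page_url)
```

Passing parameters as a mapping means values are encoded consistently and the path string stays free of hand-written query fragments.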
Anything else?
youtube_dl/extractor/motherless.py
Outdated
group_id = self._match_id(url)
webpage = self._download_webpage(
    '%s://%s/gv/%s' % (http_scheme, http_host, group_id), group_id)
urljoin.
youtube_dl/extractor/motherless.py
Outdated
page_size = 80

def _get_page(idx):
    page_url = '%s://%s/gv/%s' % (
urljoin.
youtube_dl/extractor/motherless.py
Outdated
def _extract_entries(self, webpage, http_scheme, http_host):
    return [
        self.url_result(
            '%s://%s/%s' % (http_scheme, http_host, video_url),
urljoin.
youtube_dl/extractor/motherless.py
Outdated
    webpage, 'description', fatal=False))
page_count = self._int(self._search_regex(
    r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
    webpage, 'page_count'), 'page_count', fatal=True)
fatal=True is default.
youtube_dl/extractor/motherless.py
Outdated
)
webpage = self._download_webpage(
    page_url, group_id, query={'page': idx + 1},
    note='Downloding page %d/%d' % (idx + 1, page_count)
Spelling.
youtube_dl/extractor/motherless.py
Outdated
page_count = self._int(self._search_regex(
    r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
    webpage, 'page_count'), 'page_count', fatal=True)
page_size = 80
Uppercase.
youtube_dl/extractor/motherless.py
Outdated
title = self._search_regex(
    r'<title>([\w\s]+\w)\s+-', webpage, 'title', fatal=False)
description = unescapeHTML(self._search_regex(
    r'<meta name="description" content="([^"]+)">',
_html_search_meta.
Addressed, what's next? 😆
youtube_dl/extractor/motherless.py
Outdated
def _real_extract(self, url):
    parsed_url = compat_urlparse.urlparse(url)
    base_url = '%s://%s' % (parsed_url.scheme, parsed_url.netloc)
What's the point of this? Use `url` as base.
Ah neat, I was unaware urljoin allowed overwriting of the path.
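The behaviour referred to here can be seen in a short standard-library sketch: when the second argument to `urljoin` is an absolute path, the scheme and host come from the base URL while the base's path and query are replaced. The example uses Python 3's `urllib.parse.urljoin` directly; the sample URL and IDs are illustrative only.

```python
from urllib.parse import urljoin

# The URL the user originally passed in, used directly as the base.
url = 'https://www.motherless.com/g/movie_scenes?foo=bar'
group_id = 'movie_scenes'

# An absolute path replaces the base's path and query,
# while scheme and host are preserved.
page_url = urljoin(url, '/gv/%s' % group_id)
entry_url = urljoin(url, '/633979F')

print(page_url)
print(entry_url)
```

Because the original scheme and host carry over automatically, there is no need to parse the URL and rebuild a separate base string first.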
youtube_dl/extractor/motherless.py
Outdated
PAGE_SIZE = 80

def _get_page(idx):
    page_url = compat_urlparse.urljoin(base_url, '/gv/%s' % group_id)
Code duplication 165, 176.
Updated and squashed.
Thanks for the thorough review! 👍
Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
Add support for group URLs to be downloaded as playlists.