
[motherless] Add support for groups #15124

Merged: 1 commit from mweinelt:ml_group into ytdl-org:master on Jan 6, 2018

Conversation

mweinelt (Contributor):

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Add support for group URLs, which are downloaded as playlists.
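
For orientation, the overall shape of such a group extractor, as a minimal sketch (assuming the usual InfoExtractor context; playlist_result is the standard helper for returning playlists, and _extract_entries is the helper introduced below):

def _real_extract(self, url):
    group_id = self._match_id(url)
    webpage = self._download_webpage(url, group_id)
    # Collect url_result entries for the individual videos, then hand them
    # back as a playlist keyed by the group id.
    entries = self._extract_entries(webpage)
    return self.playlist_result(entries, group_id)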



class MotherlessGroupIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)/?$'

mweinelt (Dec 31, 2017):

It may not match http://motherless.com/g/cosplay/633979F and I'm a bit at a loss on how to accomplish that.
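
The tension is easy to reproduce with plain re (the group URL below is taken from the test case quoted later in this review; the video URL is the one mentioned above):

import re

# The pattern under review: the /?$ anchor rejects video pages inside a
# group, as desired, but it also rejects paginated group URLs with a query.
pattern = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)/?$'

print(bool(re.match(pattern, 'http://motherless.com/gv/movie_scenes')))        # True: group page
print(bool(re.match(pattern, 'http://motherless.com/g/cosplay/633979F')))      # False: video page
print(bool(re.match(pattern, 'http://motherless.com/gv/movie_scenes?page=2'))) # False: too strict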

webpage = self._download_webpage(
    'http://motherless.com/gv/%s' % group_id, group_id)
title = self._search_regex(
    r'<title>([\w\s]+)', webpage, 'title', fatal=True).strip()

Collaborator:

Playlist title should not be fatal.

    r'<meta name="description" content="([^"]+)">',
    webpage, 'description', fatal=False)
page_count = self._int(self._search_regex(
    r'(\d+)</(a|span)><(a|span)[^>]+>\s*NEXT',

Collaborator:

Don't capture groups you don't use.
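
That is, prefer non-capturing (?:...) groups so the page number stays the only capture. A runnable illustration (the sample markup is made up to fit the pagination regex):

import re

# (?:...) groups for alternation without capturing, so group(1) is still the
# page count rather than 'a' or 'span'.
m = re.search(r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
              '13</span><a href="?page=2">NEXT')
print(m.group(1))  # 13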

)
webpage = self._download_webpage(
    page_url, group_id,
    note="Downloding page %d/%d" % (idx + 1, page_count)

Collaborator:

Single quotes.

'info_dict': {
    'id': 'movie_scenes',
    'title': 'Movie Scenes',
    'description': 'Hot and sexy scenes from &quot;regular&quot; '

Collaborator:

Not unescaped.
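
i.e. the expected description in the test still contains HTML entities; the extracted metadata should be unescaped first. youtube-dl ships a helper for exactly this:

from youtube_dl.utils import unescapeHTML

# unescapeHTML converts entities back to characters before the value
# reaches info_dict:
print(unescapeHTML('&quot;regular&quot;'))  # -> "regular"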

mweinelt:

Issues addressed.

webpage = self._download_webpage(
    'http://motherless.com/gv/%s' % group_id, group_id)
title = self._search_regex(
    r'<title>([\w\s]+)', webpage, 'title', fatal=False).strip()

Collaborator:

Breaks on None title.
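
With fatal=False, _search_regex returns None when nothing matches, so the chained .strip() raises AttributeError. A None-safe variant, sketched in the same method-body context:

title = self._search_regex(
    r'<title>([\w\s]+)', webpage, 'title', fatal=False)
if title:
    # Strip only when a title was found; the final revision below instead
    # folds the whitespace handling into the pattern itself.
    title = title.strip()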



class MotherlessGroupIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)(?:$|/[^A-F0-9]|/?\?)'

Collaborator:

Overload suitable instead.
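
A sketch of what the reviewer suggests (assuming MotherlessIE is the existing single-video extractor in the same module): keep _VALID_URL permissive and override suitable() so URLs claimed by the video extractor never reach the group extractor.

class MotherlessGroupIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)'

    @classmethod
    def suitable(cls, url):
        # Step aside whenever the single-video extractor matches the URL.
        return (False if MotherlessIE.suitable(url)
                else super(MotherlessGroupIE, cls).suitable(url))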

def _real_extract(self, url):
    group_id = self._match_id(url)
    webpage = self._download_webpage(
        'http://motherless.com/gv/%s' % group_id, group_id)

Collaborator:

Use the original scheme and host.
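
That is, build the request from the scheme and host of the URL the user actually supplied instead of hardcoding http://motherless.com. One way, sketched with youtube-dl's py2/py3 urlparse shim:

from youtube_dl.compat import compat_urlparse

# Reuse whatever scheme and host the input URL carried:
parsed = compat_urlparse.urlparse(url)
webpage = self._download_webpage(
    '%s://%s/gv/%s' % (parsed.scheme, parsed.netloc, group_id), group_id)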

def _extract_entries(self, webpage):
    return [
        self.url_result(
            'http://www.motherless.com/%s' % video_url,

Collaborator:

Use the original scheme and host.

page_size = 80

def _get_page(idx):
    page_url = 'http://motherless.com/gv/%s?page=%d' % (

Collaborator:

Query to query.
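
Meaning: pass the page number through _download_webpage's query parameter rather than formatting it into the URL by hand; the HTTP layer urlencodes it onto the request. Sketch:

webpage = self._download_webpage(
    page_url, group_id, query={'page': idx + 1})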

mweinelt (Jan 5, 2018):

Anything else?


group_id = self._match_id(url)
webpage = self._download_webpage(
    '%s://%s/gv/%s' % (http_scheme, http_host, group_id), group_id)

Collaborator:

urljoin.

page_size = 80

def _get_page(idx):
    page_url = '%s://%s/gv/%s' % (

Collaborator:

urljoin.

def _extract_entries(self, webpage, http_scheme, http_host):
    return [
        self.url_result(
            '%s://%s/%s' % (http_scheme, http_host, video_url),

Collaborator:

urljoin.

    webpage, 'description', fatal=False))
page_count = self._int(self._search_regex(
    r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
    webpage, 'page_count'), 'page_count', fatal=True)

Collaborator:

fatal=True is default.

)
webpage = self._download_webpage(
    page_url, group_id, query={'page': idx + 1},
    note='Downloding page %d/%d' % (idx + 1, page_count)

Collaborator:

Spelling.

page_count = self._int(self._search_regex(
    r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
    webpage, 'page_count'), 'page_count', fatal=True)
page_size = 80

Collaborator:

Uppercase.
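
Uppercase because it is a module-level constant. For context, this value feeds youtube-dl's paged-list machinery; a sketch of the wiring, assuming InAdvancePagedList from youtube_dl.utils is what consumes it:

from youtube_dl.utils import InAdvancePagedList

# _get_page(idx) yields the entries of page idx; the helper fetches pages
# lazily, PAGE_SIZE entries at a time, page_count pages in total.
entries = InAdvancePagedList(_get_page, page_count, PAGE_SIZE)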

title = self._search_regex(
    r'<title>([\w\s]+\w)\s+-', webpage, 'title', fatal=False)
description = unescapeHTML(self._search_regex(
    r'<meta name="description" content="([^"]+)">',

Collaborator:

_html_search_meta.
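
That is, replace the hand-rolled meta regex (and the explicit unescapeHTML) with the InfoExtractor helper, which does both:

# _html_search_meta locates the <meta> tag and returns its unescaped content:
description = self._html_search_meta(
    'description', webpage, fatal=False)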

mweinelt (Jan 6, 2018):

Addressed, what's next? 😆


def _real_extract(self, url):
    parsed_url = compat_urlparse.urlparse(url)
    base_url = '%s://%s' % (parsed_url.scheme, parsed_url.netloc)

Collaborator:

What's the point of this? Use url as base.


mweinelt:

Ah neat, I was unaware urljoin allowed overwriting of the path.
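
A runnable illustration of that behavior (standard urllib semantics, via youtube-dl's shim):

from youtube_dl.compat import compat_urlparse

# An absolute path as the second argument keeps the base URL's scheme and
# host but replaces its path and query:
print(compat_urlparse.urljoin(
    'https://motherless.com/gv/movie_scenes?page=2', '/gv/movie_scenes'))
# -> https://motherless.com/gv/movie_scenes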

PAGE_SIZE = 80

def _get_page(idx):
    page_url = compat_urlparse.urljoin(base_url, '/gv/%s' % group_id)

Collaborator:

Code duplication 165, 176.
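
(The numbers refer to diff lines, where the same urljoin call appears twice.) One way to resolve it, as a sketch under the closure structure shown above: build the page URL once in _real_extract and let _get_page close over it.

# Build the canonical group URL once; _get_page closes over it.
page_url = compat_urlparse.urljoin(url, '/gv/%s' % group_id)

def _get_page(idx):
    webpage = self._download_webpage(
        page_url, group_id, query={'page': idx + 1},
        note='Downloading page %d/%d' % (idx + 1, page_count))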

mweinelt (Jan 6, 2018):

Updated and squashed.

dstftw merged commit 45283af into ytdl-org:master on Jan 6, 2018.

mweinelt (Jan 6, 2018):

Thanks for the thorough review! 👍

mweinelt deleted the ml_group branch on January 6, 2018 at 16:34.
dstftw added a commit that referenced this pull request on Feb 9, 2018.