
[motherless] Add support for groups #15124

Merged: 1 commit from mweinelt:ml_group into ytdl-org:master on Jan 6, 2018

Conversation

mweinelt (Contributor):

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Add support for group URLs, which are downloaded as playlists.
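
For orientation, the overall shape of such a group extractor, as a minimal sketch (assuming the usual InfoExtractor context; playlist_result is the standard helper for returning playlists, and _extract_entries is the helper introduced below):

def _real_extract(self, url):
    group_id = self._match_id(url)
    webpage = self._download_webpage(url, group_id)
    # Collect url_result entries for the individual videos, then hand them
    # back as a playlist keyed by the group id.
    entries = self._extract_entries(webpage)
    return self.playlist_result(entries, group_id)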



class MotherlessGroupIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)/?$'

mweinelt (Dec 31, 2017):

It may not match http://motherless.com/g/cosplay/633979F and I'm a bit at a loss on how to accomplish that.
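
The tension is easy to reproduce with plain re (the group URL below is taken from the test case quoted later in this review; the video URL is the one mentioned above):

import re

# The pattern under review: the /?$ anchor rejects video pages inside a
# group, as desired, but it also rejects paginated group URLs with a query.
pattern = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)/?$'

print(bool(re.match(pattern, 'http://motherless.com/gv/movie_scenes')))        # True: group page
print(bool(re.match(pattern, 'http://motherless.com/g/cosplay/633979F')))      # False: video page
print(bool(re.match(pattern, 'http://motherless.com/gv/movie_scenes?page=2'))) # False: too strict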

webpage = self._download_webpage(
    'http://motherless.com/gv/%s' % group_id, group_id)
title = self._search_regex(
    r'<title>([\w\s]+)', webpage, 'title', fatal=True).strip()

Collaborator:

Playlist title should not be fatal.

    r'<meta name="description" content="([^"]+)">',
    webpage, 'description', fatal=False)
page_count = self._int(self._search_regex(
    r'(\d+)</(a|span)><(a|span)[^>]+>\s*NEXT',

Collaborator:

Don't capture groups you don't use.
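
That is, prefer non-capturing (?:...) groups so the page number stays the only capture. A runnable illustration (the sample markup is made up to fit the pagination regex):

import re

# (?:...) groups for alternation without capturing, so group(1) is still the
# page count rather than 'a' or 'span'.
m = re.search(r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
              '13</span><a href="?page=2">NEXT')
print(m.group(1))  # 13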

)
webpage = self._download_webpage(
    page_url, group_id,
    note="Downloding page %d/%d" % (idx + 1, page_count)

Collaborator:

Single quotes.

'info_dict': {
    'id': 'movie_scenes',
    'title': 'Movie Scenes',
    'description': 'Hot and sexy scenes from &quot;regular&quot; '

Collaborator:

Not unescaped.
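
i.e. the expected description in the test still contains HTML entities; the extracted metadata should be unescaped first. youtube-dl ships a helper for exactly this:

from youtube_dl.utils import unescapeHTML

# unescapeHTML converts entities back to characters before the value
# reaches info_dict:
print(unescapeHTML('&quot;regular&quot;'))  # -> "regular"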

mweinelt:

Issues addressed.

webpage = self._download_webpage(
    'http://motherless.com/gv/%s' % group_id, group_id)
title = self._search_regex(
    r'<title>([\w\s]+)', webpage, 'title', fatal=False).strip()

Collaborator:

Breaks on None title.
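
With fatal=False, _search_regex returns None when nothing matches, so the chained .strip() raises AttributeError. A None-safe variant, sketched in the same method-body context:

title = self._search_regex(
    r'<title>([\w\s]+)', webpage, 'title', fatal=False)
if title:
    # Strip only when a title was found; the final revision below instead
    # folds the whitespace handling into the pattern itself.
    title = title.strip()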



class MotherlessGroupIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)(?:$|/[^A-F0-9]|/?\?)'

Collaborator:

Overload suitable instead.
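
A sketch of what the reviewer suggests (assuming MotherlessIE is the existing single-video extractor in the same module): keep _VALID_URL permissive and override suitable() so URLs claimed by the video extractor never reach the group extractor.

class MotherlessGroupIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)'

    @classmethod
    def suitable(cls, url):
        # Step aside whenever the single-video extractor matches the URL.
        return (False if MotherlessIE.suitable(url)
                else super(MotherlessGroupIE, cls).suitable(url))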

def _real_extract(self, url):
    group_id = self._match_id(url)
    webpage = self._download_webpage(
        'http://motherless.com/gv/%s' % group_id, group_id)

Collaborator:

Use the original scheme and host.
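
That is, build the request from the scheme and host of the URL the user actually supplied instead of hardcoding http://motherless.com. One way, sketched with youtube-dl's py2/py3 urlparse shim:

from youtube_dl.compat import compat_urlparse

# Reuse whatever scheme and host the input URL carried:
parsed = compat_urlparse.urlparse(url)
webpage = self._download_webpage(
    '%s://%s/gv/%s' % (parsed.scheme, parsed.netloc, group_id), group_id)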

def _extract_entries(self, webpage):
    return [
        self.url_result(
            'http://www.motherless.com/%s' % video_url,

Collaborator:

Use the original scheme and host.

page_size = 80

def _get_page(idx):
    page_url = 'http://motherless.com/gv/%s?page=%d' % (

Collaborator:

Query to query.
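
Meaning: pass the page number through _download_webpage's query parameter rather than formatting it into the URL by hand; the HTTP layer urlencodes it onto the request. Sketch:

webpage = self._download_webpage(
    page_url, group_id, query={'page': idx + 1})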

mweinelt (Jan 5, 2018):

Anything else?


group_id = self._match_id(url)
webpage = self._download_webpage(
    '%s://%s/gv/%s' % (http_scheme, http_host, group_id), group_id)

Collaborator:

urljoin.

page_size = 80

def _get_page(idx):
    page_url = '%s://%s/gv/%s' % (

Collaborator:

urljoin.

def _extract_entries(self, webpage, http_scheme, http_host):
    return [
        self.url_result(
            '%s://%s/%s' % (http_scheme, http_host, video_url),

Collaborator:

urljoin.

    webpage, 'description', fatal=False))
page_count = self._int(self._search_regex(
    r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
    webpage, 'page_count'), 'page_count', fatal=True)

Collaborator:

fatal=True is default.

)
webpage = self._download_webpage(
    page_url, group_id, query={'page': idx + 1},
    note='Downloding page %d/%d' % (idx + 1, page_count)

Collaborator:

Spelling.

page_count = self._int(self._search_regex(
    r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
    webpage, 'page_count'), 'page_count', fatal=True)
page_size = 80

Collaborator:

Uppercase.
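
Uppercase because it is a module-level constant. For context, this value feeds youtube-dl's paged-list machinery; a sketch of the wiring, assuming InAdvancePagedList from youtube_dl.utils is what consumes it:

from youtube_dl.utils import InAdvancePagedList

# _get_page(idx) yields the entries of page idx; the helper fetches pages
# lazily, PAGE_SIZE entries at a time, page_count pages in total.
entries = InAdvancePagedList(_get_page, page_count, PAGE_SIZE)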

title = self._search_regex(
    r'<title>([\w\s]+\w)\s+-', webpage, 'title', fatal=False)
description = unescapeHTML(self._search_regex(
    r'<meta name="description" content="([^"]+)">',

Collaborator:

_html_search_meta.
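
That is, replace the hand-rolled meta regex (and the explicit unescapeHTML) with the InfoExtractor helper, which does both:

# _html_search_meta locates the <meta> tag and returns its unescaped content:
description = self._html_search_meta(
    'description', webpage, fatal=False)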

mweinelt (Jan 6, 2018):

Addressed, what's next? 😆


def _real_extract(self, url):
    parsed_url = compat_urlparse.urlparse(url)
    base_url = '%s://%s' % (parsed_url.scheme, parsed_url.netloc)

Collaborator:

What's the point of this? Use url as base.


mweinelt:

Ah neat, I was unaware urljoin allowed overwriting of the path.
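
A runnable illustration of that behavior (standard urllib semantics, via youtube-dl's shim):

from youtube_dl.compat import compat_urlparse

# An absolute path as the second argument keeps the base URL's scheme and
# host but replaces its path and query:
print(compat_urlparse.urljoin(
    'https://motherless.com/gv/movie_scenes?page=2', '/gv/movie_scenes'))
# -> https://motherless.com/gv/movie_scenes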

PAGE_SIZE = 80

def _get_page(idx):
    page_url = compat_urlparse.urljoin(base_url, '/gv/%s' % group_id)

Collaborator:

Code duplication 165, 176.
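
(The numbers refer to diff lines, where the same urljoin call appears twice.) One way to resolve it, as a sketch under the closure structure shown above: build the page URL once in _real_extract and let _get_page close over it.

# Build the canonical group URL once; _get_page closes over it.
page_url = compat_urlparse.urljoin(url, '/gv/%s' % group_id)

def _get_page(idx):
    webpage = self._download_webpage(
        page_url, group_id, query={'page': idx + 1},
        note='Downloading page %d/%d' % (idx + 1, page_count))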

mweinelt (Jan 6, 2018):

Updated and squashed.

dstftw merged commit 45283af into ytdl-org:master on Jan 6, 2018.

mweinelt (Jan 6, 2018):

Thanks for the thorough review! 👍

mweinelt deleted the ml_group branch on January 6, 2018 at 16:34.
dstftw added a commit that referenced this pull request on Feb 9, 2018.