Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vtv] Add new extractor #15462

Closed
wants to merge 2 commits into from
Closed

[vtv] Add new extractor #15462

wants to merge 2 commits into from

Conversation

tmsbrg
Copy link
Contributor

@tmsbrg tmsbrg commented Jan 31, 2018

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

VTV is the national television broadcaster of Vietnam. Their website hosts livestreams and videos from numerous state-owned television channels. They apparently broadcast worldwide. At least I can see the streams here in the Netherlands. This extractor supports audio and video livestreams as well as static videos.

video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)

title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title', fatal=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default=None.

'info_dict': {
'id': '1014',
'ext': 'm4a',
'title': r're:VOV1 | LiveTV - TV Net .*',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any unrelated suffixes should be removed.


thumbnail = mediaplayer_div_attributes.get("data-image")

json_url = mediaplayer_div_attributes.get("data-file")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read coding conventions on optional and mandatory fields.

# but you never know in the future
for stream in video_streams:
formats = self._extract_m3u8_formats(stream.get("url"), video_id, ext=ext, fatal=False)
if len(formats) != 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not formats.

if len(formats) != 0:
break

if re.match(r'https?://[^/]*/video/.*', url) is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/video/ in url.


if re.match(r'https?://[^/]*/video/.*', url) is not None:
is_live = False
elif re.match(r'https?://[^/]*/kenh-truyen-hinh/.*', url) is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/kenh-truyen-hinh/ in url.


# little hack to better support radio streams
if title.startswith("VOV"):
ext = "m4a"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vcodec of format should be set to 'none'.

class VTVIE(InfoExtractor):
_VALID_URL = r'https?://..\.tvnet\.gov\.vn/[^/]*/(?P<id>[0-9]+)/?.*'
_TESTS = [{
'url': 'http://us.tvnet.gov.vn/kenh-truyen-hinh/1011/vtv1',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each test should be commented on what it actually tests.

from ..utils import extract_attributes

class VTVIE(InfoExtractor):
_VALID_URL = r'https?://..\.tvnet\.gov\.vn/[^/]*/(?P<id>[0-9]+)/?.*'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No ... .* at the end is pointless.

@tmsbrg
Copy link
Contributor Author

tmsbrg commented Feb 8, 2018

Thanks for the review! Hope this commit addresses the issues you found correctly.

@dstftw
Copy link
Collaborator

dstftw commented Feb 17, 2018

  • Checked the code with flake8

Now actually do that.

from ..utils import extract_attributes

class VTVIE(InfoExtractor):
_VALID_URL = r'https?://(au|ca|cz|de|jp|kr|tw|us|vn)\.tvnet\.gov\.vn/[^/]*/(?P<id>[0-9]+)/?'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't capture groups you don't use. Use proper regex to match all country codes.

Copy link
Contributor Author

@tmsbrg tmsbrg Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the only country codes that are valid at the moment. Do you think it'd be better to expand it to match every country code, or even [a-z][a-z]?


from .common import InfoExtractor

import re
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic imports should go before youtube-dl's.

mediaplayer_div = self._search_regex(r'(<div[^>]*id="mediaplayer"[^>]*>)', webpage, 'mediaplayer element')
mediaplayer_div_attributes = extract_attributes(mediaplayer_div)

thumbnail = mediaplayer_div_attributes.get("data-image")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single quotes.


thumbnail = mediaplayer_div_attributes.get("data-image")

json_url = mediaplayer_div_attributes["data-file"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formats not sorted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants