[vier] Extract more info #12539
Full support for this one :-)
@@ -67,14 +73,28 @@ def _real_extract(self, url):
        self._sort_formats(formats)

        title = self._og_search_title(webpage, default=display_id)
        description = self._og_search_description(webpage, default=None)
Purpose of this change?
For some videos, the description extracted by `self._og_search_description(..)` is something like "Watch the complete Xth episode of Y here" (but in Dutch) instead of the actual description displayed below the video. I'll add a test for this.
            r'(?i)aflevering (\d+)', title, 'episode_number', default=None,
            fatal=False))
        upload_date = unified_strdate(self._html_search_regex(
            r'''(?x)<div\ class="[^"]*field-name-post-date[^"]*">\s*
No need to escape whitespace.
Don't you have to escape whitespace in a triple quote? I do know that without the two backslashes the regexp doesn't match.
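On the escaping question above: a minimal sketch using only the standard `re` module, under the assumption that the deciding factor is the `(?x)` (verbose) flag rather than the triple quotes. In verbose mode the regex engine ignores unescaped whitespace in the pattern, so a literal space has to be escaped. The `html` sample string is hypothetical and not taken from the site.

```python
import re

# Hypothetical markup, for illustration only.
html = '<div class="field field-name-post-date">'

# With (?x), the unescaped space is dropped, so the pattern is
# effectively '<divclass="..."' and cannot match.
unescaped = re.compile(r'''(?x)<div class="[^"]*field-name-post-date[^"]*">''')

# Escaping the space makes it significant again under (?x).
escaped = re.compile(r'''(?x)<div\ class="[^"]*field-name-post-date[^"]*">''')

# Without (?x), a plain space matches itself, triple-quoted or not.
plain = re.compile(r'''<div class="[^"]*field-name-post-date[^"]*">''')

unescaped_matches = unescaped.search(html) is not None
escaped_matches = escaped.search(html) is not None
plain_matches = plain.search(html) is not None
print(unescaped_matches, escaped_matches, plain_matches)
```

If this assumption holds, both positions are consistent: the backslashes are needed only while `(?x)` is present, and dropping the flag removes the need to escape.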

        return {
            'id': video_id,
            'display_id': display_id,
            'title': title,
            'description': description,
            'episode_number': episode_number,
Add a test for such a video.
I have pushed an amended commit that now includes a test case that covers the newly extracted information.
I'll update this PR as soon as #12801 is merged.
Extract the `episode_number` and `upload_date`. Also extract the real `description`.
Done. I left out commit 6f76679 because it makes the first test fail due to a missing colon:
* [cbsinteractive] fix extractor
* [cbsinteractive] update test cases
* [cbsinteractive] extract formats with `CBSIE`
* [extractor/common] Fix rtmp and rtsp formats' URLs in _extract_wowza_formats
* [vier] Extract more info
  Extract the `episode_number` and `upload_date`. Also extract the real `description`.
* [vier] Relax regexes and extract more metadata (closes #12539)
* [jsinterp] Add support for quoted names and indexers (closes #13123, closes #13130)
* [ChangeLog] Actualize
* release 2017.05.18
* [ChangeLog] Fix typo
* [jsinterp] Fix typo and cleanup regexes (closes #13134)
* [ChangeLog] Actualize
* release 2017.05.18.1
* [mitele] Update app key regex
* [hitbox] Add support for smashcast.tv (closes #13154)
* [njpwworld] Fix extraction (closes #13162)
* [toypics] Fix extraction
* [toypics] Improve and modernize
* [adobepass] Add support for Brighthouse MSO
* [toggle] Relax _VALID_URL (closes #13172)
* [youtube] Fix DASH manifest signature decryption (closes #8944)
* [youtube] Modernize
* [streamcz] Add support for subtitles
* [downloader/external] Pass -loglevel to ffmpeg downloader (closes #13183)
* Credit @zurfyx for atresplayer improvements (#12548)
* Credit @mphe for streamango (#12643)
* Credit @fredbourni for noovo (#12792)
* [ChangeLog] Actualize
* release 2017.05.23
* Credit @timendum for rai (#11790) and mediaset (#12964)
* Credit @gritstub for vevo fix (#12879)
* [cbsnews] fix extraction for 60 Minutes videos
* [vimeo] Fix formats' sorting (closes #13189)
* [postprocessor/ffmpeg] Fix metadata filename handling on Python 2
  Fixes #13182
* [udemy] Fix extraction for outputs' format entries without URL (closes #13192)
* [youku] Fix extraction (closes #13191)
* [utils] Recognize more patterns in strip_jsonp()
  Used in Youku Show pages
* [youku:show] Fix extraction
* [tudou] Merge into youku extractor (fixes #12214)
  Also, there are no tudou playlists anymore. All playlist URLs points to youku playlists.
* [bbc] Add support for authentication
* Revert "[youtube] Don't use the DASH manifest from 'get_video_info' if 'use_cipher_signature' is True (#5118)"
  This reverts commit 87dc451.
* [ChangeLog] Update after the fix for #11381
* [ChangeLog] Actualize
* release 2017.05.26
* [cbsnews] Fix extraction (closes #13205)
* [youku] Extract more metadata (closes #10433)
* [adn] fix formats extraction
* [utils] Drop an compatibility wrapper for Python < 2.6
  addinfourl.getcode is added since Python 2.6a1. As youtube-dl now requires 2.6+, this is no longer necessary. See python/cpython@9b0d46d
* [cbsinteractive] Relax _VALID_URL (closes #13213)
* [beam:vod] Add extractor
* [beam] Improve and add support for mixer.com (closes #13032)
* [dvtv] Parse adaptive formats as well
  The old code hit an error when it attempted to parse the string "adaptive" for video height. Actually parsing the returned playlists is a good idea because it adds more output formats, including some audio-only-ones.
* [dvtv] Improve and fix playlists support (closes #13063)
* [medialaan] Fix videos with missing videoUrl
  A rough trick to get around the two different json styles medialaan seems to be using.
  Fix for these example videos: https://vtmkzoom.be/video?aid=45724 https://vtmkzoom.be/video?aid=45425
* [medialaan] PEP 8 (closes #12774)
* [gaskrank] Fix extraction
* [gaskrank] Improve (closes #12493)
* [abcnews] Add support for embed URLs
* [abcnews] Improve and remove duplicate test (closes #12851)
* [xhamster] Extract categories (closes #11728)
* [xhamster] Fix author and like/dislike count extraction
* [xhamster] Simplify (closes #13216)
* [youtube] Parse player_url if format URLs are encrypted or DASH MPDs are requested
  Fixes #13211
* [ChangeLog] Actualize
* release 2017.05.29
* [README.md] Add an example for how to use .netrc on Windows
  That's a Python bug: http://bugs.python.org/issue28334 Most likely it will be fixed in Python 3.7: python/cpython#123
* [README.md] Mention http_dash_segments protocol
* [packtpub] Fix authentication(closes #13240)
* [drbonanza] Fix extraction (closes #13231)
* [francetv] Relax _VALID_URL
* [1tv] Lower preference for http formats (closes #13246)
* [youtube] Improve chapters extraction (closes #13247)
* [safari] Fix typo (closes #13252)
* [YoutubeDL] Don't emit ANSI escape codes on Windows
* [godtv] Remove extractor (closes #13175)
* [pornhub:playlist] Fix extraction (closes #13281)
* [pornhub:uservideos] Add missing raise
* [bandcamp:weekly] Add extractor
* [bandcamp:weekly] Improve and extract more metadata (closes #12758)
* Credit @adamvoss for bandcamp:weekly (#12758)
* Credit @mikf for beam:vod (#13032)
* Credit @jktjkt for dvtv formats (#13063)
* [ChangeLog] Actualize
* release 2017.06.05
* [tvplayer] Fix extraction (closes #13291)
* [rtlnl] Improve _VALID_URL (closes #13295)
* [streamango] Make title optional
* [streamango] Skip download for test (closes #13292)
* [README.md] Clarify output template references (closes #13316)
* [README.md] Improve man page formatting
* [YoutubeDL] Sanitize more fields (#13313)
* [liveleak] Ensure height is int (closes #13313)
* [safari] Improve authentication detection (closes #13319)
* [sohu] Fix numeric fields
* [flickr] Ensure format id is string
* [foxgay] Ensure height is int
* [gfycat] Ensure filesize is int
* [golem] Ensure format id is string
* [jove] Ensure comment count is int
* [sexu] Ensure height is int
* [turbo] Ensure format id is string
* [extractor/common] Return unicode string from _match_id
* [extractor/generic] Ensure format id is unicode string
* [msn] Fix formats extraction
* [newgrounds] Improve formats and uploader extraction (closes #13346)
* [newgrounds:playlist] Add extractor (closes #10611)
* [utils] Improve unified_timestamp
* [newgrounds] Extract more metadata (closes #13232)
* [rutv] Add support for testplayer.vgtrk.com (closes #13347)
* [xfileshare] Modernize and pass referrer
* [xfileshare] Add support for rapidvideo (closes #13348)
* [compat] Introduce compat_HTMLParseError
* [utils] Handle HTMLParseError in extract_attributes (closes #13349)
* [xfileshare] PEP 8
* [ChangeLog] Actualize
* release 2017.06.12
* [compat] Add compat_HTMLParseError to __all__
* [corus] Add support for history.ca (closes #13359)
* [corus] Add support for showcase.ca
Description of your pull request and other information
Extract the `episode_number` and `upload_date`. Also extract the real `description`.

The main goal of this PR is to make it easier to automate downloading whole seasons (hence the `episode_number` field).
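
To illustrate how an `episode_number` helps with automating a season download, here is a minimal sketch that applies the regex from the diff, `(?i)aflevering (\d+)`, with the standard `re` module and then orders titles by episode. The helper name `parse_episode_number` and the sample titles are hypothetical, and this is not the extractor's actual code path, which goes through youtube-dl's own search helpers.

```python
import re


def parse_episode_number(title):
    """Return the episode number found in a title, or None if absent.

    Hypothetical helper; "aflevering" is Dutch for "episode".
    """
    match = re.search(r'(?i)aflevering (\d+)', title)
    return int(match.group(1)) if match else None


# Hypothetical titles, for illustration only.
titles = [
    'Some Show - Aflevering 10',
    'Some Show - aflevering 2',
    'Some Show - Trailer',
]

# Keep only titles that carry an episode number and sort them numerically,
# so episode 2 comes before episode 10.
episodes = sorted(
    (t for t in titles if parse_episode_number(t) is not None),
    key=parse_episode_number,
)
print(episodes)
```

Converting the captured group to `int` is what makes the numeric ordering work; sorting the raw titles as strings would place "Aflevering 10" before "aflevering 2".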