JSON-LD with "@context": "https://schema.org" not parsed by _json_ld function
#16006
Closed
5 of 9 tasks
Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.03.26.1. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.
Before submitting an issue make sure you have:
What is the purpose of your issue?
The Problem
The way the @context property of a JSON-LD string is checked is too strict. Currently, @context has to be exactly the string http://schema.org in order to be parsed by the _json_ld function in common.py. However, some websites have changed http to https. I have also seen websites add a / to the end, like so: http://schema.org/.
This leads to the problem that the JSON-LD strings of those websites do not get parsed, which causes video extraction errors, since _json_ld then returns an empty dictionary. (See below for an example.)
Suggested Fix
In order to solve the problem, I suggest making the check for the @context property more resilient. Instead of checking against a static string...
https://github.com/rg3/youtube-dl/blob/5d60b9971784289acd4325a8ed7b5afd7bea05ca/youtube_dl/extractor/common.py#L1028
we should use a regex that allows the above-mentioned modifications of the string:
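The check could look roughly like the following. This is a minimal sketch under assumptions: the pattern is one plausible regex that accepts http or https and an optional trailing slash, and is_schema_org_context is a hypothetical helper name, not a function in common.py. It is written for Python 3.

```python
import re

# Assumed pattern: http or https, with an optional trailing slash.
# Illustrative only; not necessarily the exact regex proposed for common.py.
SCHEMA_ORG_CONTEXT_RE = re.compile(r'https?://schema\.org/?')


def is_schema_org_context(context):
    """Hypothetical helper: True if `context` refers to schema.org."""
    # @context may be missing or a non-string value (e.g. a list or dict),
    # so guard before matching.
    if not isinstance(context, str):
        return False
    # fullmatch requires the whole string to match, like the old equality check.
    return SCHEMA_ORG_CONTEXT_RE.fullmatch(context) is not None


for value in ('http://schema.org', 'https://schema.org',
              'http://schema.org/', 'http://example.org'):
    print(value, is_schema_org_context(value))
```

The first three values are accepted and the last one is rejected, so every string that passed the old exact comparison still passes.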
This small change already fixes the currently broken gamestar.py extractor. Additionally, no existing extractor can be affected negatively, i.e., no extractor will be broken by this change.
Please let me know if I should open a pull request myself.
Example
An example of an extractor that fails because the JSON-LD is not parsed is gamestar.py. Please note that the KeyError occurs only because the info_dict is empty, since the self._json_ld function didn't parse anything. Here, https is used instead of http in the @context property.
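The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in, not youtube-dl code: parse_json_ld_strict, the sample payload, and the 'title' key are all assumptions made for illustration, since the issue does not state which key raises the KeyError in gamestar.py.

```python
import json


def parse_json_ld_strict(json_ld_text):
    """Simplified stand-in for a strict @context check (not youtube-dl code)."""
    data = json.loads(json_ld_text)
    info = {}
    # Exact string comparison: any variation of the URL is rejected.
    if data.get('@context') == 'http://schema.org':
        info['title'] = data.get('name')
    return info


# Hypothetical JSON-LD payload using the https variant of the context URL.
page_json_ld = json.dumps({
    '@context': 'https://schema.org',
    '@type': 'VideoObject',
    'name': 'Demo',
})

info_dict = parse_json_ld_strict(page_json_ld)
print(info_dict)  # empty, because the context check failed

try:
    info_dict['title']
except KeyError as err:
    # The extractor-side symptom: a lookup on the empty dict fails.
    print('KeyError:', err)
```

The KeyError is only a symptom here; the root cause is that the context comparison rejected the https URL and nothing was parsed.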