Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix issue-109 (use unique temp files for input/output of ExtractArticle.js) #110

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

erpic
Copy link

@erpic erpic commented Sep 5, 2024

It appears that something had previously been started to address this issue but was left as is with some 'json_path' undefined.

previously:

make test

FAILED test_javascript.py::test_extract_simple_article_with_readability_js - NameError: name 'json_path' is not defined
FAILED test_javascript.py::test_extract_article_from_page_with_readability_js - NameError: name 'json_path' is not defined
FAILED test_simple_json.py::test_plain_element_with_comments - AssertionError: assert ['<div></div>'] == ['<div><p>Tex...!----></div>']
FAILED test_simple_json.py::test_content_digest_on_filled_and_empty_elements - assert ['<div></div>'] == ['<div><p dat..."></p></div>']

unfortunately tests still don't pass with proposed changes:

FAILED test_javascript.py::test_extract_simple_article_with_readability_js - AssertionError: article_json={'title': None, 'byline': None, 'date': None, 'content': '<div id="readability-page-1" class="page"><div class="page">\n                <article>\n                    <h2> Article title </h2>\n                    <p>\n                        Proin vulputate viverra dapibus...
FAILED test_javascript.py::test_extract_article_from_page_with_readability_js - AssertionError: article_json={'title': 'Trump Denies Charitable Donation He Promised If Elizabeth Warren Releases DNA Results And It’s On Video', 'byline': 'Conover Kennard', 'date': None, 'content': '<div id="readability-page-1" class="page"><div id="post-342398">\n\n\n\n<div>\n<p>Donald Trump has re...
FAILED test_simple_json.py::test_plain_element_with_comments - AssertionError: assert ['<div></div>'] == ['<div><p>Tex...!----></div>']
FAILED test_simple_json.py::test_content_digest_on_filled_and_empty_elements - assert ['<div></div>'] == ['<div><p dat..."></p></div>']
=

but this appears to be due only to extra \n in the 'content' field and the other fields are correct.

diff_meta_fields_correct diff_extra_linebreaks_present

…le.js), increase verbosity of tests (tests for above don't pass due to some some extra \n)
@jemrobinson
Copy link
Member

I think the tests are quite out-of-date (e.g. I'm not sure we have a valid Travis token any more). I'll look into moving/updating these if I have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants