Check anchor links #91

Closed
bahmutov opened this issue Feb 12, 2020 · 13 comments
Labels
enhancement New feature request or implementation

Comments

@bahmutov

For example, when linking to https://github.com/#does-not-exist, it would be nice to parse the returned HTML and detect that the anchor does-not-exist does not exist.
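
A minimal sketch of the idea, assuming the HTML has already been downloaded as a string. The names `AnchorCollector` and `html_has_anchor` are hypothetical and not part of markdown-link-check; the sketch only looks at `id` attributes and legacy `<a name="...">` targets, so anchors that a page creates with client-side JavaScript would not be found.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects every attribute value that a URL fragment could point at."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Any element's id is a valid target; <a name="..."> is the legacy form.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def html_has_anchor(html_text, fragment):
    """Return True if html_text defines an element that `fragment` targets."""
    collector = AnchorCollector()
    collector.feed(html_text)
    return fragment.lstrip("#") in collector.anchors
```

With this, `html_has_anchor(page_html, "#does-not-exist")` would be false for a page that defines no such `id`, which is the case a link checker could report as dead.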

@joelverhagen

I took an initial stab at it. I got it working locally with just anchors in file system Markdown files. My branches are here:

tcort/link-check@master...joelverhagen:check-anchors
tcort/markdown-link-extractor@master...joelverhagen:check-anchors
master...joelverhagen:check-anchors

It met my own needs for a one-time pass of some Markdown files interlinking to each other.

Some issues with the implementation that I feel prevent it from being PR-worthy:

  1. No implementation for links to HTML documents (would likely require an HTML parser)
  2. Verification logic for anchors leaks out of the link-check module since it does not have a Markdown parser dependency. link-check stuffs a readable stream into the result object and expects markdown-link-check to do the verification.

My output looks like this:

FILE: D:\src\git\github.com\NuGet\NuGet.Services.Metadata\src\NuGet.Services.SearchService\README.md
Cannot check for anchors in HTTP/HTTPS links: https://docs.microsoft.com/en-us/nuget/consume-packages/finding-and-choosing-packages#search-syntax
Cannot check for anchors in HTTP/HTTPS links: https://docs.microsoft.com/en-us/nuget/api/package-base-address-resource#enumerate-package-versions
Cannot check for anchors in HTTP/HTTPS links: https://github.com/NuGet/Home/wiki/SemVer2-support-for-nuget.org-%28server-side%29#identifying-semver-v200-packages
[✓] https://azure.microsoft.com/en-us/services/search/
[✓] https://github.com/NuGet/NuGetGallery
[✓] #query---v3-search-endpoint
[✓] https://docs.microsoft.com/en-us/nuget/api/search-query-service-resource
[✓] #autocomplete---v3-autocomplete-endpoint
[✓] https://docs.microsoft.com/en-us/nuget/api/search-autocomplete-service-resource
[✓] #searchquery---internal-v2-search-endpoint
[✓] #searchdiag---diagnostic-information-for-monitoring
[✓] #---health-endpoint-for-load-balancers
[✓] https://docs.microsoft.com/en-us/nuget/consume-packages/finding-and-choosing-packages#search-syntax
[✓] ../../docs/Azure-Search-indexes.md
[✓] ../../docs/Search-auxiliary-files.md
[✓] ../../docs/Search-auxiliary-files.md#download-count-data
[✓] ../../docs/Search-auxiliary-files.md#verified-packages-data
[✓] ../../docs/Search-auxiliary-files.md#popularity-transfer-data
[✓] Controllers/SearchController.cs
[✓] https://docs.microsoft.com/en-us/nuget/api/service-index
[✓] ../../docs/Azure-Search-indexes.md#search-index
[✖] #user-content-query---v3-search-endpoint
[✓] https://docs.microsoft.com/en-us/nuget/api/package-base-address-resource#enumerate-package-versions
[✓] ../../docs/Azure-Search-indexes.md#hijack-index
[✓] https://github.com/NuGet/NuGetGallery/issues/7366
[✓] https://github.com/NuGet/Home/wiki/SemVer2-support-for-nuget.org-%28server-side%29#identifying-semver-v200-packages
[✓] https://docs.microsoft.com/en-us/nuget/concepts/package-versioning
[✓] https://github.com/NuGet/Home/wiki/SemVer2-support-for-nuget.org-%28server-side%29
[✓] ../NuGet.Jobs.Db2AzureSearch/README.md

26 links checked.

ERROR: 1 dead links found!
[✖] #user-content-query---v3-search-endpoint → Status: 404
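
The same-file anchor checks in the output above (for example `#query---v3-search-endpoint`) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the branches: the function names are hypothetical, the slug rule only approximates how GitHub derives heading anchors (lowercase, drop punctuation, turn each space into a hyphen), and duplicate headings, Setext headings, and inline HTML anchors are ignored.

```python
import re

FENCE = "`" * 3  # marker that opens or closes a fenced code block


def github_style_slug(heading_text):
    """Approximate the anchor GitHub generates for a heading (assumption)."""
    slug = heading_text.strip().lower()
    # Drop everything except word characters, spaces, and hyphens.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each space becomes one hyphen; runs are deliberately not collapsed.
    return slug.replace(" ", "-")


def collect_anchors(markdown_text):
    """Return the set of anchors defined by ATX headings ("# Title")."""
    anchors = set()
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        match = re.match(r"^#{1,6}\s+(.+?)\s*$", line)
        if match and not in_fence:
            anchors.add(github_style_slug(match.group(1)))
    return anchors


def anchor_exists(markdown_text, fragment):
    """Check a same-file link such as "#query---v3-search-endpoint"."""
    return fragment.lstrip("#") in collect_anchors(markdown_text)
```

Because spaces are not collapsed, a heading such as "Query - V3 search endpoint" (a guess at the original heading text) yields the triple-hyphen slug seen in the output.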

@connorjclark

@joelverhagen FWIW I think just the inter-link markdown verification is by far the most useful aspect of an anchor link lint. I'd suggest not handling URLs w/ fragments to real html documents any differently in an initial PR (no Cannot check for anchors in HTTP/HTTPS links message), and just focus on the markdown bit.

@belorenz

> @joelverhagen FWIW I think just the inter-link markdown verification is by far the most useful aspect of an anchor link lint. I'd suggest not handling URLs w/ fragments to real html documents any differently in an initial PR (no Cannot check for anchors in HTTP/HTTPS links message), and just focus on the markdown bit.

👍 This is exactly what we are using markdown-link-check for, most of the time. Having this feature would be great, since the anchors to local documents break quite often.

@fpetrogalli

Hi, what came of this issue in terms of implementing the checks for internal Markdown links? It doesn't seem to be implemented in the current version of markdown-link-check.

Kind regards,

Francesco

@jyavorska

I'd also love to see this make it into the library.

@fpetrogalli

Yeah, at the moment the only workaround I could figure out for this is a dirty hack (ARM-software/acle#104):

 pandoc acle.md --verbose --fail-if-warnings -o acle.pdf 2>&1 | grep -E 'pdfTeX warning \(dest\): name{[^}]+}' | sed -E  's/.*name\{([^}]+)\}.*/\1/' | sort | uniq

@BenRoe

BenRoe commented Dec 4, 2021

+1 for this feature

@nmattia

nmattia commented Mar 19, 2022

It would be great to have this in. @tcort how can we help to get this merged? Are there any implementation or design issues?

tcort added a commit that referenced this issue Mar 19, 2022
@tcort
Owner

tcort commented Mar 19, 2022

Implemented in 3.10.0

@tcort tcort closed this as completed Mar 19, 2022
@tcort tcort unpinned this issue Mar 19, 2022
@nmattia

nmattia commented Mar 20, 2022

Thanks a lot @tcort for the quick implementation!

There's still one issue: linking to anchors in other files doesn't seem to work.

test.md:

# This is a test

[one](./foo.md) this works
[two](./foo.md#some-anchor) this doesn't

foo.md:

# Foo

no other anchors.

~/markdown-link-check$ ./markdown-link-check ./test.md

FILE: ./test.md
  [✓] ./foo.md
  [✓] ./foo.md#some-anchor

  2 links checked.

Here, the second test should have failed. Is that correct or did I bork my update?
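
The behaviour expected in the example above can be sketched as follows: resolve the path relative to the linking file, then look for the fragment among the target file's heading anchors. This is a hedged illustration, not how markdown-link-check is implemented. `heading_anchors` and `check_relative_link` are hypothetical names, and the slug rule is only an approximation of GitHub-style heading anchors.

```python
import re
from pathlib import Path


def heading_anchors(markdown_text):
    """Approximate GitHub-style anchors for ATX headings (assumption)."""
    anchors = set()
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.+?)\s*$", line)
        if match:
            slug = re.sub(r"[^\w\- ]", "", match.group(1).lower())
            anchors.add(slug.replace(" ", "-"))
    return anchors


def check_relative_link(source_file, link):
    """Return "alive" or "dead" for a relative link such as ./foo.md#some-anchor."""
    path_part, _, fragment = link.partition("#")
    # An empty path ("#anchor") refers to the source file itself.
    target = Path(source_file).parent / path_part if path_part else Path(source_file)
    if not target.is_file():
        return "dead"
    if fragment and fragment not in heading_anchors(target.read_text(encoding="utf-8")):
        return "dead"
    return "alive"
```

For the two files above, `check_relative_link("./test.md", "./foo.md")` would be alive, while `./foo.md#some-anchor` would be dead because foo.md only defines the `foo` anchor.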

@sschuberth

Hmm, upgrading from v3.9.3 to v3.10.0 actually seems to have broken anchor checks for us, see:

  ERROR: 8 dead links found!
  [✖] #analyzer → Status: 404
  [✖] #downloader → Status: 404
  [✖] #scanner → Status: 404
  [✖] #advisor → Status: 404
  [✖] #evaluator → Status: 404
  [✖] #reporter → Status: 404
  [✖] #analyzer-for-spdx-documents → Status: 404
  [✖] https://ctrlx-automation.com/ → Status: 0

@per1234

per1234 commented Apr 19, 2022

@sschuberth the bug was reported here: #193

Although very valuable in its own right, the pull request is not an effective way to track the issue. We would normally expect to find outstanding bugs among the open issues and PRs, but that one has been merged. However, it does not fix the bug, since it only provided the unit tests that demonstrate it.

So it would be best to create a dedicated issue for this bug, since it is likely to have quite an impact on users. It did for my projects, but I was able to find the PR at that time because it had not yet been merged.

@sschuberth

> So it would be best to create a dedicated issue for this bug

I've created one at #195.

Orzelius added a commit to garden-io/garden that referenced this issue Jul 7, 2022
* also check relative links and achors using markdown-link-check
  (we didn't use it before because of the issue below,
  see: tcort/markdown-link-check#91)
* actually check for achor and relative link errors
  (previously the errors were ignored in the script)
* add else statement that logs if nothing was checked due to no changes
Orzelius added a commit to garden-io/garden that referenced this issue Jul 7, 2022
* chore: remove unused check-docs.ts

* chore: update check-docs script

* also check relative links and achors using markdown-link-check
  (we didn't use it before because of the issue below,
  see: tcort/markdown-link-check#91)
* actually check for achor and relative link errors
  (previously the errors were ignored in the script)
* add else statement that logs if nothing was checked due to no changes

* chore: remove remark

now we check all the links with markdown-link-check