Support IPv6 zone identifiers #392

Closed
DemiMarie opened this issue Jun 3, 2018 · 51 comments

@DemiMarie

DemiMarie commented Jun 3, 2018

Currently, there is no way to point a browser at fe80::1%lo.

Proposed syntax: https://[fe80::1%25lo]:80
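A minimal Python sketch of how a client might split the proposed host syntax into an address and a zone ID. It assumes the RFC 6874 convention that the zone delimiter is written as the escape `%25`. `split_zoned_host` is a hypothetical helper for illustration, not part of any standard library or specification.

```python
import ipaddress
from typing import Optional, Tuple
from urllib.parse import unquote


def split_zoned_host(host: str) -> Tuple[str, Optional[str]]:
    """Split a bracketed URL host such as "[fe80::1%25lo]" into (address, zone).

    Hypothetical helper. Assumes the RFC 6874 convention: "%25" (an encoded
    "%") separates the address from the zone ID, and the zone ID may itself
    be percent-encoded.
    """
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError("expected a bracketed IPv6 literal")
    addr, sep, zone = host[1:-1].partition("%25")
    if "%" in addr:
        # A bare "%" (e.g. "[fe80::1%lo]") is not accepted in this sketch.
        raise ValueError("zone delimiter must be written as %25")
    ipaddress.IPv6Address(addr)  # raises ValueError if addr is not IPv6
    if not sep:
        return addr, None
    if not zone:
        raise ValueError("empty zone ID")
    return addr, unquote(zone)


print(split_zoned_host("[fe80::1%25lo]"))  # ('fe80::1', 'lo')
```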

@annevk
Member

annevk commented Jun 4, 2018

As noted at https://url.spec.whatwg.org/#concept-ipv6 this is intentionally omitted per https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2.

annevk closed this as completed Jun 4, 2018
@DemiMarie
Author

DemiMarie commented Jun 8, 2018 via email

@annevk
Member

annevk commented Jun 8, 2018

They have to find an alternative, yes.

@DemiMarie
Author

DemiMarie commented Jun 8, 2018 via email

@annevk
Member

annevk commented Jun 9, 2018

What would the prompt say? (FWIW, I doubt any browser would find that acceptable, and it adds a lot of complexity as we'd have to handle the syntax everywhere, which would lead to tons of issues.)

@DemiMarie
Author

DemiMarie commented Jun 9, 2018 via email

@telmich

telmich commented Jun 14, 2021

With reference to https://bugzilla.mozilla.org/show_bug.cgi?id=700999, I request that this bug be reopened. For completeness, I will reiterate here the problem caused by the lack of link-local address support:

I subscribed to this bug some time ago because Firefox essentially makes it impossible to configure newer network devices. I'll elaborate briefly, and I ask the Mozilla team for clarification on how to handle this:

  • newer devices all support IPv6 and configure link local addresses
  • Using the multicast address ff02::1 it is possible to find out the link local address of any directly connected device
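For context, Python's standard `ipaddress` module (which accepts a `%zone` suffix from Python 3.9 on) can tell the two kinds of address mentioned above apart. The interface name `eth0` is only an example.

```python
import ipaddress

# A link-local address as a device would auto-configure it (fe80::/10),
# written with a zone ID that names the local interface.
device = ipaddress.IPv6Address("fe80::1%eth0")
# The link-scope all-nodes multicast group used for discovery.
all_nodes = ipaddress.IPv6Address("ff02::1")

print(device.is_link_local)    # True
print(device.scope_id)         # eth0 (only meaningful on this host)
print(all_nodes.is_multicast)  # True
```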

Given this background, we often have devices coming back from customers with arbitrary IP configurations. The only plausible way to find such a device is the link-local discovery described above. Its IPv4 addresses are often unknown or undiscoverable.

  • Some devices can only be configured via http (i.e. ssh, telnet, tftp are off)

So we are basically in the situation that the link local address is the only reliable address that can be used to configure a whole class of devices.

I often hear the argument that with link-local addresses it would be possible to do Javascript based LAN scanning. I am not denying that, however the same is true for IPv4 - you can easily scan 192.168.x.y/24.

Even with this inconsistent security claim in mind, I ask the Mozilla developers to at least include support for link-local addresses, with something along the lines of about:config->ipv6-allow-link-local: [false,true], so that network engineers are not blocked from working.

Can we reopen this bug or create a new one for realising this?

@telmich

telmich commented Jun 14, 2021

Update: as this bug is cross-referencing other resources in each and every bug report, I tried to summarise it on https://ungleich.ch/u/blog/ipv6-link-local-support-in-browsers/

@afcady

afcady commented Jun 16, 2021

This needs to be reopened because Firefox is citing this bug as blocking its own fix for the issue.

This issue is holding up the entire IPv6 project, in case that isn't clear.

There's a lot of game playing around these issues because a lot of money is at stake when it comes to who controls domain names, and this is a way to break domain name systems that compete with ICANN & CAs.

So the developers of browsers are pretending that this is complicated when it isn't.

It's actually very simple and there's exactly one way to implement it which is very obvious. But implementing it that way is going to step on some toes of people who don't want IPv6 global connectivity and a new p2p internet foundation to happen. It goes against a lot of business models.

@afcady

afcady commented Jun 16, 2021

By the way. It's a bug to ever strip the zone identifier before sending it from the client.

The way that HTTP works is, the client sends its own name for the server to the server itself. The server never uses this name to establish IP connectivity. The server can then send the name back to the client in links, who can use it to re-establish IP connectivity by clicking a link, or bookmarking it and opening the bookmark, for example.

Since the server NEVER uses the name to establish IP connectivity, but only sends it to the client;
and since the client MAY use the name to establish IP connectivity,
therefore: the name MUST be the name that establishes IP connectivity on the client's system.

@becarpenter

becarpenter commented Jul 5, 2021

To try and unblock this issue, we've posted a draft update to RFC6874 and discussion is open. Details at https://mailarchive.ietf.org/arch/msg/ipv6/i5LUQN9vU9MryNWtvS_M_O7Wgjc/.
The draft itself is here.

@telmich

telmich commented Jul 5, 2021

Much appreciated, @becarpenter !

@afcady

afcady commented Jul 5, 2021

That doesn't fix the percent encoding.

@becarpenter
Copy link

@afcady, we are stuck with % meaning two things. The discussion on [email protected] is tending towards requiring only the %25eth0 escape encoding and dumping the suggestion to allow %eth0 heuristically.
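To make the trade-off concrete, here is a Python sketch contrasting the two options: accepting only `%25eth0`, versus also accepting `%eth0` heuristically. Both functions and the exact heuristic are hypothetical illustrations, not the draft's text; each takes the contents of the brackets.

```python
import re
from typing import Tuple
from urllib.parse import unquote

_TWO_HEX = re.compile(r"[0-9A-Fa-f]{2}")


def zone_strict(inner: str) -> Tuple[str, str]:
    """Accept only "%25" as the zone delimiter (the direction the list favoured)."""
    addr, sep, zone = inner.partition("%25")
    if not sep or not zone or "%" in addr:
        raise ValueError(f"no valid %25 zone delimiter in {inner!r}")
    return addr, unquote(zone)


def zone_heuristic(inner: str) -> Tuple[str, str]:
    """Also accept a bare "%" when it cannot be read as a percent-escape.

    Hypothetical rule: "%25" is the escaped delimiter; a "%" not followed by
    two hex digits is a literal delimiter; anything else is ambiguous.
    """
    addr, sep, rest = inner.partition("%")
    if not sep or not rest:
        raise ValueError("no zone ID")
    if rest.startswith("25") and len(rest) > 2:
        return addr, unquote(rest[2:])
    if not _TWO_HEX.match(rest):
        return addr, rest
    raise ValueError(f"ambiguous: %{rest[:2]} may be an escape or part of a zone ID")
```

With these rules, `fe80::1%eth0` is accepted only by the heuristic parser, and `fe80::1%ab0` is rejected by both, because `%ab` could equally be an escape for the byte 0xAB.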

@becarpenter

The draft has been updated again, following discussion at the recent IETF meeting.

As always, a diff from the previous version is available.

Input on two open issues is needed from implementers!

@achristensen07
Collaborator

Here are a few thoughts I have on this, without indicating support or opposition:

inet_pton seems to be ok with "fe80::abcd%25eth1" and "fe80::abcd%eth1" but not "fe80::abcd-eth1". NSURL seems to be ok with "http://[fe80::abcd%25eth1]/" and "http://[fe80::abcd-eth1]/" but not "http://[fe80::abcd%eth1]/". "fe80::abcd%25eth1" seems to be the most parsable of those examples in my sample of 2 IPv6 host parsers. I'm concerned that if we decide to use "%25" as the delimiter to indicate the beginning of a zone id, some software will interpret "25eth1" to be the zone id and some will interpret "eth1" to be the zone id. All browsers currently fail to parse all of those examples. It is clear that software will need to change if we decide to support this. If compatibility weren't a concern, I think it would be nicest to introduce a new delimiter such as '-'.

I'm curious how someone would get a zone id to use. Some systems might use "eth1" as a meaningful zone id, while other systems might use "en1" or "1". If this is the case, it makes me question the uniformity of these URIs.

Your document says "However, the IPv6 Scoped Address Architecture specification gives no precise definition of the character set allowed in <zone_id>. There are no rules or de facto standards for this." In order to be a part of the URL specification, we need a precise parsing definition for all possible input, such as "http://[fe80::abcd%25💩]/" which I imagine would need to percent-encode the emoji, or "http://[fe80::abcd%25%invalid]/" which seems to fail to parse.
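One way to make the zone ID a total function over all input is sketched below in Python. The rules are assumptions chosen to match the two examples above (percent-encode the emoji, fail on `%25%invalid`); `serialize_zone` is hypothetical and takes the text that follows `%25`.

```python
import re
from urllib.parse import quote

# A "%" that is NOT followed by two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def serialize_zone(raw_zone: str) -> str:
    """Map the text after "%25" to an ASCII-only zone ID, or fail.

    Hypothetical rules for illustration:
    - a malformed percent-escape makes the URL fail to parse;
    - valid escapes already present are kept as they are;
    - every other character outside letters, digits and "_.-~"
      (including non-ASCII) is UTF-8 percent-encoded.
    """
    if not raw_zone:
        raise ValueError("empty zone ID")
    if _BAD_ESCAPE.search(raw_zone):
        raise ValueError(f"malformed percent-escape in {raw_zone!r}")
    # quote() never encodes letters, digits or "_.-~"; adding "%" to `safe`
    # leaves existing escapes untouched.
    return quote(raw_zone, safe="%")


print(serialize_zone("eth1"))  # eth1
print(serialize_zone("💩"))    # %F0%9F%92%A9
```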

@karwa
Contributor

karwa commented Aug 19, 2021

Windows UNC paths apparently use ‘s’ to delimit a zone ID.

No idea why they chose ‘s’ but it doesn’t have the same problems that ‘%’ does in a URL context, so maybe that’s also worth considering.

@telmich

telmich commented Aug 19, 2021

@karwa Isn't the "s" only used in the context of a domain name? The example referenced on Wikipedia is fe80--1ff-fe23-4567-890as3.ipv6-literal.net, which uses Microsoft's ipv6-literal.net domain.
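The mapping behind that example appears to be a simple textual rewrite. The Python sketch below infers it from that single Wikipedia example (":" becomes "-", the "%" zone delimiter becomes "s", then the suffix is appended); `to_ipv6_literal_net` is hypothetical and not Microsoft's implementation.

```python
def to_ipv6_literal_net(address: str) -> str:
    """Rewrite a (possibly zoned) IPv6 literal as an ipv6-literal.net name.

    Hypothetical helper inferred from one example: ':' -> '-', the '%'
    zone delimiter -> 's', then the ipv6-literal.net suffix is appended.
    """
    return address.replace(":", "-").replace("%", "s") + ".ipv6-literal.net"


print(to_ipv6_literal_net("fe80::1ff:fe23:4567:890a%3"))
# fe80--1ff-fe23-4567-890as3.ipv6-literal.net
```

Because the result is a DNS-style name, it sidesteps the "%" problem entirely, but only inside that one domain.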

@achristensen07
Collaborator

ping6 interprets "fe80::abcd%25en0" as having a zone ID of 25en0, so the current proposal isn't compatible with that.

@DemiMarie
Author

> ping6 interprets "fe80::abcd%25en0" to have a zone id of 25en0, so the current proposal isn't compatible with that

That becomes fe80::abcd%en0 after URL decoding.
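Python's `ipaddress` module (3.9 and later) reproduces both readings of the same string: whether the text has been URL-decoded first decides what the zone ID is.

```python
import ipaddress
from urllib.parse import unquote

text = "fe80::abcd%25en0"

# Read as a raw scoped address (like ping6), everything after the first
# '%' is the zone ID: "25en0".
raw = ipaddress.IPv6Address(text)
print(raw.scope_id)  # 25en0

# Read as a URL host, "%25" is first percent-decoded to "%", so the zone is "en0".
decoded = ipaddress.IPv6Address(unquote(text))
print(decoded.scope_id)  # en0
```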

@becarpenter

becarpenter commented Aug 26, 2021 via email

@becarpenter

The IETF 6MAN WG has just formally adopted our document draft-ietf-6man-rfc6874bis-00. All we need are developers who understand all the places where URLs are parsed (there are probably several) and where the actual socket calls are made. I'm glad to help if developers contact me.

@becarpenter

BTW, this bug was closed in June 2018 based on arguments at https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2. Those arguments were against various features of RFC6874. The new draft is quite different and (if published as an RFC) will remove all those annoying features.

@becarpenter

New version of the draft published: https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-01.html.
Among other things it adds an interesting Microsoft Windows 10 use case.

@becarpenter

Just noting that https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-02.html came out a while ago and is now in review by the appropriate IETF Area Director.

w.r.t. some of the comments above, getting rid of the percent-encoding seems to make the parsing issues quite a bit less thorny, but we really need implementers to look at that question.
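Assuming, as we read the later drafts, that the zone ID follows a bare `%` and is limited to the URI "unreserved" characters (letters, digits, `-`, `.`, `_`, `~`), a parser never has to decide whether a `%` starts an escape. The Python sketch below is a hypothetical illustration of that rule, not the draft's ABNF.

```python
import ipaddress
import re
from typing import Tuple

# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
_UNRESERVED = re.compile(r"[A-Za-z0-9._~-]+")


def parse_bare_zone(inner: str) -> Tuple[ipaddress.IPv6Address, str]:
    """Parse "addr%zone", treating "%" purely as a delimiter, never as an escape."""
    addr, sep, zone = inner.partition("%")
    if not sep:
        raise ValueError("no zone ID")
    if not _UNRESERVED.fullmatch(zone):
        raise ValueError(f"zone ID {zone!r} contains characters outside the unreserved set")
    return ipaddress.IPv6Address(addr), zone


print(parse_bare_zone("fe80::a%en1"))  # (IPv6Address('fe80::a'), 'en1')
```

Under this rule `fe80::a%25en1` names the zone `25en1`, which matches what ping6 does with the same string.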

@becarpenter

The relevant URI syntax update is now in IETF Last Call, i.e. the last opportunity for public comments: https://mailarchive.ietf.org/arch/msg/ietf-announce/BqBF9qvZ8qZR4ZPlawPvQSe0WbU/

@becarpenter

Worth mentioning that the draft has been updated following Last Call comments: https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-03.html

@karwa
Contributor

karwa commented Oct 31, 2022

RFC-5952 mentions some of the problems that arise due to the flexibility of textual IPv6 addresses and the benefits of having a single, canonical textual representation. Given that zone IDs are opaque ASCII strings, I guess that no normalization can be applied to them, correct?

In other words, [::1234%EN0] and [::1234%en0] must be considered distinct addresses, and URLs containing those addresses must also be considered distinct. This also means that the hostname in general would become case-sensitive, contrary to RFC-3986:

> The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, an IPv4 address in dotted-decimal form, or a registered name. The host subcomponent is case-insensitive.

https://www.rfc-editor.org/rfc/rfc3986.html#section-3.2.2

> When a URI uses components of the generic syntax, the component syntax equivalence rules always apply; namely, that the scheme and host are case-insensitive and therefore should be normalized to lowercase. For example, the URI HTTP://www.EXAMPLE.com/ is equivalent to http://www.example.com/.

https://www.rfc-editor.org/rfc/rfc3986.html#section-6.2.2.1

@becarpenter

Good catch. RFC4007 says nothing about case (in fact, it says nothing useful about the Zone ID string at all). Running code (a.k.a. ping on Linux) tells me that implementations are case-sensitive, which is of course the implication of saying nothing.
That sentence "The host subcomponent is case-insensitive." is tricky. It's appropriate when applied to a plain IPv6 address, since the hexadecimal characters are indeed case-insensitive anyway. It's inappropriate when applied to a Zone ID string. I think we'll have to live with it, though, and restrict the format to lower case Zone IDs. I'll take this question to the IETF WG list.

@DemiMarie
Author

> Good catch. RFC4007 says nothing about case (in fact, it says nothing useful about the Zone ID string at all). Running code (a.k.a. ping on Linux) tells me that implementations are case-sensitive, which is of course the implication of saying nothing. That sentence "The host subcomponent is case-insensitive." is tricky. It's appropriate when applied to a plain IPv6 address, since the hexadecimal characters are indeed case-insensitive anyway. It's inappropriate when applied to a Zone ID string. I think we'll have to live with it, though, and restrict the format to lower case Zone IDs. I'll take this question to the IETF WG list.

I recommend having the zone ID be case-sensitive, to reflect what current implementations do.

@becarpenter

I don't know how to do that without causing a major problem for the URI parsers in every browser.

@DemiMarie
Author

> I don't know how to do that without causing a major problem for the URI parsers in every browser.

Why would this cause such a problem?

@karwa
Contributor

karwa commented Nov 1, 2022

I don't think the problem is with browsers specifically.

The issue is that the new RFC is defined in terms of RFC-3986 and updates it, but 3986 makes quite a broad promise of hosts being case-insensitive. It does not even restrict this to certain kinds of hosts - it just says "the host subcomponent". So it's extremely broad, and there may be applications which rely on that.

For example, imagine I have some sort of application-level cache - I can treat requests to SOMEHOST as being equivalent to a request to somehost. Given the language in 3986, I don't even need to figure out what kind of host is being referred to (whether it's an IP address or registered name) - I can just lowercase everything, and that's fine. Maybe I won't catch all requests to the same IP address, but I won't produce false positives, where I say 2 different hosts are equivalent.

This new RFC would make an incompatible change to 3986, by taking away that promise and saying that some hosts may actually be case-sensitive, and that if you just lowercase them as was previously allowed, you might be meaningfully altering which host is being referred to.
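The cache example above can be sketched in a few lines of Python. Both key functions are hypothetical: the first does what RFC 3986's "host is case-insensitive" rule currently permits, the second lowercases only the address part of a bracketed literal and leaves the zone ID as written.

```python
def naive_key(host: str) -> str:
    # Lowercase the whole host, as RFC 3986 currently allows.
    return host.lower()


def zone_preserving_key(host: str) -> str:
    # Lowercase only the address part of a bracketed literal; keep the zone as-is.
    if host.startswith("[") and "%" in host:
        addr, pct, zone = host.partition("%")
        return addr.lower() + pct + zone
    return host.lower()


a, b = "[::1234%EN0]", "[::1234%en0]"
print(naive_key(a) == naive_key(b))                      # True: two zones, one cache entry
print(zone_preserving_key(a) == zone_preserving_key(b))  # False
```

On a system where zone IDs are case-sensitive, the first key produces exactly the kind of false positive that the existing guarantee rules out.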

The WHATWG URL standard would actually be more accommodating of case-sensitive elements within IP literals than RFC-3986, because we don't make the same broad guarantee. In the WHATWG model, the parser takes a string and creates a URL record from it, and that URL record can contain a host (which is also a record, containing the parsed IP address value). The URL serialiser produces the canonical textual form of that URL record, so nobody needs to do things like manually lowercasing hostnames, and nowhere in the standard does it recommend that anybody does so themselves; the output is already normalised to the extent the standard defines things to be equivalent:

Parsing any of:

http://[::ABCD]/
http://[::abcd]/
http://[::0:0:0:ABCD]/
http://[::0.0.171.205]/

All produce the same result:

http://[::abcd]/
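As a rough analogue (Python's `ipaddress` module, not the WHATWG serializer itself), parsing the address into a value and printing it back yields a single canonical text form for all four inputs.

```python
import ipaddress

inputs = ["::ABCD", "::abcd", "::0:0:0:ABCD", "::0.0.171.205"]
for text in inputs:
    # str() gives Python's compressed, lowercase text form of the parsed value.
    print(f"http://[{ipaddress.IPv6Address(text)}]/")
# each iteration prints http://[::abcd]/
```

The zone ID is the part that has no such value-level normalization, which is why it sits awkwardly in this model.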

@becarpenter

Thanks @karwa. I agree that any URI parser or decoder would have this problem, not just those in browsers. As soon as they have separated out the host part of the URI, any programmer will normalise the whole thing to lower case before analysing whether it's example.com, 1.2.3.4, [::abcd] or [::abcd%upper].
@DemiMarie is correct, I guess, that theoretically every parser could be hacked around to defer the normalisation but that is a big ask, whereas (from my experience with patching wget) the change as defined in the draft is quite straightforward.

@becarpenter

Just to confirm, for the case of wget (patched to support RFC6874bis), if I do

wget http://[FE80::3e2a:fdff:fea4:dde7%WLP2S0]

it responds

Connecting to fe80::3e2a:fdff:fea4:dde7%wlp2s0|fe80::3e2a:fdff:fea4:dde7|:80... connected.

In other words, wget normalises the host component to lower case, as expected.

(The patch to wget is at https://github.com/becarpenter/wget6)

@becarpenter

New version of the draft today : https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-05.html
There's one change, adding a note that zone IDs with upper case letters won't work.
(That's an issue we can't fix, due to a shortfall in RFC4007.)

@DemiMarie
Author

> New version of the draft today : https://www.ietf.org/archive/id/draft-ietf-6man-rfc6874bis-05.html There's one change, adding a note that zone IDs with upper case letters won't work. (That's an issue we can't fix, due to a shortfall in RFC4007.)

Would it be possible to make zone IDs with upper case letters an error? That way if in the future it is possible to support them, it can be added backwards-compatibly.

@martinetd

> Would it be possible to make zone IDs with upper case letters an error? That way if in the future it is possible to support them, it can be added backwards-compatibly.

Please don't. On Linux zone IDs are simply case-sensitive; upper-case letters are not invalid per se.
It's perfectly possible (although I wouldn't recommend it) to create an interface wan0 and another interface Wan0 next to each other, and you can ping fe80::1%wan0 and fe80::1%Wan0 to go through either interface appropriately. (You can also test this with dummy interfaces, e.g. ip link add Test type dummy:

# ip link add Test type dummy
# ip link add test type dummy
$ getent ahostsv6 fe80::1%test
fe80::1%11   STREAM fe80::1%test
fe80::1%11   DGRAM  
fe80::1%11   RAW    
$ getent ahostsv6 fe80::1%Test
fe80::1%10   STREAM fe80::1%Test
fe80::1%10   DGRAM  
fe80::1%10   RAW    

)

I don't think anyone is daring enough to do that in practice, so I don't think having parsers assume one is equal to the other would be a problem, but case should be preserved for the actual host resolution/connect/sendto/whatever call.

@DemiMarie
Author

> I don't think anyone is daring enough to do that in practice so I don't think having parsers assume one is equal to the other would be a problem, but case should be preserved for the actual host resolution/connect/sendto/whatever call

I still think parsers should be required to be case-sensitive here. Case-preserving is the bare minimum.

@becarpenter

I'd be delighted if I thought that was reasonably possible, but having looked at some of the Firefox code, I really, really doubt it.

@DemiMarie
Author

> I'd be delighted if I thought that was reasonably possible, but having looked at some of the Firefox code, I really, really doubt it.

What would be required for it to work? Major refactoring?

@becarpenter

You'd need to hop over to https://bugzilla.mozilla.org/show_bug.cgi?id=700999 and ask there.

@valenting
Collaborator

Hi all,

In order to make some progress on this topic I would like to propose a compromise change to the URL standard that punts on all the hard questions about zone ID.

As indicated in Martin's feedback most browsers still have a problem with the zoneID and wouldn't implement it. However, URLs with a zoneID still exist, and the fact that URL parsers consider them invalid isn't great.
The use case I encountered was that my printer's settings were pointing me towards a URL containing a zoneID. Obviously that failed to parse, so I had to manually remove the zoneID from the URL to access it.
The middle ground I'm thinking of is that the parser would remove (and ignore) the zoneID while parsing the URL, so at least it works on machines that have a default zone ID.

The changes to the URL parsing algorithm would be minimal:
In https://url.spec.whatwg.org/#concept-ipv6-parser, step 6 would become: "While c is not the EOF code point and c is not U+0025 (%):"
and step 6.7 would become: "Otherwise, if c is not the EOF code point and c is not U+0025 (%), validation error, return failure."
This would have the effect of the zoneID being ignored, so at least we are able to parse such URLs.
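A simplified Python model of the effect of this proposal is sketched below. It is a hypothetical string split, not the WHATWG algorithm (which would change the parser's loop conditions). It also illustrates the concern raised in the reply that follows: addresses on two different links collapse to the same host.

```python
import ipaddress


def parse_host_ignoring_zone(inner: str) -> ipaddress.IPv6Address:
    """Model of the proposal: stop at '%' and drop the zone ID entirely.

    Hypothetical helper for illustration only.
    """
    addr, _, _zone = inner.partition("%")
    return ipaddress.IPv6Address(addr)


# Two different links collapse to the same host (and hence the same origin).
h1 = parse_host_ignoring_zone("fe80::1%25eth0")
h2 = parse_host_ignoring_zone("fe80::1%25wlan0")
print(h1 == h2)  # True
```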

@annevk if this is acceptable I will send a PR. Hopefully this is non-controversial enough to be acceptable to Blink and WebKit too.

@annevk
Member

annevk commented Jan 19, 2023

I think that warrants a new issue. It's not clear to me that is a good idea because the authority question remained unresolved. If it should impact authority and we end up treating multiple distinct authorities as one, that would not be good. And while there are plenty of ways to make a URL appear like another one, I'm not sure we want to add to that problem.

Also in other domains ignoring all input after a certain character has led to injection attacks. How would we avoid those here?

It's worth discussing, but I wouldn't classify it as non-controversial.

@karwa
Contributor

karwa commented Jan 19, 2023

IMO, we should support Zone IDs.

Fundamentally, no host has a universally-guaranteed meaning. The URL standard does not define what hosts actually mean, and generally the assumption is that they will be passed to a system resolver.

How that resolver works is undefined, and in general, different systems will do different things, and allow for the user to customise different parts of the process. For instance, the hosts file can be used to provide a custom mapping, and after that the system may search the local network or other sources before falling back to DNS (The Windows GetAddrInfoEx function, for example, claims to support not only DNS, but also NetBIOS, WINS, Bluetooth, and various peer-to-peer protocols). But generally, after consulting local sources, the resolver will query DNS.

DNS itself can be heavily customised - both by the user, and by the backend. Users can provide custom DNS servers (e.g. Google public DNS), and ISPs can direct queries to particular servers using dedicated physical infrastructure, on-site caches, or to alternate websites (let's imagine the state has a problem with website X and wants to send users to a more ideologically-appropriate site). Ultimately, we have no way to detect any of that. We have no idea what the hostname example.com actually means, and whether the result obtained by a specific client resolution process accurately reflects what the author of the URL intended. And in modern networks, where devices are mobile, generally suspend rather than shut down, and may be negotiating between various WiFi and cellular networks, network configurations can easily fluctuate within the lifetime of a single process, meaning the identity of a resolved name is constantly in flux.

IP addresses are similarly fuzzy. Two machines with different network configurations may have different understandings of what a given address should mean. We give an IP address to the system, and it connects to some machine, and that's about as much as we can say about it. It doesn't come with nearly as much ambiguity as domains have, but it's all still client-specific.

So when I see arguments such as:

> Inclusion of purely local information in the universal identity of a resource runs directly counter to the point of having a URI.

And

> the Web security model depends on having a clear definition for the origin of resources. The definition of Origin depends on the representation of the hostname and it relies heavily both on uniqueness (something a zone ID potentially contributes toward) and consistency across contexts (which a zone ID works directly against)

I think it overstates how much we can actually rely on existing hostnames to be unique, and it fails to explain how 10.0.0.1 and [::abcd] constitute a "universal identity" which is "consistent across contexts" but [::abcd%eth0] somehow is neither.

But more to the point, I think it misrepresents what URLs are. URLs are universal identifiers, but that does NOT mean that they contain the universal identity of a resource. It just means that they subsume all other kinds of identifiers. It is perfectly fine to use URLs to identify data in a local application - e.g. something like my-recipe-app:/chicken-curry/ingredients#4 is not a misuse of URLs, even if it fails to resolve, or resolves to something else, on another machine.

URLs are, IMO, simply a flexible syntax for expressing the different kinds of identifiers that exist, so that any application can see the URL http://[::abcd%eth0]/config/foo, understand what the different parts are, and infer how to connect to that resource, using the system interfaces available to do so (accepting that they may be configurable).
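Python's generic `urllib.parse.urlsplit`, for example, already separates the components of such a URL; splitting the host on `%` is a further step the application takes itself. This sketch relies on Python's behaviour (3.9 or later, where `ipaddress` accepts zone IDs), not on any standard's definition.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://[::abcd%eth0]/config/foo")
host = parts.hostname  # brackets removed: "::abcd%eth0"
address, _, zone = host.partition("%")
print(parts.scheme, address, zone, parts.path)
# http ::abcd eth0 /config/foo
```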

And I think it should be possible to express these kinds of locations under the http scheme. They are popular enough that many shipped products use them, and operating systems have included the required interfaces to resolve these names for over a decade. They seem to be an intrinsic part of IPv6 addresses, so IMO the only reasonable course is to accept them as part of our support for IPv6 addresses.

Of course, no client is obligated to support a particular kind of host. I don't see any technical reason for doing so, but browsers should be allowed to decline requests to such URLs if they wish. I hope they would at least make it a configurable option rather than an outright ban.

@annevk
Member

annevk commented Jan 19, 2023

The URL Standard and standards that build on it do end up using and exposing the host in quite a few ways. So perhaps meaning is not strictly speaking defined, but there is a lot of behavior built on top that is outside the realm of DNS.

We cannot just change the syntax without addressing that. I told the RFC authors repeatedly that syntax isn't really the problem here. It's the end-to-end integration.

And even if someone solved that, there's also the problem of getting implementer interest, which is a requirement per our Working Mode. And thus far I've largely seen opposition on that front.

@becarpenter

@valenting : "The use case that I encountered was that my printer settings was pointing me towards a URL containing a zoneID - obviously that failed to parse, so I had to manually remove the zoneID from the URL to access it." That only works if you're lucky enough to have your printer on the default link (aka zone). As home networks get more complex that isn't guaranteed, although I agree that it's a useful fix for the common case.
@annevk: "It's the end-to-end integration." We (authors) understood that point and would like to know in which way the latest draft doesn't answer the concern. Of course we are not going to specify the algorithms, but we can of course add more about the expected behaviour. In IETF terminology: send text.

@becarpenter

Should have added that the default zone is only a SHOULD in the underlying RFC4007, and as far as I know Linux doesn't support a default zone, although Windows does.

kevinoid added a commit to kevinoid/appveyor-status that referenced this issue Sep 12, 2023
The built-in URL class dropped support for zone identifiers in IPv6
address literals in Node.js 20.  Calling the URL constructor with a URL
containing a zone identifier causes ERR_INVALID_URL to be thrown.  This
is likely a result of switching to version 2.0 of the Ada URL parser in
<nodejs/node#47339>.  The behavior aligns with
how [IPv6 address is defined in the WHATWG URL
Standard](https://url.spec.whatwg.org/#concept-ipv6), which notes that

> Support for <zone_id> is intentionally omitted.

As explained in the issue tracker:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2
whatwg/url#392

Skip this test, since this URL format is not supported.  If it's
necessary to support SCP-like git URLs with zone identifiers, we'll need
to roll our own support.

Signed-off-by: Kevin Locke <[email protected]>