-
-
Notifications
You must be signed in to change notification settings - Fork 736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU-22843 Enable constructing UnicodeString from literal in fixed time #3106
Conversation
Please take a look at this. I have intentionally not done anything about the documentation yet, thinking that it'll be a better use of time to first make sure that we all agree that this change should be made at all, and if so, that the implementation is sound. As can be seen directly in this PR no existing call sites should need to be changed, the updated new templated constructor will accept all existing callers with legacy behaviour kept intact, so I think that the change could be deemed acceptable for the ICU API stability promises. But it must be mentioned that the old constructors really do get removed so in case anyone anywhere has code that does something like taking the address of one of the old constructors then that code will cease to work after this change. As the change is gated on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, I like this a lot. While it adds yet more template magic, it also magically makes existing call sites that use literals faster.
One question/concern below.
Once we have an answer for the explicit-constructor question, this wants to be proposed as an API change via email to icu-design.
(I am not too worried about people getting the address of one of the old constructors.)
@eggrobin what do you think?
@richgillam heads-up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to do the same treatment for the other functions that have overloads for both raw-pointer and template-S?
I haven't made any such plans for I haven't really looked at those functions at all (so I don't know if there's anything that would need to be done there). |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @roubert, the ICU 76 feature freeze is in less than 2 weeks. It would be great if you could finish your work here. Please take a look at the other new functions that take some ConvertibleToU16StringView, and have raw-pointer overloads, to see if they should get the same treatment.
When you have a complete set of changes, remember to write an API change proposal; you could reply to my proposal with an addendum, like I just did.
PS: Please don't squash until after approval, except when you need to rebase. |
@eggrobin, @richgillam as you were the reviewers of PR #3076, I'd like to know your opinion on these proposed changes here. |
I have opined. Apologies for the delayed review, I was on vacation last week. |
Reminder...
|
I just think that it's a good idea to 1) make sure that we all agree that this should be done at all and 2) agree on how exactly to do it, before spending any time on doing any further changes. |
Suggestion:
Note: I looked at the other overloads that I added, and AFAICT none of them has a sibling overload that just takes a pointer, so this might be it for now. |
If there's any reason for why any of this wouldn't work for Apple, now would be a very good time to learn about that. |
When passing a string literal to any of the legacy constructors that take just a plain pointer to a UTF-16 string it becomes necessary to iterate through the string to find its length, even though this length was known to the compiler (which just has no way of passing it on to the constructor). But when calling the new templated string view constructor instead it becomes possible for the compiler to use the known length of a string literal to directly create a string view of the correct size and pass this on to the constructor. By replacing the legacy constructors with the new constructor this is made the default behaviour.
I've now searched through the entire file without finding any such functions. |
OK, it seems like we've reached some kind of silent consensus here, so I've now updated the documentation comments and made this PR ready to merge (if the proposed API change gets approved by the TC). |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API change was approved in the ICU-TC meeting yesterday.
As was pointed out in PR unicode-org#3106, the standard doesn't mandate the existence of std::char_traits<uint16_t> and libc++ did actually deprecate it in release 18 and delete it in release 19. As ICU4C uses this for backward compatibility for old code that still depends on setting -DUCHAR_TYPE=uint16_t it'd be nice to keep using it as long as possible but once it's gone from the standard library there isn't really much useful that ICU4C could do about that so conditionally compiling the code when the standard library version is old enough should suffice for keeping backward compatibility for yet some time. llvm/llvm-project@cce062d
As was pointed out in PR #3106, the standard doesn't mandate the existence of std::char_traits<uint16_t> and libc++ did actually deprecate it in release 18 and delete it in release 19. As ICU4C uses this for backward compatibility for old code that still depends on setting -DUCHAR_TYPE=uint16_t it'd be nice to keep using it as long as possible but once it's gone from the standard library there isn't really much useful that ICU4C could do about that so conditionally compiling the code when the standard library version is old enough should suffice for keeping backward compatibility for yet some time. llvm/llvm-project@cce062d
See my comment above. Apple does still define |
...And now I get to the bottom and am reminded that this was already merged. I lost track-- did we decide to leave the |
It's in, but #3144 makes it conditional, removing it starting with libc++ 18. Making a string_view over uint16_t generates warnings with libc++ 18 and no longer works at all with libc++ 19. It should continue to work if you use a standard library that still supports |
Okay. Sounds like it's time for me to figure out how to get everybody to change to using |
When passing a string literal to any of the legacy constructors that take just a plain pointer to a UTF-16 string it becomes necessary to iterate through the string to find its length, even though this length was known to the compiler (which just has no way of passing it on to the constructor).
But when calling the new templated string view constructor instead it becomes possible for the compiler to use the known length of a string literal to directly create a string view of the correct size and pass this on to the constructor.
By replacing the legacy constructors with the new constructor this is made the default behaviour.
Because non-templated functions take priority it's necessary to replace the legacy constructors or else the new constructor would never be called for any data type that can implicitly decay into a pointer (and this includes string literals). For the time being, this is gated on
U_HIDE_DRAFT_API
which will select either the old or the new constructors and even though the old constructors technically get deleted the new constructor is made possible to call in identical ways so that no changes to any call sites are necessary.There is however a snag there and that's an old promise that
nullptr
is valid input and identical to the default constructor (ie. the empty string) but this is explicitly not valid input for string views, so this requires a backward compatibility shim to catch anynullptr
before any attempt is made to use it to create a string view (which would crash most implementations of the standard library).Checklist