Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: Add Intl.Segmenter #553

Merged
merged 9 commits into from
Jan 24, 2022
Merged

Conversation

gibson042
Copy link
Contributor

@FrankYFTang
Copy link
Contributor

Should the "2 Conformance" https://tc39.es/ecma402/#conformance also be changed for Intl.Segmenter?

@sffc sffc mentioned this pull request May 8, 2021
@sffc sffc linked an issue May 8, 2021 that may be closed by this pull request
gibson042 added a commit to gibson042/proposal-intl-segmenter that referenced this pull request Oct 25, 2021
gibson042 added a commit to tc39/proposal-intl-segmenter that referenced this pull request Oct 25, 2021
@FrankYFTang
Copy link
Contributor

I suggest you also add

    <li>
      <a href="https://www.unicode.org/reports/tr29/">Unicode Standard Annex 29- Unicode Text Segmentation</a>
    </li>  

to the end of normative-references.html
And also mayb consider listing
Common Locale Data Repository (available at http://cldr.unicode.org)
and
International Components for Unicode https://unicode-org.github.io/icu-docs/
Since that two are also mentioned in the proposal.

@gibson042
Copy link
Contributor Author

Thank you @FrankYFTang. I have included UAX 29, but not CLDR or ICU because references to those are intentionally non-normative.

@gibson042
Copy link
Contributor Author

@ryzokuken Could I get a review from you as well?

@ryzokuken
Copy link
Member

@gibson042 absolutely! Sorry for the delay, but I thought this would be an even larger diff. On it.

Copy link
Member

@ryzokuken ryzokuken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, with a few comments inline. Two main comments:

  1. Would you like to switch to the new structured arguments syntax?
  2. The AOs are interleaved all over the spec and I don't see a compelling reason for that. Would you mind putting the AO section at the top with all the AOs in it?

Comment on lines +20 to +21
1. Let _internalSlotsList_ be &laquo; [[InitializedSegmenter]], [[Locale]], [[SegmenterGranularity]] &raquo;.
1. Let _segmenter_ be ? OrdinaryCreateFromConstructor(NewTarget, *"%Segmenter.prototype%"*, _internalSlotsList_).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this is a single-use variable and not inline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're referring internalSlotsList, it's for the value of naming what is otherwise a potentially confusing argument (especially when the list of slots exceeds three or four items) using a convention that is already fairly common throughout ECMA-262 and ECMA-402.

I would actually support a followup making it universal in ECMA-402, ideally using a table-based approach like OrdinaryFunctionCreate and BoundFunctionCreate and ModuleNamespaceCreate:

Suggested change
1. Let _internalSlotsList_ be &laquo; [[InitializedSegmenter]], [[Locale]], [[SegmenterGranularity]] &raquo;.
1. Let _segmenter_ be ? OrdinaryCreateFromConstructor(NewTarget, *"%Segmenter.prototype%"*, _internalSlotsList_).
1. Let _internalSlotsList_ be the internal slots listed in <emu-xref href="#table-segment-iterator-instance-slots"></emu-xref>.
1. Let _segmenter_ be ? OrdinaryCreateFromConstructor(NewTarget, *"%Segmenter.prototype%"*, _internalSlotsList_).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Captured at #649

</emu-clause>
</emu-clause>

<emu-clause id="sec-intl-segmenter-abstracts">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ECMA-402, the abstract operations for any object are at the top, before the constructor. Would you like to reorder these to be more consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with CreateSegmentsObject, I think positioning operations below their use makes for better (specifically more linear) consumption of the spec. I'm willing to change this, but would rather update other sections to follow its pattern. Thoughts?

</p>

<p>
The value of the [[RelevantExtensionKeys]] internal slot is &laquo; &raquo;.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this would change at some point? If not, why not just hardcode the empty list in the constructor and get rid of this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that other constructors follow this pattern too, so you're definitely in the right here. Just generally curious if this is better than just hardcoding an empty list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better than hardcoding, because https://tc39.es/ecma402/#sec-internal-slots documents a [[RelevantExtensionKeys]] for each service constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

</ul>

<emu-clause id="sec-%segmentsprototype%.containing">
<h1>%SegmentsPrototype%.containing ( _index_ )</h1>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this and not Segmenter.prototype.containing like everywhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containing is not a method of Segmenter instances (the values returned from new Intl.Segmenter(…), whose prototype is %Segmenter.prototype%, which can be accessed as the initial value of Intl.Segmenter.prototype), but rather of Segments instances (the values returned from segmenter.segment(str), whose prototype is the different object %SegmentsPrototype%, for which there is no simple access path available to ECMAScript code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I mistyped, I meant Segments.prototype.containing, didn't know that this wasn't valid.

A Segments instance is an object that represents the segments of a specific string, subject to the locale and options of its constructing Intl.Segmenter instance.
</p>

<emu-clause id="sec-createsegmentsobject" aoid="CreateSegmentsObject">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit unusual that this AO it placed here, away from all other AOs, is there a strong reason to prefer this over the opposite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a lot of value in positioning Create<Type> AOs as the first subsection of their <Type> section and following them with the description of the prototype they use, supporting a linear reading of the specification.

Copy link
Contributor Author

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to switch to the new structured arguments syntax?

Yes, but I think I'd prefer that change after merge, to minimize divergence from the text of the proposal (which preceded structured headers).

The AOs are interleaved all over the spec and I don't see a compelling reason for that. Would you mind putting the AO section at the top with all the AOs in it?

How strongly do you feel about this? I have a fairly strong preference for this pattern (with the compelling reason being "better readability for linear spec consumption"), and would like to update the rest of ECMA-402 accordingly as soon as possible. But I'm willing to revert this to the existing pattern if you think that will be contentious or take too long.

Comment on lines +20 to +21
1. Let _internalSlotsList_ be &laquo; [[InitializedSegmenter]], [[Locale]], [[SegmenterGranularity]] &raquo;.
1. Let _segmenter_ be ? OrdinaryCreateFromConstructor(NewTarget, *"%Segmenter.prototype%"*, _internalSlotsList_).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're referring internalSlotsList, it's for the value of naming what is otherwise a potentially confusing argument (especially when the list of slots exceeds three or four items) using a convention that is already fairly common throughout ECMA-262 and ECMA-402.

I would actually support a followup making it universal in ECMA-402, ideally using a table-based approach like OrdinaryFunctionCreate and BoundFunctionCreate and ModuleNamespaceCreate:

Suggested change
1. Let _internalSlotsList_ be &laquo; [[InitializedSegmenter]], [[Locale]], [[SegmenterGranularity]] &raquo;.
1. Let _segmenter_ be ? OrdinaryCreateFromConstructor(NewTarget, *"%Segmenter.prototype%"*, _internalSlotsList_).
1. Let _internalSlotsList_ be the internal slots listed in <emu-xref href="#table-segment-iterator-instance-slots"></emu-xref>.
1. Let _segmenter_ be ? OrdinaryCreateFromConstructor(NewTarget, *"%Segmenter.prototype%"*, _internalSlotsList_).

</p>

<p>
The value of the [[RelevantExtensionKeys]] internal slot is &laquo; &raquo;.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better than hardcoding, because https://tc39.es/ecma402/#sec-internal-slots documents a [[RelevantExtensionKeys]] for each service constructor.

A Segments instance is an object that represents the segments of a specific string, subject to the locale and options of its constructing Intl.Segmenter instance.
</p>

<emu-clause id="sec-createsegmentsobject" aoid="CreateSegmentsObject">
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a lot of value in positioning Create<Type> AOs as the first subsection of their <Type> section and following them with the description of the prototype they use, supporting a linear reading of the specification.

</ul>

<emu-clause id="sec-%segmentsprototype%.containing">
<h1>%SegmentsPrototype%.containing ( _index_ )</h1>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containing is not a method of Segmenter instances (the values returned from new Intl.Segmenter(…), whose prototype is %Segmenter.prototype%, which can be accessed as the initial value of Intl.Segmenter.prototype), but rather of Segments instances (the values returned from segmenter.segment(str), whose prototype is the different object %SegmentsPrototype%, for which there is no simple access path available to ECMAScript code).

</emu-clause>
</emu-clause>

<emu-clause id="sec-intl-segmenter-abstracts">
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with CreateSegmentsObject, I think positioning operations below their use makes for better (specifically more linear) consumption of the spec. I'm willing to change this, but would rather update other sections to follow its pattern. Thoughts?

@ryzokuken
Copy link
Member

@gibson042

How strongly do you feel about this? I have a fairly strong preference for this pattern (with the compelling reason being "better readability for linear spec consumption"), and would like to update the rest of ECMA-402 accordingly as soon as possible. But I'm willing to revert this to the existing pattern if you think that will be contentious or take too long.

Honestly, I'm not sure anymore. Your rationale sounds great to me, but there's also the benefit of keeping things organized in a single location, maybe this is especially notable here because the proposal introduces three types simultaneously. Perhaps the better judge of this would be the implementers?

@FrankYFTang @Constellation does this new pattern of organizing AOs seem helpful to you? Do you think it correctly corresponds to how you consume the spec? If so, I have no reservations against it.

@gibson042 should we also solicit feedback from the 262 editors here?

@gibson042
Copy link
Contributor Author

should we also solicit feedback from the 262 editors here?

Possibly, but 262 itself is inconsistent (e.g., Map follows the pattern I'm proposing here but ArrayBuffer is more like the rest of 402, and WeakRef is in between them).

@sffc
Copy link
Contributor

sffc commented Jan 13, 2022

@ryzokuken
Copy link
Member

@gibson042 can you please make a PR for the editorial changes to 402 so I can merge this?

@gibson042
Copy link
Contributor Author

gibson042 commented Jan 24, 2022

@ryzokuken done: #650

@ryzokuken ryzokuken merged commit aad9a02 into tc39:master Jan 24, 2022
tidoust added a commit to w3c/browser-specs that referenced this pull request Jan 25, 2022
Proposal moved to stage 4, and was merged in Ecma402:
tc39/proposals@24bdab7
tc39/ecma402#553
tidoust added a commit to w3c/browser-specs that referenced this pull request Jan 25, 2022
* Drop Intl.displayNames v2 proposal, merged in Ecma402

Proposal reached stage 4, was integrated in Ecma402 and now returns a 404:
tc39/proposals@f4378e1
tc39/ecma402#622

* Drop Intl.Segmenter proposal, merged in Ecma402

Proposal moved to stage 4, and was merged in Ecma402:
tc39/proposals@24bdab7
tc39/ecma402#553
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intl.breakIterator
4 participants