feat: Add basic text support on Unix #362

DataTriny · 2024-02-29T23:05:13Z

Things worth noting:

There is the org.a11y.atspi.EditableText interface but Chromium does not implement it apparently and it mostly send action requests we don't support anyway.
GetBoundedRanges is missing, Chromium does not implement this function either.
The scroll_type argument of ScrollSubstringTo is ignored: our ActionRequest struct does not embed enough information to make use of it. We could compute a point and scroll to it but that may be too much responsibility.
The text-bounds-changed signal is never emitted: noone seem to implement it and Orca doesn't listen to it.

mwcampbell · 2024-03-01T14:47:12Z

About the text insertion and removal events, will it help if we refactor accesskit_consumer to go back to complete copy-on-write tree snapshots, using immutable-chunkmap this time? I think I can justify doing that as part of my Newton work.

Orca doesn't actually call any of the methods on the EditableText interface. In a few places, it does check whether the object supports that interface, but only as a way of determining whether the object is editable text; in other places, Orca does the same thing by checking the state. We could fix Orca to be consistent about that, though Joanie already branched it for GNOME 46. In any case, if Chromium doesn't implement EditableText at all, I think it's acceptable for us to do the same. If Chromium claims to support the interface but has a stub implementation, we should do that.

DataTriny · 2024-03-01T16:27:27Z

if Chromium doesn't implement EditableText at all, I think it's acceptable for us to do the same

No, Chromium doesn't even have a stub implementation, and I couldn't find any Chromium specific workaround for this in particular inside Orca.

will it help if we refactor accesskit_consumer to go back to complete copy-on-write tree snapshots, using immutable-chunkmap this time?

This would allow us to implement the same algorithm as Chromium, only building the entire text content for nodes where it is necessary. It should be noted though that there is this comment in said algorithm:

// TODO(accessibility) Plumb through which part of text changed so we don't
// have to guess what changed based on character differences. This can be
// wrong in some cases as follows:
// -- EDITABLE --
// If editable: when part of the text node changes, assume only that part
// changed, and not the entire thing. For example, if "car" changes to
// "cat", assume only 1 letter changed. This code compares common characters
// to guess what has changed.
// -- NOT EDITABLE --
// When part of the text changes, assume the entire node's text changed. For
// example, if "car" changes to "cat" then assume all 3 letters changed.
// Note, it is possible (though rare) that CharacterData methods are used to
// remove, insert, replace or append a substring.

Which I think asks for more text events to be emitted by the renderer process. So the proper way to solve this would probably be #6.

mwcampbell · 2024-03-07T12:46:37Z

In the new methods for converting to and from global character indices, we shouldn't use str::chars().count(). Instead, we should look at the length of the inline text box's character_lengths list. That way, we consistently use the application/toolkit's definition of "character", taking into account things like multiple code points that combine into a single glyph (e.g. some emoji), without having to add a heavy Unicode support library into AccessKit itself.

mwcampbell · 2024-03-07T12:48:51Z

I suggest adding a stub implementation of attribute_run to PlatformNode in accesskit_atspi_common, and calling that from get_attribute_run in accesskit_unix. We already know what the API will be. By putting the stub implementation in accesskit_atspi_common, we can go ahead and write the glue code in accesskit_unix and newton_atspi_compat, and only have to change one place when we actually implement the feature.

mwcampbell · 2024-03-07T12:55:58Z

I suggest adding a new internal helper function to PlatformNode, called something like resolve_for_text, which checks that the node supports the text interface before calling the closure, and returns the UnsupportedInterface error if it doesn't. This error checking is only really needed for newton_atspi_compat, but doesn't really hurt accesskit_unix.

mwcampbell · 2024-03-07T15:38:22Z

Actually, I was wrong about the conversion between AT-SPI offsets and AccessKit text positions/ranges. AT-SPI offsets, including the CharacterCount property, are in Unicode scalar values, also called code points (not to be confused with code units). So your conversion functions were correct to use str::chars().count() where they do, but they also have to convert between AccessKit character indices (within an inline text box) and USV indices, just as the UTF-16 conversion functions convert between AccessKit character indices and UTF-16 indices. And to make it absolutely clear what these functions do, I'd suggest the names to_global_usv_index and text_position_from_global_usv_index. Once my PR modifying the text tests is merged, you'll want to rebase and fix your new tests accordingly.

Note, though, that the AT-SPI "character" granularity does cover a whole character as defined by AccessKit, not necessarily a single Unicode scalar value (code point). So calling forward_to_character_end for that granularity is correct. I verified this by looking at the GTK 4 source code; there, the character granularity is defined as whatever Pango considers a single cursor position, and that's the same definition that AccessKit uses.

Finally, you may notice that the GetCharacterAtOffset AT-SPI method is not very useful in light of combining characters. Of course, it's easy to just return the first code point, as your implementation does. But Orca doesn't use that method, and I wonder if it's worth implementing, or if it should be considered deprecated.

mwcampbell · 2024-03-07T18:42:08Z

That's all of my feedback on this PR for now. We still need to implement insert and delete events, one way or another. The switch to immutable-chunkmap is blocked for now, because I haven't gotten any response to my PRs on that project. As for more precise events, I'm reluctant to add them as I mentioned before, but I will if they're required for full accessibility on Wayland. I wrote a comment on an at-spi2-core issue about this.

DataTriny · 2024-03-09T14:41:38Z

So your conversion functions were correct to use str::chars().count() where they do, but they also have to convert between AccessKit character indices (within an inline text box) and USV indices

Oh yes, I messed up here...

I'd suggest the names to_global_usv_index and text_position_from_global_usv_index.

I was looking for more descriptive names, I'll rename these functions. Thanks!

mwcampbell · 2024-05-16T15:01:54Z

@DataTriny What do you think about my earlier suggestion that we might not need to implement the AT-SPI GetCharacterAtOffset method (and the corresponding functions in accesskit_atspi_common). As I said, Orca doesn't use it, and it's not that useful in light of Unicode combining characters.

DataTriny · 2024-05-16T19:30:35Z

@mwcampbell Thanks for the reminder. I had filed this issue on the at-spi2-core repo but haven't received feedback yet.

DataTriny · 2024-05-19T13:45:37Z

You can help test this feature by running egui-based programs using my unix-text-testing branch.

mwcampbell · 2024-05-19T14:45:01Z

I don't think we actually want to check is_read_only when determining whether to emit a text change event. It could be that the text is read-only to the user, but the application can change it, and in that case we certainly would want to emit change events.

DataTriny · 2024-05-23T17:58:35Z

To be exhaustive, I should mention the existence of the text-bounds-changed signal: AFAIK this one is implemented by noone and Orca doesn't listen to it. Maybe we can revisit this once we get to implement rich text.

mwcampbell · 2024-05-23T18:07:50Z

If no one else emits that signal and Orca doesn't use it, I think we can safely ignore it.

DataTriny · 2024-05-23T18:41:20Z

I'm marking this as ready for review, but I'll have to rebase it on main before you merge it.

mwcampbell · 2024-05-24T09:33:27Z

In emit_text_selection_change, there's currently an if old_node.is_none(), then after that if statement, let old_node = old_node.unwrap();. We can save the duplicate check of the Option's state, and the unused error path in the unwrap, by doing a let-else, where the current content of the if statement goes under the else branch.

mwcampbell · 2024-05-24T09:52:21Z

Currently, node_updated only calls emit_text_change_if_needed if the node is included in the filtered tree. Since InlineTextBox nodes are excluded from the filtered tree, this means that a text change event is only emitted if the outer node changes, e.g. if the text selection changes. But if some text content changes without changing the selection, then a text change event won't be emitted. To fix this, I suggest moving the emit_text_change_if_needed call to the top of node_updated, before we check the filter results. I guess the only possible problem with that would be if there's a reason to emit other property changes before emitting text changes.

Also, a pattern that I'm using in GTK, that I think will become common in more advanced text implementations, is to put InlineTextBox nodes inside GenericContainer nodes under the root node for the text widget. In that case, if a GenericContainer changes, e.g. by adding or removing child InlineTextBox nodes, but there are no changes on InlineTextBox nodes themselves, then we won't emit a text change event. To fix that, I suggest modifying emit_text_change_if_needed to check the filtered parent if the node's role is either InlineTextBox or GenericContainer. Yes, that will result in extra traversals if changes are made to generic containers that aren't part of a text widget. But that shouldn't be too expensive, since we'll immediately check whether the filtered parent is a text widget. And of course, correctness and completeness have to come before optimization. I should check whether I need to make that change in the Windows and macOS adapters as well.

mwcampbell · 2024-05-24T10:04:44Z

Have you tested the text change code with non-ASCII text? I see that common_prefix, common_suffix, total_old_length, total_new_length, old_length, and new_length are all in Unicode scalar values (USVs), but when you take slices of the strings for the content field of the events, I believe the slice ranges have to be in bytes. I don't know the best way to handle that. I guess the start_index and length fields of the events are supposed to be in USVs, as the offset arguments to all the AT-SPI text methods are. So we obviously can't switch away from USVs entirely.

mwcampbell · 2024-05-24T10:17:58Z

In emit_text_selection_change, the caret offset should be calculated using Node::text_selection_focus, as PlatformNode::caret_offset does, not using selection.end(). Remember, Node::text_selection returns a text range in which the start always comes before the end; this design was admittedly influenced by UIA. But the selection focus, which is where the caret is, might not be the end of that range.

mwcampbell · 2024-05-24T10:29:59Z

In PlatformNode::character_extents, we need to check start.is_document_end() before calling start.forward_to_character_end(), as we do in PlatformNode::string_at_offset.

mwcampbell · 2024-05-24T11:10:55Z

I've finished reviewing this PR.

DataTriny · 2024-05-26T10:11:06Z

@mwcampbell Thanks for the review.

I knew text-changed events weren't able to handle multi-byte characters but I forgot to actually fix the algorithm. Now this is done and tested. I think it is reasonably well optimized.

mwcampbell · 2024-05-26T10:23:55Z

Unless I'm missing something, I think it's a mistake to increment prefix_usv_index one more time before breaking out of the loop. It seems to me that at the end of the loop, prefix_usv_index should be the number of common characters. Is there a reason to increment it one extra time?

DataTriny · 2024-05-26T10:39:34Z

The name of the variable probably didn't help the understanding. Really it's the number of unicode scalar values that both strings have in common, hence the start index of the diff.

mwcampbell · 2024-05-26T10:40:53Z

Yes, the new variable name is clearer.

These changes look good, and I have no further suggestions. Please rebase and merge.

…ted methods on it

…, consider generic containers

DataTriny force-pushed the unix-text branch from 1116a11 to ec6629a Compare March 2, 2024 13:32

DataTriny force-pushed the unix-text branch from ec6629a to b4f85db Compare March 10, 2024 17:28

DataTriny force-pushed the unix-text branch from 1afbda6 to 51c93db Compare May 15, 2024 18:19

DataTriny force-pushed the unix-text branch from e443a39 to ecbb618 Compare May 19, 2024 12:11

DataTriny force-pushed the unix-text branch 3 times, most recently from 5ab8958 to 0776bcf Compare May 20, 2024 12:10

DataTriny force-pushed the unix-text branch from 0776bcf to 37af850 Compare May 23, 2024 18:26

DataTriny marked this pull request as ready for review May 23, 2024 18:40

DataTriny added 20 commits May 26, 2024 12:43

feat: Add basic text support on Unix

2aafda1

Fix and rename conversion functions

152e495

Add stub implementation of text attributes related methods

de45a9a

Ensure PlatformNode supports the text interface before calling rela…

f181a75

…ted methods on it

Fix signature of methods returning extents

329fec2

Fixes after rebase

23b2edf

Fix bugs with caret movement and selection changes

c1fbda6

Drop get_character_at_offset method

549a17e

Emit text insertion/removal events

a79d5d2

Ensure text selection and caret events are fired after focus events

d3a6a4c

Credit Chromium for the diffing algorithm

67ca82f

Emit text change events on all text nodes

069bfb6

Mention text edit support in the main README

a23b63b

Fixes after rebase

98f7753

Use let-else inside emit_text_selection_change

bd68191

Actually fire text-changed events when inline textboxes are updated…

bb37d4e

…, consider generic containers

Fix string slicing inside emit_text_change_if_needed_parent

12aa336

Fix caret movement events

e969add

Refactor text range extents computation

cc23e1e

Fix prefix_usv_index, it's actually the index at which the diff starts

bfe3556

DataTriny force-pushed the unix-text branch from 75d8080 to bfe3556 Compare May 26, 2024 10:44

DataTriny merged commit 52540f8 into main May 26, 2024
10 checks passed

DataTriny deleted the unix-text branch May 26, 2024 10:47

mwcampbell mentioned this pull request May 23, 2024

chore: release main #413

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add basic text support on Unix #362

feat: Add basic text support on Unix #362

DataTriny commented Feb 29, 2024 •

edited

Loading

mwcampbell commented Mar 1, 2024

DataTriny commented Mar 1, 2024

mwcampbell commented Mar 7, 2024

mwcampbell commented Mar 7, 2024

mwcampbell commented Mar 7, 2024

mwcampbell commented Mar 7, 2024

mwcampbell commented Mar 7, 2024

DataTriny commented Mar 9, 2024

mwcampbell commented May 16, 2024

DataTriny commented May 16, 2024

DataTriny commented May 19, 2024

mwcampbell commented May 19, 2024

DataTriny commented May 23, 2024

mwcampbell commented May 23, 2024

DataTriny commented May 23, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

DataTriny commented May 26, 2024

mwcampbell commented May 26, 2024

DataTriny commented May 26, 2024

mwcampbell commented May 26, 2024

feat: Add basic text support on Unix #362

feat: Add basic text support on Unix #362

Conversation

DataTriny commented Feb 29, 2024 • edited Loading

mwcampbell commented Mar 1, 2024

DataTriny commented Mar 1, 2024

mwcampbell commented Mar 7, 2024

mwcampbell commented Mar 7, 2024

mwcampbell commented Mar 7, 2024

mwcampbell commented Mar 7, 2024

mwcampbell commented Mar 7, 2024

DataTriny commented Mar 9, 2024

mwcampbell commented May 16, 2024

DataTriny commented May 16, 2024

DataTriny commented May 19, 2024

mwcampbell commented May 19, 2024

DataTriny commented May 23, 2024

mwcampbell commented May 23, 2024

DataTriny commented May 23, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

mwcampbell commented May 24, 2024

DataTriny commented May 26, 2024

mwcampbell commented May 26, 2024

DataTriny commented May 26, 2024

mwcampbell commented May 26, 2024

DataTriny commented Feb 29, 2024 •

edited

Loading