This repository has been archived by the owner on Feb 2, 2021. It is now read-only.

Better handling of invalid UTF8 strings. #664

Merged
merged 1 commit into from
Feb 9, 2016


ExtremeMan
Contributor

It appears that iconv can fail and set errno to EINVAL even when ICONV_SET_DISCARD_ILSEQ is set [1]. In that case xctool should skip the invalid UTF-8 bytes [2], replacing every invalid sequence of bytes with a replacement character [3].

This PR updates StringFromDispatchDataWithBrokenUTF8Encoding to handle the cases described above. If errno is EINVAL, the function skips bytes until it detects the possible start of a new sequence, then calls iconv again to read the bytes from that point on. Finally, a new string is composed from the valid UTF-8 substrings and returned to the caller.
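
For illustration, here is a minimal sketch in plain C of the "skip until a possible start of a new sequence" step. It is not the code merged in this PR: the helper names `can_start_utf8_sequence` and `skip_to_next_sequence_start` are hypothetical, and the sketch assumes the caller already knows the offset at which the conversion stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if byte b could begin a UTF-8 sequence: an ASCII byte
 * (0xxxxxxx) or the lead byte of a 2- to 4-byte sequence (0xC2..0xF4).
 * Continuation bytes (10xxxxxx) and bytes that never occur in
 * well-formed UTF-8 (0xC0, 0xC1, 0xF5..0xFF) return 0. */
int can_start_utf8_sequence(unsigned char b) {
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}

/* Hypothetical helper: given the offset `pos` at which conversion
 * failed, drop the offending byte, then keep skipping bytes that
 * cannot start a sequence. Returns the offset at which conversion
 * could be retried; this equals `len` when nothing usable remains. */
size_t skip_to_next_sequence_start(const unsigned char *buf,
                                   size_t len, size_t pos) {
    assert(buf != NULL && pos <= len);
    if (pos < len) {
        pos++; /* always discard the byte that caused the failure */
    }
    while (pos < len && !can_start_utf8_sequence(buf[pos])) {
        pos++;
    }
    return pos;
}
```

A caller would retry the conversion from the returned offset and append whatever converts cleanly to the output. Whether to also emit a replacement character for each skipped run is a separate choice that this sketch leaves out.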

Fixes #662.

Ref:
[1] https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/iconv.3.html
[2] https://en.wikipedia.org/wiki/UTF-8
[3] https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character

@ExtremeMan
Contributor Author

@yiding, addressed your feedback.

@ExtremeMan
Contributor Author

Using initWithBytesNoCopy:length:encoding:freeWhenDone: as suggested for efficiency.

ExtremeMan added a commit that referenced this pull request Feb 9, 2016
Better handling of invalid UTF8 strings.
ExtremeMan merged commit 9cdb3bc into master Feb 9, 2016
ExtremeMan deleted the nekto/utf8_handling branch February 9, 2016 00:43

3 participants