StringIndexError when trying to create a StringDocument based on a UTF8 string #255

Closed
alexzandros opened this issue Apr 26, 2021 · 3 comments

Comments

@alexzandros

I'm trying to create a StringDocument from a string that contains multi-byte UTF-8 characters, and all I'm getting is a StringIndexError.

My code is as follows:

str = "Lo que tengamos que hacer, apoyar, enteegar el ❤️ y el alma por nuestro país. Ivan es el Man. 👏👏👏#Duquepresidente https://t.co/Dr1LdTa5yQ"
sd = StringDocument(str)

And I get the following error:

Error showing value of type StringDocument{String}:
ERROR: StringIndexError: invalid index [50], valid nearby indices [48]=>'❤', [51]=>'️'

Followed by a stack trace.

So, I need to know what the best practice is for working with UTF-8 strings.

Thanks in advance.

@aviks
Member

aviks commented May 5, 2021

Can you paste the stack trace you saw? Looks like a bug on our side.

@segunolulana

segunolulana commented Jun 21, 2022

I also experienced the same issue. The text in question contains "Less likely working with code I don’t like", and the stack trace is:

ERROR: LoadError: StringIndexError: invalid index [38], valid nearby indices [36]=>'’', [39]=>'t'
Stacktrace:
  [1] string_index_err(s::String, i::Int64)
    @ Base ./strings/string.jl:12
  [2] SubString{String}(s::String, i::Int64, j::Int64)
    @ Base ./strings/substring.jl:32
  [3] SubString
    @ ./strings/substring.jl:38 [inlined]
  [4] SubString
    @ ./strings/substring.jl:44 [inlined]
  [5] remove_patterns(s::SubString{String}, rex::Regex)
    @ TextAnalysis ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:486
  [6] remove_patterns!
    @ ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:508 [inlined]
  [7] remove_patterns!(crps::Corpus{StringDocument{SubString{String}}}, rex::Regex)
    @ TextAnalysis ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:534
  [8] prepare!(crps::Corpus{StringDocument{SubString{String}}}, flags::UInt32; skip_patterns::Set{AbstractString}, skip_words::Set{AbstractString})
    @ TextAnalysis ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:415
  [9] prepare!
    @ ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:406 [inlined]
 [10] summarize(d::StringDocument{String}; ns::Int64)
    @ TextAnalysis ~/.julia/packages/TextAnalysis/B0QxG/src/summarizer.jl:22
 [11] main()...
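
The top frames of the trace point at `SubString` being handed a byte index that falls inside a multi-byte character (index 38 sits in the middle of '’', which starts at 36). Below is a minimal sketch of that failure mode in plain Julia, independent of TextAnalysis. The sample string and the `safe_prefix` helper are hypothetical illustrations, not part of the package, and this is not necessarily how the package itself should be fixed.

```julia
# Hypothetical sample text; '’' (U+2019) occupies 3 bytes in UTF-8.
s = "I don’t like it"

# Julia String indices are byte offsets, so only the first byte of each
# character is a valid index.
apostrophe = findfirst('’', s)        # byte index where '’' starts

# Assumed helper (not a TextAnalysis function): take a prefix up to a byte
# limit, snapping the limit back to the start of the character containing it.
function safe_prefix(str::AbstractString, byte_limit::Integer)
    stop = thisind(str, min(byte_limit, ncodeunits(str)))
    return SubString(str, 1, stop)
end

# Slicing at a byte that lies inside '’' raises StringIndexError.
threw = try
    SubString(s, 1, apostrophe + 1)
    false
catch err
    err isa StringIndexError
end

println(threw)
println(safe_prefix(s, apostrophe + 1))
```

When computing offsets by hand, stepping with `nextind`/`prevind` or checking `isvalid(s, i)` before slicing avoids landing inside a character.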

@rssdev10
Collaborator

Not reproducible with Julia 1.9 and TextAnalysis 0.8.
