[Feature]: Language / Error model: Improve spellchecking #7

Open
2 of 3 tasks
crazoter opened this issue May 3, 2022 · 3 comments
Labels: enhancement (New feature or request), spellchecking

Comments

crazoter (Owner) commented May 3, 2022

NLP can be used for two cases:

  1. Word suggestions (predicting the next word), e.g. with distilbert. This is potentially tricky because such models predict the next token, not the next word, so generating a list of "potential" words will need some engineering.
  2. Word corrections (which can also complement word suggestions), e.g. with textdistance (https://github.com/life4/textdistance), the approaches discussed in https://stackoverflow.com/questions/1661434/algorithm-wanted-find-all-words-of-a-dictionary-that-are-similar-to-words-in-a, or thefuzz (https://github.com/seatgeek/thefuzz). With the introduction of SymSpell (https://github.com/wolfgarbe/SymSpell), this has been included.
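As a rough illustration of the word-correction case, here is a minimal sketch of edit-distance lookup over a small vocabulary. It uses a hand-written Levenshtein distance rather than any of the libraries linked above, and the function names and the tiny vocabulary are made up for the example. It is not how SymSpell works internally (SymSpell precomputes deletes to avoid scanning the whole dictionary).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def corrections(typed: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    """Return vocabulary words within max_distance of typed, closest first."""
    scored = sorted((edit_distance(typed, w), w) for w in vocabulary)
    return [w for d, w in scored if d <= max_distance]


# Hypothetical vocabulary, for illustration only.
print(corrections("quit", ["quit", "quite", "quiet", "queen"]))
# ['quit', 'quiet', 'quite']
```

Note that "quit" is itself a valid word at distance 0, which is exactly why distance alone cannot pick "quite" here.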

The idea is that word correction on its own can correct a word but cannot take the context into account, so it may correct mistyped words wrongly, e.g. "i was quit" --> "quit" instead of "quite".
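One lightweight way to bring context in, short of a deep model, would be to re-rank the correction candidates by how often each one follows the previous word. The sketch below assumes a bigram count table; the counts and the name `rank_with_context` are invented for the example and would need to come from a real corpus.

```python
# Hypothetical bigram counts (previous word, candidate) -> occurrences.
BIGRAM_COUNTS = {
    ("was", "quite"): 120,
    ("was", "quiet"): 45,
    ("was", "quit"): 3,
    ("to", "quit"): 80,
}


def rank_with_context(previous_word: str, candidates: list[str]) -> list[str]:
    """Order candidates by how often they follow previous_word (most frequent first).

    Candidates with no recorded bigram get a count of 0 and keep their
    original relative order, because Python's sort is stable.
    """
    return sorted(candidates,
                  key=lambda w: BIGRAM_COUNTS.get((previous_word, w), 0),
                  reverse=True)


print(rank_with_context("was", ["quit", "quite", "quiet"]))
# ['quite', 'quiet', 'quit']
print(rank_with_context("to", ["quit", "quite", "quiet"]))
# ['quit', 'quite', 'quiet']
```

With these made-up counts, "i was quit" would surface "quite" first, while "to quit" is left alone.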

This of course has some prerequisites:

  1. that I record the context while typing out the sentence.
  2. that the words are not prematurely filtered out (this will likely affect performance).

Allowing users to input multiple keys that may not be in the correct order also poses an interesting problem, as it will complicate the search process.
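To make the difficulty concrete, here is one possible reading of order-insensitive matching, sketched with a character multiset. This is an assumption about how the feature could be modelled, not a description of the current implementation, and the function names are hypothetical.

```python
from collections import Counter


def matches_unordered(keys: str, word: str) -> bool:
    """True if every typed key occurs in word at least as often as it was typed."""
    needed = Counter(keys)
    available = Counter(word)
    return all(available[ch] >= n for ch, n in needed.items())


def unordered_candidates(keys: str, vocabulary: list[str]) -> list[str]:
    """Filter the vocabulary to words containing all typed keys, in any order."""
    return [w for w in vocabulary if matches_unordered(keys, w)]


# Hypothetical vocabulary, for illustration only.
print(unordered_candidates("eqt", ["quite", "quit", "quiet", "queen"]))
# ['quite', 'quiet']
```

Because the order of the keys carries no information here, a prefix index cannot be used to prune the search, so every word has to be checked (or a different index, such as one keyed on sorted letters, has to be built).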

The current solution is simple and efficient but limited in how much it can reduce keystrokes.

The solution will need to balance performance, quality of life and computational resources.

Tasks:

  • Add contextual spell checking
  • Change the dictionary to a word frequency dictionary
  • Take frequency into account when ranking words
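For the two frequency tasks, a minimal sketch could look like the following. It assumes a plain-text dictionary with one `word count` pair per line (an assumption about the file layout, not a confirmed format for this project) and ranks candidates by edit distance first, then by descending frequency. The helper names and sample data are hypothetical.

```python
def load_frequencies(lines) -> dict[str, int]:
    """Parse 'word count' lines into a dict, skipping blank or malformed lines."""
    freqs = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        freqs[parts[0]] = int(parts[1])
    return freqs


def rank(candidates: list[tuple[str, int]], freqs: dict[str, int]) -> list[str]:
    """candidates holds (word, edit_distance) pairs.

    Closer words come first; ties are broken by higher frequency.
    """
    ordered = sorted(candidates, key=lambda c: (c[1], -freqs.get(c[0], 0)))
    return [word for word, _ in ordered]


# Made-up counts, for illustration only.
freqs = load_frequencies(["quite 500", "quiet 200", "quit 300", "", "not a pair"])
print(rank([("quiet", 1), ("quite", 1), ("quit", 0)], freqs))
# ['quit', 'quite', 'quiet']
```

In practice `load_frequencies` could be fed an open file object, since iterating a file yields lines.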

reference: #7 (comment)

@crazoter crazoter added the enhancement New feature or request label May 3, 2022
@crazoter crazoter changed the title [Feature]: Introduce NLP [Feature]: Introduce NLP to spellcheck with context May 6, 2022
@crazoter crazoter changed the title [Feature]: Introduce NLP to spellcheck with context [Feature]: Introduce deep learnt NLP to spellcheck with context May 7, 2022
@crazoter crazoter changed the title [Feature]: Introduce deep learnt NLP to spellcheck with context [Feature]: Introduce spellchecking with context May 7, 2022
crazoter (Owner, Author) commented May 7, 2022

@crazoter crazoter changed the title [Feature]: Introduce spellchecking with context [Feature]: Language / Error model: Introduce spellchecking with context May 7, 2022
@crazoter crazoter changed the title [Feature]: Language / Error model: Introduce spellchecking with context [Feature]: Language / Error model: Improve spellchecking May 7, 2022
crazoter (Owner, Author) commented May 7, 2022
