Hi @aviks and @oxinabox,
Statistical tokenizers are used in many Transformer-based models, including the BERT family, because of their ability to handle the out-of-vocabulary problem.
I have gone through the tokenizers in WordTokenizers.jl, which I think are quite good and fast, and it would be great if we could build statistical tokenizers such as BPE and unigram language models on top of them.
I have gone through the following papers: BPE, unigram language model.
Any suggestions on how to proceed?
Where should we keep it, in TextAnalysis.jl or WordTokenizers.jl?
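For context, the core of BPE training is a simple loop: count adjacent symbol pairs across the corpus, merge the most frequent pair into a new symbol, and repeat. A minimal sketch in Python (illustrative only — these function names are hypothetical and not part of WordTokenizers.jl; a Julia version would follow the same structure):

```python
# Minimal sketch of one round of BPE training. `vocab` maps each word
# (a tuple of symbols) to its corpus frequency.
from collections import Counter

def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with the merged symbol."""
    merged = "".join(pair)
    new_vocab = Counter()
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(word[i])
                i += 1
        new_vocab[tuple(out)] += freq
    return new_vocab

# One training step: merge the most frequent pair.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2}
best = count_pairs(vocab).most_common(1)[0][0]  # ("l", "o"), count 7
vocab = merge_pair(vocab, best)
# vocab now contains ("lo", "w") and ("lo", "w", "e", "r")
```

Repeating this for a fixed number of merges yields the merge table that the tokenizer later applies greedily to unseen words, which is how out-of-vocabulary words get decomposed into known subwords.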