Libraries
Mikhail Koltsov edited this page Nov 5, 2016
·
2 revisions
- Enchant for spellchecking. Use this trick to add Russian language support. It works fine on macOS but causes trouble on Ubuntu;
- NLTK for minor utility features (like stopwords). Unfortunately, NLTK does not provide good word and sentence tokenization for Russian (at least, we did not find it, or it required 2 GB+ of disk space);
- pymystem3, which is a wrapper around Mystem. Useful for POS tagging and lemmatization;
- some synonym dictionary.
- sklearn - a Swiss Army knife of machine learning. Has dozens of classifiers, metrics and helper functions;
- matplotlib - useful for visualizing ML-related data.
- Freeling: supposed to solve many NLP problems, particularly for Russian. Requires 3 GB+ of disk space and has a scary installation routine;
- pymorphy2 - superseded by Mystem;
- pyrus and opencorpora. We thought about using syntactic analysis as a machine learning feature, but then forgot why and how we would use it.
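
To illustrate how the pymystem3 and NLTK entries above can work together, here is a minimal sketch of a preprocessing step: lemmatize a Russian sentence and drop stopwords. The names `content_lemmas`, `Lemmatizer` and `demo` are hypothetical helpers made up for this example, not something from our code base. The sketch assumes only that the analyzer has a `lemmatize(text)` method returning a list of strings, which is how `pymystem3.Mystem` behaves (its output also contains whitespace and punctuation tokens).

```python
from typing import Iterable, List, Protocol


class Lemmatizer(Protocol):
    """Anything with a pymystem3-style lemmatize() method (hypothetical interface)."""

    def lemmatize(self, text: str) -> List[str]: ...


def content_lemmas(text: str, analyzer: Lemmatizer, stopwords: Iterable[str]) -> List[str]:
    """Lemmatize text, then drop whitespace, punctuation and stopwords."""
    stop = set(stopwords)
    lemmas = (token.strip() for token in analyzer.lemmatize(text))
    # Keep only tokens that contain at least one letter and are not stopwords.
    return [
        lemma
        for lemma in lemmas
        if any(ch.isalpha() for ch in lemma) and lemma not in stop
    ]


def demo() -> List[str]:
    """Run with the real libraries (needs pymystem3 and the NLTK stopwords corpus)."""
    import nltk
    from nltk.corpus import stopwords
    from pymystem3 import Mystem  # fetches the Mystem binary on first use

    nltk.download("stopwords", quiet=True)
    sentence = "Мама мыла раму, и мы смотрели."
    return content_lemmas(sentence, Mystem(), stopwords.words("russian"))
```

Calling `demo()` requires both libraries to be installed. Because the analyzer is passed in as a parameter, the filtering logic can also be tried out with a stub object instead of a real Mystem instance.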