Libraries

Mikhail Koltsov edited this page Nov 5, 2016 · 2 revisions

Used libraries

Text libraries

  • Enchant for spellchecking. Use this trick to add the Russian language. It works fine on macOS but is troublesome on Ubuntu;
  • NLTK for minor utility features (like stopwords). Unfortunately, NLTK does not provide good word and sentence tokenization for Russian (at least, we did not find one, or what we found required 2 GB+ of disk space);
  • pymystem3, a wrapper around Mystem. Useful for part-of-speech (POS) tagging and lemmatization;
  • some synonym dictionary.
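
As an illustration of how pymystem3 and the NLTK stopword list can be combined, here is a minimal sketch, not the project's actual code. The function names `content_lemmas` and `lemmatize_russian` are hypothetical. It assumes that pymystem3 is installed (it fetches the Mystem binary on first use) and that the NLTK stopwords corpus has been downloaded once with `nltk.download("stopwords")`.

```python
def content_lemmas(lemmas, stop_words):
    """Drop whitespace, punctuation-only tokens, and stopwords from a lemma list."""
    cleaned = (lemma.strip().lower() for lemma in lemmas)
    return [
        lemma
        for lemma in cleaned
        # Keep only tokens that contain at least one letter.
        if any(ch.isalpha() for ch in lemma) and lemma not in stop_words
    ]


def lemmatize_russian(text):
    """Lemmatize text with Mystem, then filter it with NLTK's Russian stopwords."""
    # Imported lazily: pymystem3 fetches the Mystem binary on first use,
    # and the NLTK corpus needs a one-time nltk.download("stopwords").
    from nltk.corpus import stopwords
    from pymystem3 import Mystem

    # Mystem.lemmatize returns lemmas interleaved with the separators
    # between them (spaces, punctuation, a trailing newline).
    lemmas = Mystem().lemmatize(text)
    return content_lemmas(lemmas, set(stopwords.words("russian")))
```

The filtering step is kept separate from the Mystem call so that it can be tried out on any list of lemmas without installing Mystem.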

Machine learning libraries

  • sklearn (scikit-learn) - a Swiss Army knife of machine learning. It has dozens of classifiers, metrics, and helper functions;
  • matplotlib - useful for visualizing ML-related data.

Libraries we tried but did not use

  • FreeLing: supposed to solve many NLP problems, in particular for Russian. It needs 3 GB+ of disk space and has a scary installation routine;
  • pymorphy2 - we replaced it with Mystem;
  • pyrus and opencorpora. We considered using syntactic analysis as a machine learning feature, but then forgot why and how we would use it.