
Research papers

Mikhail Koltsov edited this page Nov 2, 2016 · 4 revisions
  1. Subjectivity Classification using Machine Learning Techniques for Mining Feature-Opinion Pairs from Web Opinion Sources

Subjectivity is used as a feature in one review spam research paper.

  2. Survey of review spam detection using machine learning techniques

Interesting ideas:

  • we can use "co-training" (semi-supervised learning): build a dataset of labeled data plus a big chunk of unlabeled data. Train two classifiers on distinct feature sets (e.g. review-oriented and user-oriented) using the labeled data, then have them predict labels for the unlabeled data. Take the samples that both classifiers most confidently (judging by their predicted probabilities) classify as paid or unpaid, and add them to the labeled dataset. Rinse and repeat;
  • only three classifiers proved to perform well: SVM, Naive Bayes, and Logistic Regression. Other classifiers and ensemble methods (such as bagging and boosting) were not researched (or maybe they proved to be weak?);
  • synthetic review spam is different from "real-world" review spam;
  • there are two main feature types: review-centric and reviewer-centric.
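The co-training loop from the first bullet can be sketched as below. This is a minimal, stdlib-only Python sketch under stated assumptions, not the method from the survey: it follows the simplified variant described above (keep only samples on which both classifiers agree and are most confident), whereas classic co-training lets each classifier label examples for the other. A tiny Gaussian Naive Bayes stands in as the base learner; the four-feature vectors, the `review_view`/`reviewer_view` split, and the toy data are all hypothetical.

```python
import math


class GaussianNB:
    """Tiny Gaussian Naive Bayes for binary labels: 0 = unpaid, 1 = paid."""

    def __init__(self, var_smoothing=1e-3):
        self.var_smoothing = var_smoothing  # keeps variances strictly positive

    def fit(self, X, y):
        self.stats = {}
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            if not rows:
                raise ValueError(f"need at least one labeled example of class {c}")
            prior = len(rows) / len(y)
            means = [sum(col) / len(rows) for col in zip(*rows)]
            variances = [
                sum((v - mu) ** 2 for v in col) / len(rows) + self.var_smoothing
                for col, mu in zip(zip(*rows), means)
            ]
            self.stats[c] = (prior, means, variances)
        return self

    def _log_joint(self, x, c):
        prior, means, variances = self.stats[c]
        total = math.log(prior)
        for v, mu, var in zip(x, means, variances):
            total -= 0.5 * math.log(2 * math.pi * var) + (v - mu) ** 2 / (2 * var)
        return total

    def proba_paid(self, x):
        # Normalize the two class log-likelihoods with log-sum-exp for stability.
        logs = [self._log_joint(x, c) for c in (0, 1)]
        m = max(logs)
        exps = [math.exp(value - m) for value in logs]
        return exps[1] / (exps[0] + exps[1])


def co_train(labeled, unlabeled, view_a, view_b, per_round=2, rounds=10):
    """labeled: list of (features, label); unlabeled: list of features."""
    labeled = list(labeled)
    pool = list(unlabeled)
    clf_a, clf_b = GaussianNB(), GaussianNB()

    def refit():
        X = [x for x, _ in labeled]
        y = [label for _, label in labeled]
        clf_a.fit([view_a(x) for x in X], y)
        clf_b.fit([view_b(x) for x in X], y)

    for _ in range(rounds):
        if not pool:
            break
        refit()
        candidates = []
        for i, x in enumerate(pool):
            pa = clf_a.proba_paid(view_a(x))
            pb = clf_b.proba_paid(view_b(x))
            if (pa >= 0.5) != (pb >= 0.5):
                continue  # the two views disagree, so leave the sample unlabeled
            # Joint confidence: the less certain of the two classifiers decides.
            confidence = min(abs(pa - 0.5), abs(pb - 0.5))
            candidates.append((confidence, i, int(pa >= 0.5)))
        if not candidates:
            break
        candidates.sort(reverse=True)
        chosen = candidates[:per_round]
        for _, i, label in chosen:
            labeled.append((pool[i], label))  # pseudo-label joins the labeled set
        taken = {i for _, i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    refit()  # final fit on the enlarged labeled set
    return clf_a, clf_b, labeled


def review_view(x):  # hypothetical review-centric features
    return x[:2]


def reviewer_view(x):  # hypothetical reviewer-centric features
    return x[2:]


seed = [([0.9, 0.8, 0.9, 0.7], 1), ([0.8, 0.9, 0.8, 0.9], 1),
        ([0.1, 0.2, 0.1, 0.2], 0), ([0.2, 0.1, 0.2, 0.1], 0)]
unlabeled = [[0.85, 0.85, 0.9, 0.8], [0.15, 0.1, 0.15, 0.2],
             [0.9, 0.9, 0.85, 0.85], [0.1, 0.15, 0.1, 0.1]]

clf_a, clf_b, labeled = co_train(seed, unlabeled, review_view, reviewer_view)
for features, label in labeled[len(seed):]:
    print(features, "paid" if label else "unpaid")
```

Taking the minimum of the two confidences is one way to read "judging by probability of both classifiers"; a product of probabilities or a per-class quota would be equally reasonable choices.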

Has a comparison of previous work related to review spam classification.

  3. Learning to Identify Review Spam
