Material for MPS Text Mining and Sensor Data Analysis courses

blazf/mpsPractice

MPS Text Mining and Sensor Data Analysis

Slides from the lectures

Exercises

Text mining:

Sensor Data Analysis

Seminar tasks

Text Mining

Prepare a 5-10 page report and a 10-15 minute presentation (5-10 slides) of your results, all in English.

Overall task description:

  • Data loading, cleaning and modeling process
  • Identify how to best evaluate the results given the task
  • Implement it as a RunKit notebook

Assignments:

  • Andraz Repar: Translation memory in OntoGen
  • Matej Martinc: Author profiling on tweets PAN (gender classification, language variety identification)
  • Blaz Skrlj: Sentiment detection
  • Miha Torkar, Zala Herga: Trend detection on Event Registry data
  • Erik Novak, Klemen Kenda: Novelty detection on Event Registry data
  • James Hodson: ?
  • Gregor Grasselli, Tamara Hovhannisyan: Topic classification dataset
  • Gjorgi Peev, Gordana Ispirova: Topic classification dataset

Instructions for topic classification task:

Datasets are provided in the following format:

Goals:

  • Parse your dataset
    • Identify categories and how many articles they have
  • Generate bag-of-words feature space for the dataset
  • Perform classification for two frequent and two rare categories
    • Estimate precision and recall for each of them using cross-validation
  • Find an article on the internet that is positively classified into each of the selected categories
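
The first two goals (counting articles per category and generating a bag-of-words feature space) can be sketched in plain JavaScript, which runs unchanged in a RunKit notebook. This is a minimal illustration only: the `docs` array, its `text` and `category` fields, and the helper names are assumptions made for the example, not the format of the provided datasets or the API of any course library.

```javascript
// Hypothetical toy corpus; the real dataset format is not reproduced here.
const docs = [
  { text: "Stocks rally as markets open higher", category: "business" },
  { text: "Team wins the final match", category: "sports" },
  { text: "Markets fall as stocks slide", category: "business" },
];

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Count how many articles each category has.
function countCategories(documents) {
  const counts = new Map();
  for (const d of documents) {
    counts.set(d.category, (counts.get(d.category) || 0) + 1);
  }
  return counts;
}

// Map every distinct word to a column index, in order of first appearance.
function buildVocabulary(documents) {
  const vocabulary = new Map();
  for (const d of documents) {
    for (const token of tokenize(d.text)) {
      if (!vocabulary.has(token)) vocabulary.set(token, vocabulary.size);
    }
  }
  return vocabulary;
}

// Turn one text into a vector of raw term counts over the vocabulary.
// Words that are not in the vocabulary are ignored.
function toVector(text, vocabulary) {
  const vector = new Array(vocabulary.size).fill(0);
  for (const token of tokenize(text)) {
    const index = vocabulary.get(token);
    if (index !== undefined) vector[index] += 1;
  }
  return vector;
}

const vocab = buildVocabulary(docs);
const vectors = docs.map((d) => toVector(d.text, vocab));

console.log(countCategories(docs)); // articles per category
console.log(vectors[0]);            // term counts for the first article
```

Raw counts are the simplest weighting. Stop-word removal, stemming, or TF-IDF weighting are common refinements that are left out of the sketch.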

You can use the sentiment example as a template for loading the data, creating the feature space, and applying a classifier.
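
For the evaluation goal, precision and recall under k-fold cross-validation can be sketched as follows. Each selected category is treated as a binary problem (article belongs to the category or not). The keyword-based classifier below is a deliberately trivial stand-in so that the example is self-contained; it is not the classifier from the sentiment example, and the names `trainKeywordModel`, `predictKeyword`, and the `tokens`/`label` fields are hypothetical.

```javascript
// Precision = TP / (TP + FP), recall = TP / (TP + FN).
function precisionRecall(tp, fp, fn) {
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

// k-fold cross-validation: example i goes to the test set of fold (i mod k).
// Counts are pooled over all folds before computing precision and recall.
function crossValidate(examples, k, trainFn, predictFn) {
  let tp = 0, fp = 0, fn = 0;
  for (let fold = 0; fold < k; fold++) {
    const trainSet = examples.filter((_, i) => i % k !== fold);
    const testSet = examples.filter((_, i) => i % k === fold);
    const model = trainFn(trainSet);
    for (const example of testSet) {
      const predicted = predictFn(model, example);
      if (predicted && example.label) tp++;
      else if (predicted && !example.label) fp++;
      else if (!predicted && example.label) fn++;
    }
  }
  return precisionRecall(tp, fp, fn);
}

// Stand-in classifier (assumption): remember the words that occur in
// positive training examples but in no negative training example.
function trainKeywordModel(trainSet) {
  const positive = new Set();
  const negative = new Set();
  for (const ex of trainSet) {
    for (const t of ex.tokens) (ex.label ? positive : negative).add(t);
  }
  return new Set([...positive].filter((t) => !negative.has(t)));
}

// Predict positive if the example contains any remembered keyword.
function predictKeyword(model, example) {
  return example.tokens.some((t) => model.has(t));
}

// Hypothetical toy data: label is true for the target category.
const examples = [
  { tokens: ["stocks", "markets"], label: true },
  { tokens: ["match", "team"], label: false },
  { tokens: ["markets", "bank"], label: true },
  { tokens: ["team", "goal"], label: false },
  { tokens: ["bank", "stocks"], label: true },
  { tokens: ["goal", "bank"], label: false },
];

const scores = crossValidate(examples, 3, trainKeywordModel, predictKeyword);
console.log(scores);
```

For the rare categories, a plain split like this can leave a fold with no positive examples at all, so a stratified split (keeping the class ratio similar in every fold) is worth considering.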

Sensor Data Analysis

Implement a prediction task:

  • Data loading, cleaning and modeling process
  • Identify how to best evaluate the results given the task
  • Implement it as a RunKit notebook
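
One possible way to approach the evaluation question for sensor data, assuming the target is a numeric value measured over time: split the series chronologically rather than randomly (so the model is never tested on data older than its training data), and compare the model's error against a naive baseline. The sketch below uses made-up readings, mean absolute error, and a persistence baseline (predict the previous value); all names and numbers are illustrative assumptions. For classification-style tasks, precision and recall would be more appropriate than an error measure like this.

```javascript
// Hypothetical sensor readings in time order (made-up numbers).
const readings = [10, 12, 11, 13, 15, 14, 16, 18];

// Chronological split: the first part is for training, the rest for testing.
function chronologicalSplit(series, trainFraction) {
  const cut = Math.floor(series.length * trainFraction);
  return { trainPart: series.slice(0, cut), testPart: series.slice(cut) };
}

// Mean absolute error between two equally long arrays.
function meanAbsoluteError(actual, predicted) {
  let sum = 0;
  for (let i = 0; i < actual.length; i++) {
    sum += Math.abs(actual[i] - predicted[i]);
  }
  return sum / actual.length;
}

// Persistence baseline: each prediction is the most recently observed value.
function persistenceForecast(trainPart, testPart) {
  const predictions = [];
  let last = trainPart[trainPart.length - 1];
  for (const value of testPart) {
    predictions.push(last);
    last = value; // the true value becomes known before the next step
  }
  return predictions;
}

const { trainPart, testPart } = chronologicalSplit(readings, 0.75);
const baselineMae = meanAbsoluteError(
  testPart,
  persistenceForecast(trainPart, testPart)
);
console.log(baselineMae); // error a real model should improve on
```

A model that cannot beat the persistence baseline on the held-out tail of the series is not yet adding predictive value.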

Assignments:

  • Martin Gjoreski: Microsoft Band – predicting stress level
  • Miha Torkar, James Hodson: Finance dataset
  • Klemen Kenda: Smart grid data
  • Zala Herga: Credit scoring
  • Erik Novak: BicikeLJ
