
LafouCC/Machine-Learning-Final-Competition


Machine Learning Final Competition

Task Overview 😊

Develop a model to classify song snippets into one of four categories:

  • No voices present
  • Male voice present
  • Female voice present
  • More than one person's voice

Datasets

  • Train Set: 11886 song snippets with manual labels
  • Test Set: 2447 song snippets

Build-from-Scratch Models Overview

  • 4-Layer CNN
  • ResNet 34
[Figures: CNN model architecture; ResNet model architecture]

Improvements

  • Data Pre-processing [1]
    • Convert the mp3 audio into Mel spectrograms
  • Data Augmentation
    • Time Mask / Frequency Mask / Reverse
    • Mix-up
    • Up-sampling based on label (class) weights
    • Human voice separation
  • Model Improvement
    • Change the Input layer of ResNet 34
    • Ensembling
  • Choice of Hyper-Parameters / Scheduler
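| Model          | Scheduler         | Batch Size | Initial lr | Mixup Alpha | Model Weights |
|----------------|-------------------|------------|------------|-------------|---------------|
| CNN 4-layers   | MultiStepLR       | 16         | 0.001      | 0.4         | -             |
| ResNet 34      | ReduceLROnPlateau | 16         | 0.005      | 0.4         | -             |
| ResNet34 + CNN | -                 | -          | -          | -           | 0.5, 0.5      |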
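
The augmentations listed above (Time Mask, Frequency Mask, Reverse, Mix-up) could be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's code. The function names, the mask widths, the zero fill value, the `(n_mels, n_frames)` spectrogram layout and the one-hot label format are all assumptions. Only the Mix-up alpha of 0.4 comes from the table.

```python
import numpy as np


def time_mask(spec, max_width, rng):
    """Zero out a random span of time frames (columns). Fill value 0 is an assumption."""
    out = spec.copy()
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, spec.shape[1] - width + 1))
    out[:, start:start + width] = 0.0
    return out


def freq_mask(spec, max_width, rng):
    """Zero out a random band of Mel bins (rows)."""
    out = spec.copy()
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, spec.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out


def reverse_time(spec):
    """Flip the spectrogram along the time axis."""
    return spec[:, ::-1].copy()


def mixup(x, y_onehot, alpha, rng):
    """Blend each sample (and its label) with a randomly chosen partner.

    x: (batch, n_mels, n_frames), y_onehot: (batch, n_classes).
    The mixing coefficient lam is drawn from Beta(alpha, alpha).
    """
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(x.shape[0])
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(16, 128, 200))          # hypothetical batch of spectrograms
    labels = np.eye(4)[rng.integers(0, 4, size=16)]  # four classes, one-hot
    augmented = np.stack([freq_mask(time_mask(s, 30, rng), 15, rng) for s in batch])
    mixed_x, mixed_y, lam = mixup(augmented, labels, alpha=0.4, rng=rng)
    print(mixed_x.shape, mixed_y.shape, round(lam, 3))
```

With soft labels from Mix-up, the loss is usually computed against `y_mix` directly, or as the `lam`-weighted sum of the losses for the two original labels.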

Performance

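| Model          | Epochs | Train Loss | Validation Loss | Validation Accuracy | Test Accuracy |
|----------------|--------|------------|-----------------|---------------------|---------------|
| CNN 4-layers   | 30     | 0.3828     | 0.3531          | 85.79%              | 76.91%        |
| ResNet 34      | 24     | 0.4228     | 0.2814          | 88.52%              | 78.17%        |
| ResNet34 + CNN | -      | -          | -               | -                   | 79.44%        |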
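
The ensemble row combines the two models with weights 0.5 and 0.5. One common way to do this is to average the softmax probabilities of both models and take the arg-max, sketched below. Whether the repository averages probabilities, logits or votes is not stated here, so treat the averaging rule, the function names and the example logits as assumptions.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def ensemble_predict(logits_list, weights):
    """Weighted average of per-model class probabilities.

    logits_list: list of (n_samples, n_classes) arrays, one per model.
    weights: one weight per model, e.g. [0.5, 0.5].
    Returns (predicted_labels, averaged_probabilities).
    """
    weights = np.asarray(weights, dtype=np.float64)
    probs = np.stack([softmax(np.asarray(l, dtype=np.float64)) for l in logits_list])
    avg = np.tensordot(weights, probs, axes=1)  # shape: (n_samples, n_classes)
    return avg.argmax(axis=1), avg


if __name__ == "__main__":
    # Made-up logits for two snippets and four classes.
    cnn_logits = np.array([[2.0, 0.5, 0.1, 0.0], [0.2, 1.5, 1.4, 0.1]])
    resnet_logits = np.array([[1.0, 0.2, 0.3, 0.1], [0.1, 0.9, 2.5, 0.0]])
    labels, probs = ensemble_predict([cnn_logits, resnet_logits], [0.5, 0.5])
    print(labels)
```

Averaging probabilities rather than raw logits keeps the two models on a comparable scale even if one of them produces much larger logits.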

Requirements

pip install torch
pip install librosa

Footnotes

  1. See the PDF for details.
