
Categorical Image Classification using Tensorflow/Keras on Fashion MNIST Dataset


HalukSumen/ImageClassification_FashionMnist


Fashion MNIST

Exploratory Data Analysis + Data Visualization + Deep Learning Modelling

1 - Abstract

In this project I performed Exploratory Data Analysis, Data Visualisation, and finally Modelling. The Fashion MNIST dataset contains 70,000 rows split across two files. Each example is a 28x28 image associated with one of 10 labels (targets). After examining the dataset, I preprocessed the data by reshaping each row of 784 pixel columns into a (28,28,1) array and saving the target feature as a separate vector. In the modelling part, I trained a sequential model with multiple convolution layers for 50 epochs. To control overfitting and underfitting, I adjusted Dropout layers. Overall, the model achieves 0.9236 accuracy. Furthermore, data augmentation and/or increasing the dataset size could help obtain better results.

2 - Data

Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with one of 10 labels. Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value indicating its lightness or darkness, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255. The training and test datasets have 785 columns. The first column holds the class label (see below) and represents the article of clothing. The remaining columns contain the pixel-values of the associated image.

  • To locate a pixel on the image, suppose that we have decomposed x as x = i * 28 + j, where i and j are integers between 0 and 27. The pixel is located on row i and column j of a 28 x 28 matrix.
  • For example, pixel31 indicates the pixel that is in the fourth column from the left, and the second row from the top, as in the ascii-diagram below.
  • Each row is a separate image.
  • Column 1 is the class label.
  • Remaining columns are pixel numbers (784 total).
  • Each value is the darkness of the pixel (0 to 255).
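
The decomposition above can be checked with a few lines of Python (a minimal sketch, assuming zero-based pixel numbering pixel0 … pixel783; the function name is illustrative):

```python
# Recover the (row, column) position of a pixel from its flat index,
# using the decomposition x = i * 28 + j with 0 <= i, j <= 27.
def pixel_position(x, width=28):
    i, j = divmod(x, width)  # i = row from the top, j = column from the left
    return i, j

row, col = pixel_position(31)
# pixel31 -> row 1 (second row from the top), column 3 (fourth column from the left)
```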

Each training and test example is assigned to one of the following labels:

  • 0 T-shirt/top
  • 1 Trouser
  • 2 Pullover
  • 3 Dress
  • 4 Coat
  • 5 Sandal
  • 6 Shirt
  • 7 Sneaker
  • 8 Bag
  • 9 Ankle boot
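
The label-to-name mapping above can be kept as a simple lookup table (a minimal sketch; the list and function names are illustrative, not from the project code):

```python
# Class index -> human-readable Fashion-MNIST label, in the order listed above.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label):
    """Return the clothing-article name for a class label 0-9."""
    return CLASS_NAMES[label]
```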

Train Dataset Example

Test Dataset Example

3 - Exploratory Data Analysis

First, I checked the data, which comes as two different datasets: train and test. Then I checked the distribution of labels in each dataset and created a list for displaying images from both. I observed that all classes (labels) are equally distributed, so there is no need for oversampling or undersampling.

4 - Data Preprocessing

To prepare the datasets for the model, I reshaped the pixel columns from (784) to (28,28,1) and saved the label feature as a separate vector, processing both the train and test data. After that, I split the train set into train and validation datasets. The validation set contains 30% of the original train dataset, so the split is 0.7/0.3. After this step, I checked the distribution of labels in the train and validation datasets.
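
The preprocessing steps above can be sketched with NumPy (a minimal sketch using randomly generated stand-in rows; in the project the rows come from the Fashion-MNIST CSV files, and all variable names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 785-column CSV: first column = label, remaining 784 = pixels.
raw = rng.integers(0, 256, size=(1000, 785))

# Save the label feature as a separate vector.
labels = raw[:, 0] % 10                  # class labels 0-9
pixels = raw[:, 1:].astype("float32")    # 784 pixel values per row

# Reshape each (784,) row into a (28, 28, 1) image and scale to [0, 1].
images = pixels.reshape(-1, 28, 28, 1) / 255.0

# One-hot encode the labels for categorical cross-entropy.
one_hot = np.eye(10)[labels]

# 0.7/0.3 train/validation split.
split = int(0.7 * len(images))
x_train, x_val = images[:split], images[split:]
y_train, y_val = one_hot[:split], one_hot[split:]
```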

Number of Items in Each Class in Dataset

Number of Items in Each Class in Validation Dataset

5 - Modelling

I used a Sequential model. The Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. I added Conv2D, MaxPooling2D, Flatten and Dense layers, using the following parameters for each layer.

1.Conv2D

  • filters = 32
  • kernel_size = (3,3)
  • activation function = relu
  • kernel_initializer = normal
  • input_shape = (28,28,1)

2.MaxPooling2D

  • pool_size = (2,2)

3.Conv2D

  • filters = 64
  • kernel_size = (3,3)
  • activation function = relu

4.Flatten

The flatten operation reshapes a tensor into a one-dimensional vector whose length equals the number of elements in the tensor, not including the batch dimension; it takes no parameters.
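
For the layer stack described in this section, the size reaching Flatten can be traced by hand (a sketch assuming Keras's default 'valid' padding and stride 1, which are not stated explicitly above):

```python
# Trace the spatial shape through the stack (assumes 'valid' padding, stride 1).
def conv_out(size, kernel):      # a 'valid' convolution shrinks each side
    return size - kernel + 1

def pool_out(size, pool):        # non-overlapping max pooling halves each side
    return size // pool

h = w = 28                               # input: 28 x 28 x 1
h, w = conv_out(h, 3), conv_out(w, 3)    # Conv2D(32, 3x3)   -> 26 x 26 x 32
h, w = pool_out(h, 2), pool_out(w, 2)    # MaxPooling2D(2x2) -> 13 x 13 x 32
h, w = conv_out(h, 3), conv_out(w, 3)    # Conv2D(64, 3x3)   -> 11 x 11 x 64
flat = h * w * 64                        # Flatten           -> 7744 units
```

Under these assumptions, the first Dense layer receives a 7744-dimensional vector.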

5.Dense

In first Dense Layer,

  • units = 128
  • activation function = relu

In second Dense Layer,

  • units = 10
  • activation function = softmax

Finally I am compiling model according these parameters,

  • loss = categorical cross-entropy
  • optimizer = adam
  • metrics = accuracy
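
Putting the layer and compile parameters above together, the model might be built like this (a sketch using the tf.keras API; an explicit Input layer is used in place of the input_shape argument, and the Dropout layers mentioned in the abstract are omitted because their positions are not specified here):

```python
from tensorflow.keras import layers, models

# Sequential CNN matching the parameters listed above.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu",
                  kernel_initializer="normal"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Compile with the loss, optimizer, and metric listed above.
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
```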

6 - Result & Future Work

As a result, my model gives good results overall.

Accuracy of the Model

Loss of the Model

Test Loss is 0.2166

Test Accuracy is 0.9236

Classification Report

Correctly Predicted Items

Falsely Predicted Items

The best precision is for Trousers (Class 1) and Sandals (Class 5) with 0.99, and the worst is for Shirt (Class 6) with 0.78.

The best recall is for Trousers (Class 1) with 0.99, and the worst is for Shirt (Class 6) with 0.79.

The best F1-score is for Trousers (Class 1) with 0.99, and the worst is for Shirt (Class 6) with 0.78.
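
Per-class precision, recall, and F1 figures like those above can be produced with scikit-learn's classification_report (a sketch with toy stand-in labels; in the project, the true labels and predictions come from the test set):

```python
from sklearn.metrics import classification_report

# Toy stand-in for test labels and model predictions.
y_true = [0, 1, 1, 5, 6, 6, 9]
y_pred = [0, 1, 1, 5, 6, 0, 9]  # one Shirt (6) misclassified as T-shirt/top (0)

report = classification_report(y_true, y_pred, zero_division=0)
print(report)
```

The report prints one row per class with precision, recall, F1-score, and support.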

For better results, data augmentation could be implemented or the dataset size could be expanded.
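
As one concrete option, simple shift-based augmentation can be sketched with NumPy alone (illustrative only; in practice Keras's built-in augmentation utilities would typically be used instead):

```python
import numpy as np

def shift_image(image, dx, dy):
    """Shift a (28, 28, 1) image by (dx, dy) pixels, padding with zeros."""
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    # Zero out the rows/columns that wrapped around the edges.
    if dy > 0:
        shifted[:dy, :, :] = 0
    elif dy < 0:
        shifted[dy:, :, :] = 0
    if dx > 0:
        shifted[:, :dx, :] = 0
    elif dx < 0:
        shifted[:, dx:, :] = 0
    return shifted

def augment(images, shifts=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Return the original images plus one shifted copy per offset."""
    extra = [np.stack([shift_image(im, dx, dy) for im in images])
             for dx, dy in shifts]
    return np.concatenate([images] + extra)
```

With four one-pixel shifts, this multiplies the training set size by five; the corresponding labels simply repeat.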