Rubbish Classifier - Computer Vision Classification
Project screenshots
Feel free to check them out.
PyTorch
Python
Jupyter
When it comes to rubbish separation, we only do it properly some of the time, whether for lack of information about the correct way of doing it or out of laziness, wishing someone or something would tell us where each item goes. That is the reason this application came to exist.
As a disclaimer, the technology used in this project is at a beginner level: the idea was to experiment with a miniature version of a CNN and later improve it with other techniques, such as a state-of-the-art image classification model, transfer learning and fine-tuning.
Therefore, the idea was to implement a model able to classify images into nine classes of rubbish:
But, as specified in the disclaimer, this is a beginner-level implementation meant to experiment with computer vision classification. As a result, only the first four classes were used.
To develop the application, a CNN was trained on self-collected data using PyTorch. The network architecture was based on the Tiny VGG used in CNN Explainer, which provides a graphical view of the network along with its layers, parameters and dimensions.
The layers, loss function and optimizer chosen were the following:
Conv2d()
ReLU()
Conv2d()
ReLU()
MaxPool2d()
Flatten()
Linear()
CrossEntropyLoss()
Adam Optimizer
The model layers and output shapes were the following:
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
TinyVGG [32, 4] --
├─Sequential: 1-1 [32, 10, 126, 126] --
│ └─Conv2d: 2-1 [32, 10, 254, 254] 280
│ └─ReLU: 2-2 [32, 10, 254, 254] --
│ └─Conv2d: 2-3 [32, 10, 252, 252] 910
│ └─ReLU: 2-4 [32, 10, 252, 252] --
│ └─MaxPool2d: 2-5 [32, 10, 126, 126] --
├─Sequential: 1-2 [32, 10, 61, 61] --
│ └─Conv2d: 2-6 [32, 10, 124, 124] 910
│ └─ReLU: 2-7 [32, 10, 124, 124] --
│ └─Conv2d: 2-8 [32, 10, 122, 122] 910
│ └─ReLU: 2-9 [32, 10, 122, 122] --
│ └─MaxPool2d: 2-10 [32, 10, 61, 61] --
├─Sequential: 1-3 [32, 4] --
│ └─Flatten: 2-11 [32, 37210] --
│ └─Linear: 2-12 [32, 4] 148,844
==========================================================================================
Total params: 151,854
Trainable params: 151,854
Non-trainable params: 0
Total mult-adds (G): 3.31
==========================================================================================
Input size (MB): 25.17
Forward/backward pass size (MB): 405.20
Params size (MB): 0.61
Estimated Total Size (MB): 430.97
==========================================================================================
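The summary above can be reproduced with a sketch like the following. Note that the 3-channel 256×256 input size, the hidden size of 10 and the class count of 4 are assumptions inferred from the output shapes, not values stated elsewhere in the project:

```python
import torch
from torch import nn


class TinyVGG(nn.Module):
    """Tiny VGG-style CNN: two conv blocks followed by a linear classifier."""

    def __init__(self, in_channels: int = 3, hidden_units: int = 10, num_classes: int = 4):
        super().__init__()
        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels, hidden_units, kernel_size=3),   # 256 -> 254
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3),  # 254 -> 252
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                           # 252 -> 126
        )
        self.block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3),  # 126 -> 124
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3),  # 124 -> 122
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                           # 122 -> 61
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(hidden_units * 61 * 61, num_classes),        # 37210 -> 4
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.block_2(self.block_1(x)))


model = TinyVGG()
logits = model(torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 4])
```

Instantiating this model and summing its parameters gives the same 151,854 total reported above.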
After 20 epochs, the training and test results were the following:
Epoch: 0 | train_loss: 1.3578 | train_acc: 0.3380 | test_loss: 1.4371 | test_acc: 0.2344
Epoch: 1 | train_loss: 1.3505 | train_acc: 0.3745 | test_loss: 1.4166 | test_acc: 0.2344
Epoch: 2 | train_loss: 1.3462 | train_acc: 0.3745 | test_loss: 1.4127 | test_acc: 0.2344
Epoch: 3 | train_loss: 1.3365 | train_acc: 0.3823 | test_loss: 1.4370 | test_acc: 0.2344
Epoch: 4 | train_loss: 1.3153 | train_acc: 0.4101 | test_loss: 1.4134 | test_acc: 0.3203
Epoch: 5 | train_loss: 1.2436 | train_acc: 0.4605 | test_loss: 1.3471 | test_acc: 0.4089
Epoch: 6 | train_loss: 1.1433 | train_acc: 0.5326 | test_loss: 1.2943 | test_acc: 0.4583
Epoch: 7 | train_loss: 1.0717 | train_acc: 0.5595 | test_loss: 1.3672 | test_acc: 0.4167
Epoch: 8 | train_loss: 0.9518 | train_acc: 0.6047 | test_loss: 1.2811 | test_acc: 0.4792
Epoch: 9 | train_loss: 0.8666 | train_acc: 0.6456 | test_loss: 1.2530 | test_acc: 0.4375
Epoch: 10 | train_loss: 0.7554 | train_acc: 0.6968 | test_loss: 1.4066 | test_acc: 0.4922
Epoch: 11 | train_loss: 0.6378 | train_acc: 0.7655 | test_loss: 1.4421 | test_acc: 0.4818
Epoch: 12 | train_loss: 0.4516 | train_acc: 0.8732 | test_loss: 1.2420 | test_acc: 0.5260
Epoch: 13 | train_loss: 0.3104 | train_acc: 0.9010 | test_loss: 1.3614 | test_acc: 0.5729
Epoch: 14 | train_loss: 0.1869 | train_acc: 0.9375 | test_loss: 1.8353 | test_acc: 0.5365
Epoch: 15 | train_loss: 0.1195 | train_acc: 0.9618 | test_loss: 1.7163 | test_acc: 0.5104
Epoch: 16 | train_loss: 0.0646 | train_acc: 0.9792 | test_loss: 2.0551 | test_acc: 0.5365
Epoch: 17 | train_loss: 0.0387 | train_acc: 0.9948 | test_loss: 2.0905 | test_acc: 0.5859
Epoch: 18 | train_loss: 0.0206 | train_acc: 0.9974 | test_loss: 2.0580 | test_acc: 0.5703
Epoch: 19 | train_loss: 0.0107 | train_acc: 1.0000 | test_loss: 2.2457 | test_acc: 0.5573
Total training time: 942.753 seconds.
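The per-epoch log above is what a standard PyTorch loop combining CrossEntropyLoss() and the Adam optimizer prints. Here is a minimal self-contained sketch; the synthetic tensors and the linear stand-in model are placeholders, not the project's real dataset or network:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 64 small RGB images across 4 classes.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 4, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))  # stand-in for TinyVGG
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(3):
    model.train()
    train_loss, train_acc = 0.0, 0.0
    for X, y in loader:
        logits = model(X)
        loss = loss_fn(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += (logits.argmax(dim=1) == y).float().mean().item()
    n = len(loader)
    print(f"Epoch: {epoch} | train_loss: {train_loss / n:.4f} | train_acc: {train_acc / n:.4f}")
```

A matching evaluation pass over the test set (under `torch.inference_mode()`) produces the `test_loss` and `test_acc` columns.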
As we can see, the test results could be better: while the training accuracy reaches 100%, the test accuracy plateaus around 55%. The causes are the low complexity of the model and the small amount of data used. As said before, the idea was to start from here and implement a better model with better techniques.
If you feel like going deeper into the project and getting detailed information about its implementation and results, check this notebook.