Introduction
While working on MeowTalk, one of the best applications for cat language translation, Akvelon’s ML team looked for ways to add new features and improve the quality of the service. One of the ideas was to use information about cat breeds. Inferring the breed from a cat’s voice didn’t look like a very good idea, so we decided to go with images. That is how the idea of Cat Scanner was born: an application that detects a cat’s breed from an image of it.
Solution
The problem was to find a cat and determine its breed at the same time. We decided to use a separate model for each subtask: one model answers the question “Is there a cat in this image at all?”, and the other answers “If there is a cat, what breed is it?”.
For both subtasks we took a ResNet-50 model pretrained on ImageNet. For the detector, we left it as is. Its original outputs do not contain a unified “cat” class, but they do contain a few cat classes, namely “Tabby”, “Tiger”, “Persian”, “Siamese”, and “Egyptian”. If the model’s prediction falls under one of these classes, we assume that a cat is detected. Of course, this is not how a real detector should be built, so there is room for future improvement.
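A minimal sketch of this detection rule, assuming the standard 1000-class ImageNet label order used by torchvision’s pretrained ResNet-50, where indices 281 to 285 are “tabby cat”, “tiger cat”, “Persian cat”, “Siamese cat”, and “Egyptian cat”. The helper name `detect_cat` is hypothetical, and it takes the raw 1000-way output as a plain list so the logic can be shown without PyTorch.

```python
# ImageNet-1k indices of the five cat classes (standard torchvision label order).
CAT_CLASSES = {
    281: "tabby cat",
    282: "tiger cat",
    283: "Persian cat",
    284: "Siamese cat",
    285: "Egyptian cat",
}


def detect_cat(imagenet_logits):
    """Returns True if the top-1 ImageNet class is one of the cat classes (hypothetical helper)."""
    if len(imagenet_logits) != 1000:
        raise ValueError("expected one score per ImageNet class")
    # Index of the highest-scoring class, i.e. the model's top-1 prediction.
    top1 = max(range(len(imagenet_logits)), key=lambda i: imagenet_logits[i])
    return top1 in CAT_CLASSES
```

Because only the top-1 class is checked, an image whose probability mass is split across several cat classes can still lose to a single non-cat class; summing the softmax probabilities of the five cat classes is one simple alternative.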
On the classifier side, our team used the well-known transfer learning approach. In essence, this means applying knowledge gained while solving one problem to a different but related problem. In terms of the model, we removed its final FC (fully-connected) layer (output_dim=1000) and replaced it with a stack of three FC layers with ReLU activations in between (final output_dim=12, one output per breed).
The model was then trained on the Oxford-IIIT Pet dataset for cat and dog breed classification. We took only the cat part, which left us with 12 cat breeds: “Abyssinian”, “Bengal”, “Birman”, “Bombay”, “British Shorthair”, “Egyptian Mau”, “Maine Coon”, “Persian”, “Ragdoll”, “Russian Blue”, “Siamese”, and “Sphynx”.
Then, in the model initialization we can simply do:
Full training code is provided below:
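A minimal training-loop sketch, assuming PyTorch, cross-entropy loss, and any `DataLoader` that yields `(images, labels)` batches with integer labels. The Adam optimizer, learning rate, epoch count, and the helper names `train_one_epoch`, `evaluate_accuracy`, and `fit` are illustrative assumptions; loading the dataset and keeping only its cat classes is omitted.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


def train_one_epoch(model: nn.Module, loader: DataLoader,
                    optimizer: torch.optim.Optimizer, device: str = "cpu") -> float:
    """Runs one pass over `loader` and returns the mean training loss."""
    loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
    model.train()
    total_loss, total_samples = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        total_samples += labels.size(0)
    return total_loss / total_samples


@torch.no_grad()
def evaluate_accuracy(model: nn.Module, loader: DataLoader, device: str = "cpu") -> float:
    """Returns the top-1 accuracy of `model` on `loader`."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total


def fit(model: nn.Module, train_loader: DataLoader, val_loader: DataLoader,
        epochs: int = 10, lr: float = 1e-3, device: str = "cpu"):
    """Trains the model and returns a list of (train_loss, val_accuracy) per epoch."""
    model.to(device)
    # Optimize only trainable parameters (the new head, if the backbone is frozen).
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    history = []
    for epoch in range(epochs):
        train_loss = train_one_epoch(model, train_loader, optimizer, device)
        val_acc = evaluate_accuracy(model, val_loader, device)
        history.append((train_loss, val_acc))
        print(f"epoch {epoch + 1}: train loss {train_loss:.4f}, val acc {val_acc:.3f}")
    return history
```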
One more thing to mention is what we do with cats of mixed breed. With the above approach, such a cat would simply be classified as its single most likely breed, which is not what we would expect to see. However, the last layer outputs logits for all classes, not only for the most probable one, so applying the softmax function to the output tensor lets us measure both the model’s confidence and each breed’s contribution. That’s exactly what we did. We then converted the scores to percentages, sorted them in descending order, and applied a 5% threshold.
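A minimal sketch of this post-processing in plain Python, assuming one logit per breed in a fixed order. The helper name `breed_breakdown` is hypothetical, and treating the 5% cutoff as inclusive (`>=`) is an assumption.

```python
import math


def breed_breakdown(logits, breeds, threshold_pct=5.0):
    """Turns raw classifier logits into (breed, percent) pairs at or above a threshold."""
    if len(logits) != len(breeds):
        raise ValueError("need exactly one logit per breed")
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    percents = [100.0 * e / total for e in exps]
    # Sort by score, highest first, then drop breeds below the threshold.
    ranked = sorted(zip(breeds, percents), key=lambda pair: pair[1], reverse=True)
    return [(breed, pct) for breed, pct in ranked if pct >= threshold_pct]
```

For logits `[2.0, 1.0, 0.1, -3.0]` over four breeds, the softmax gives roughly 66%, 24%, 10%, and 0.4%, so only the first three breeds are reported.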
Results
The Cat Scanner web application is free to use and available to everyone here: http://cat-scanner.k8s.akvelon.net/
The classification model achieved the following metrics:
- Accuracy: 0.84
- Micro Precision: 0.84
- Micro Recall: 0.84
- Micro F1-score: 0.84
- Macro Precision: 0.85
- Macro Recall: 0.84
- Macro F1-score: 0.83
- Weighted Precision: 0.85
- Weighted Recall: 0.84
- Weighted F1-score: 0.83
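A small pure-Python sketch of how micro, macro, and weighted averages are computed for single-label classification. It also shows why several of the numbers above coincide: every misclassification counts as one false positive (for the predicted breed) and one false negative (for the true breed), so micro precision, micro recall, and micro F1 all equal accuracy, and weighted recall equals accuracy too. The function name `classification_metrics` is hypothetical; scikit-learn’s `precision_recall_fscore_support` computes the same averages.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus micro/macro/weighted (precision, recall, F1) for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # counted against the predicted class
            fn[t] += 1  # and against the true class

    def safe_div(a, b):
        return a / b if b else 0.0

    per_class = {}
    for c in labels:
        prec = safe_div(tp[c], tp[c] + fp[c])
        rec = safe_div(tp[c], tp[c] + fn[c])
        per_class[c] = (prec, rec, safe_div(2 * prec * rec, prec + rec))

    n = len(y_true)
    support = Counter(y_true)
    total_tp = sum(tp.values())
    # Micro: pool counts over all classes; sum(fp) == sum(fn), so P == R == accuracy.
    micro_p = safe_div(total_tp, total_tp + sum(fp.values()))
    micro_r = safe_div(total_tp, total_tp + sum(fn.values()))
    micro = (micro_p, micro_r, safe_div(2 * micro_p * micro_r, micro_p + micro_r))
    # Macro: unweighted mean over classes; weighted: mean weighted by true-class support.
    macro = tuple(sum(v[i] for v in per_class.values()) / len(labels) for i in range(3))
    weighted = tuple(sum(per_class[c][i] * support[c] for c in labels) / n for i in range(3))
    return {"accuracy": total_tp / n, "micro": micro, "macro": macro, "weighted": weighted}
```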
There’s still some room for improvement: the detector can be built properly, more augmentations can be added, the classifier can be extended to more breeds, and so on. We believe this is an interesting task to try on your own.
We would like to thank Ilya Polishchuk for his part in the development of our Cat Scanner application.
This article was originally published on Medium by Dmitry Astankov, a Data Scientist at Akvelon.