Analysis
Computer Vision
Computer Vision is a field of artificial intelligence that teaches machines how to interpret and understand visual information from the world, similar to how humans perceive and comprehend images and videos. It involves enabling computers to analyze, recognize, and make decisions based on visual data.
Computer vision intersects with neural networks most prominently through CNNs. Convolutional Neural Networks are a specialized class of deep learning algorithms designed for processing grid-like data, particularly images. They have found widespread application in computer vision due to their ability to automatically learn hierarchical features from visual data.
Several notable applications of CNNs are used in this project.
Applications of Convolutional Neural Networks
CNNs have proven to be versatile and potent tools, contributing significantly to advancements in computer vision across various industries and applications. Image classification, image segmentation, and object detection are often confused with one another, so it is worth distinguishing them:
Image Classification: Identifying and categorizing objects within images, such as recognizing whether an image contains a cat. Classification labels the image as a whole; it does not say where in the image the cat is.
Image Segmentation: Instead of looking at the whole image, we divide it into different parts, and each part gets its own color or label. It helps computers understand and recognize different objects or areas in a picture.
Semantic Segmentation: Segment an image into regions that correspond to different objects or structures and assign a semantic label to each region. This means that pixels belonging to the same class or category (e.g., person, car, road, tree) are grouped together, providing a detailed understanding of the content within the image.
Object Detection: Locating and classifying multiple objects within an image, providing information about their presence and spatial arrangement.
Let's have a look at how these differ:
Image Classification
Image Segmentation
Semantic Segmentation
Object Detection
Project Architecture
PokerMate aims to create an intelligent tool that not only detects one or more of the 52 playing cards of a standard deck in an image, but also guides you toward the potential options you may have with those cards. The project achieves this goal in three stages, in order: Classification, Detection, and an Interface.
Stages of Modelling
The first stage is image classification:
It involves taking an input image and categorizing it as one of the playing cards.
In the second stage, the capabilities were enhanced by implementing object detection:
Consuming images that contain two or more cards and detecting all the relevant cards present within each image. Apart from the input data preparation, the main difference between the two stages was the algorithm employed.
The final stage of the project delivered a user interface:
An intuitive UI for users to interact with, upload their images, and visualize results in real time.
Stage 1 | Image Classification
To start simple: image classification marks the initial stage. The goal of this stage was to develop an image classification model capable of recognizing playing cards using Convolutional Neural Networks (CNNs). The dataset consists of images depicting various playing cards, each labeled with its corresponding card type.
The dataset was divided into three sets: training, validation, and test, with each set containing the RGB pixel data of the input images.
The labels were derived from the file names, where each label represented a card, such as '10_C' for 10 of Clubs and 'A_S' for Ace of Spades.
To facilitate model training, these labels were integer-encoded using a simple label encoder, giving a final label count of 52 (0, 1, 2, ..., 51).
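As a sketch of this encoding step (the file names below are hypothetical examples following the naming scheme above, and the project may have used a library encoder such as scikit-learn's LabelEncoder instead), labels can be derived from file names and integer-encoded like so:

```python
from pathlib import Path

# Hypothetical filenames following the 'rank_suit' naming scheme described above.
filenames = ["10_C.jpg", "A_S.jpg", "K_H.jpg", "A_S_2.jpg"]

def label_from_filename(name: str) -> str:
    # 'A_S_2.jpg' -> 'A_S': keep only the rank and suit parts of the stem.
    parts = Path(name).stem.split("_")
    return "_".join(parts[:2])

labels = [label_from_filename(f) for f in filenames]

# Integer-encode: map each distinct label to an index (0..51 for a full deck).
classes = sorted(set(labels))
label_to_idx = {c: i for i, c in enumerate(classes)}
encoded = [label_to_idx[l] for l in labels]
```

With a full deck, `classes` would contain all 52 labels, so `encoded` values would span 0 through 51.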
MODEL
Using Keras, a convolutional neural network was constructed to categorize given images of playing cards. The input to this model is an image file, presumably containing one card, and the output is a label representing one of the 52 valid cards.
Input (Sample image of a playing card)        - - - - >    Model   - - - - - >      Output (Ace of Spades)
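A minimal Keras network along these lines might look like the following; the input resolution, layer sizes, and layer counts are illustrative assumptions, not the project's exact architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Small CNN: two conv/pool blocks, then a dense classifier over 52 card classes.
model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),           # RGB card image (assumed size)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(52, activation="softmax"),      # one probability per card
])

# Integer labels (0..51) pair with sparse categorical cross-entropy.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Training would then call `model.fit` on the training set with the validation set for monitoring.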
Sample Model Classification Outputs
Jack of Hearts
7 of Hearts
Queen of Clubs
Ace of Clubs
Stage 2 | Object Detection
After training a basic model to identify playing cards, the project progressed to the task of recognizing multiple cards within a single image, a crucial step for identifying poker hands consisting of 2 + 5 cards (two hole cards plus five community cards). This stage involved object detection.
To enhance accuracy, the ante was raised by incorporating object detection through transfer learning. This entailed leveraging pre-trained weights from YOLOv8 (the nano variant), a popular object detection model originally trained on everyday objects like dogs and cats. By freezing specific layers and fine-tuning the rest, the model was trained on the project's customized dataset. This process effectively transferred the learned representations to the project's task of detecting custom playing cards.
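As a hedged sketch of this transfer-learning step (the file names `cards.yaml` and `yolov8n.pt` and all hyperparameter values are illustrative assumptions, not the project's actual settings), a fine-tuning run with the Ultralytics API could be configured like this:

```python
# Illustrative fine-tuning configuration; every value here is an assumption.
train_args = dict(
    data="cards.yaml",  # dataset config listing the 52 card classes (assumed name)
    epochs=50,
    imgsz=640,
    freeze=10,          # freeze the first layers of the pre-trained backbone
)

# Requires the `ultralytics` package and the dataset on disk to actually run:
# from ultralytics import YOLO
# model = YOLO("yolov8n.pt")   # pre-trained YOLOv8 nano weights
# model.train(**train_args)
```

Freezing early layers keeps the generic low-level features learned on everyday objects, while the later layers adapt to card-specific patterns.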
MODEL
After fine-tuning, the customized model became capable of detecting (rather than just classifying) cards in a given image containing one or more of them, inferring a label for each detected card.
     Input (Image of one or more cards)       - - - - >    Model    - - - - - >     Output (Ace of Spades, King of Clubs, 10 of Hearts)
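Turning raw detections into card labels is a small mapping step. The sketch below assumes a hypothetical `names` dictionary of the kind a trained YOLO model exposes via `model.names`; the real inference call is shown in comments:

```python
def detections_to_labels(class_ids, names):
    """Convert detected class indices into human-readable card labels."""
    return [names[i] for i in class_ids]

# Hypothetical class-name mapping for a fine-tuned card detector.
names = {0: "Ace of Spades", 1: "King of Clubs", 2: "10 of Hearts"}

# With a real model:
#   results = model("hand.jpg")
#   class_ids = results[0].boxes.cls.int().tolist()
class_ids = [0, 1, 2]
card_labels = detections_to_labels(class_ids, names)
```

Each detection also carries a bounding box and confidence score, which the interface can draw over the uploaded image.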
Sample Model Detection Outputs
Stage 3 | User Interface
The project involved establishing an underlying logic, followed by the development of a user-friendly interface using Streamlit. This interface includes a straightforward design with an upload option, enabling users to submit images of their poker hands.
After an image is uploaded, it passes through a sequence of processes: image classification to assess the overall suitability of the image, followed by object detection to identify the specific card labels. The business logic then computes the user's poker hand strength, and the interface presents the result for users to consider in their decision-making.
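The write-up does not show the business logic itself; as a simplified sketch (not the project's actual evaluator, which would rank all standard poker hands), a basic strength check over the '10_C'-style labels from Stage 1 could look like this:

```python
from collections import Counter

def hand_category(cards):
    """Very simplified strength check on 'rank_suit' labels like 'A_S'.
    Returns one of a few basic categories; a full evaluator would also
    handle straights, full houses, and tie-breaking."""
    ranks = [c.split("_")[0] for c in cards]
    suits = [c.split("_")[1] for c in cards]
    top_rank_count = Counter(ranks).most_common(1)[0][1]
    if max(Counter(suits).values()) >= 5:
        return "flush"
    if top_rank_count == 4:
        return "four of a kind"
    if top_rank_count == 3:
        return "three of a kind"
    if top_rank_count == 2:
        return "pair"
    return "high card"

result = hand_category(["A_S", "A_H", "K_D", "7_C", "2_S", "9_H", "4_D"])
```

Here the two aces make the hand a pair; the interface would surface this category alongside the detected cards.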
Voila! Never a dull moment at the Poker table!