Introduction
We as humans exhibit a strong ability to recognize new patterns. When presented with new stimuli, people seem to understand new concepts quickly and then recognize variations on these concepts in future percepts. Remembering a face after a single look is a clear example of this kind of intelligence.
In the effort to achieve human-like capabilities, machine learning has been very successful in applications such as web search, fraud detection, text generation, and speech and image recognition. However, these algorithms often fail when they must make predictions from very little supervised data. One particularly interesting task is image classification under a restriction on the size of the data, i.e. we might have only one example of each possible class before we make a prediction. This is called one-shot learning: the learning algorithm looks at an image once and learns features that distinguish it from the rest. With one-shot learning, an algorithm can identify whether it is Tony Stark in a photograph after looking at just one image of him.
One-shot learning can be applied effectively to problems in facial recognition. Facial recognition is used in industries such as retail, surveillance, social media, supply chain, and hospitality. A few of the prominent use cases are:
- Prevent Retail Crime – Face recognition is extensively used to promptly identify known shoplifters, organized retail criminals, or people with a history of fraud entering retail establishments.
- Finding a missing person – Facial recognition techniques like one-shot learning in CCTV systems can considerably aid operators’ efforts by enabling them to add a photo of a missing person and match it with past appearances of that face captured on the CCTV system.
- Protect law enforcement – Face recognition can be used in CCTV surveillance by law enforcement, for example to help police track and identify known offenders who are suspected of a crime.
- Identifying people on social media and emotion detection – Identifying and mapping facial expressions on a human face could be another useful tool for social media.
You might be holding a facial recognition algorithm in your hand right now, or reading this blog on a device that has facial recognition.
The Siamese Neural Network
One of the important applications of one-shot learning with a Siamese Neural Network is facial recognition. Let us look at how one-shot learning enables facial recognition and classification.
In practice, one-shot learning is achieved with Siamese networks, a special type of neural network in which, instead of learning to classify its inputs, the model learns how similar two inputs are and uses that similarity to tell them apart. Imagine you compliment a friend: “You look far better than him/her”. The Siamese neural network is interested in how far (the distance) apart the two are.
In simple words, a Siamese network has two identical neural networks, also called sister networks, each taking one of the two input images. The outputs of the last layers of the two sister networks are then fed to a contrastive loss function, which measures the similarity/distance between the two images.
Fig. 1 Siamese Networks (MC.AI)
The primary objective of the Siamese neural network is not to classify input images, but to differentiate between them. A contrastive loss function evaluates how well the sister networks are distinguishing a given pair of images.
Fig. 2 Architecture of a Siamese Neural Network (Andrew Ng)
The input to the first sister network is an image x1. It passes through a sequence of feature extraction layers (convolution, pooling, and fully connected layers) to produce a feature vector f(x1), which is the encoding of the input x1. We then feed the second image x2 to the second sister network, which is identical to the first, to get a different encoding f(x2).
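To make the idea of two identical sister networks concrete, here is a minimal sketch in plain Python. It is only an illustration: a single dense layer with a ReLU stands in for the convolution, pooling, and fully connected layers, and the names make_encoder, shared_weights, and shared_biases, as well as all the numbers, are made up for this example. The point it shows is that both inputs go through the same function f with the same parameters.

```python
def make_encoder(weights, biases):
    """Build an encoding function f(x) from one set of parameters.

    Toy stand-in for a real feature extractor: one dense layer
    followed by a ReLU. The parameters are hypothetical values.
    """
    def encode(x):
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return encode


# One set of parameters, shared by both "sister" networks.
shared_weights = [[1.0, -1.0], [0.5, 0.5]]
shared_biases = [0.0, 0.0]
f = make_encoder(shared_weights, shared_biases)

x1 = [2.0, 1.0]  # toy stand-in for the first image
x2 = [1.0, 2.0]  # toy stand-in for the second image

enc1 = f(x1)  # encoding f(x1)
enc2 = f(x2)  # encoding f(x2), computed with the very same parameters
print(enc1, enc2)
```

Because the two sister networks are identical, a single encoder applied twice is enough; there is no need to store two copies of the parameters.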
Then, we calculate the distance d between the encodings f(x1) and f(x2). If d is less than a threshold (a hyperparameter), the two images are taken to be of the same person; otherwise, they are taken to be images of different people. This is how a machine recognizes, or more precisely differentiates between, faces.
Distance function between two encodings:
d(x1, x2) = Euclidean distance(f(x1), f(x2))
If x(i) and x(j) are the same person, d(x(i), x(j)) is small
If x(i) and x(j) are different people, d(x(i), x(j)) is large
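The distance check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not our production code: the encodings are tiny made-up vectors, and the function names and the threshold value of 0.7 are hypothetical choices for the example.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two encodings of equal length."""
    if len(u) != len(v):
        raise ValueError("encodings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def same_person(enc1, enc2, threshold=0.7):
    """True if the encodings are closer than the threshold.

    The threshold is a hyperparameter; 0.7 is an arbitrary value
    picked for this toy example.
    """
    return euclidean_distance(enc1, enc2) < threshold


# Made-up encodings: f_x1 and f_x2 are close, f_x3 is far from both.
f_x1 = [0.10, 0.90, 0.30]
f_x2 = [0.12, 0.88, 0.31]
f_x3 = [0.90, 0.10, 0.70]

print(same_person(f_x1, f_x2))  # small distance
print(same_person(f_x1, f_x3))  # large distance
```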
To get a good encoding for the input image, we can learn the parameters by applying gradient descent on a triplet loss function, which is closely related to the contrastive loss. In other words, we calculate the loss using three images: an anchor, a positive, and a negative image. The anchor and the positive are images of the same person, while the negative is an image of a different person.
Because the positive and anchor images show the same person, we want the distance d(A, P) between their encodings to be smaller than the distance d(A, N) between the encodings of the anchor and the negative image.
d(f(A),f(P)) should be small
d(f(A),f(N)) should be large
d(f(A), f(P)) < d(f(A),f(N))
The loss function is
L(A, P, N) = max(||f(A) – f(P)||^2 – ||f(A) – f(N)||^2 + alpha, 0)
Here alpha is a margin, and d(A, P) and d(A, N) are shorthand for the two distance terms. The max means that as long as [d(A, P) – d(A, N) + alpha] is less than or equal to zero, the triplet loss L(A, P, N) is zero; if it is greater than zero, the loss is positive and training tries to push it down to zero.
Finally, the cost function J is the sum of the individual losses over all the triplets in the training set:
J = ∑ L(A(i), P(i), N(i))
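The triplet loss and the cost J can be sketched as follows. This is a minimal illustration under assumptions: the distance terms are taken to be squared Euclidean distances, the margin alpha = 0.2 is an arbitrary value, and the encodings are made-up two-dimensional vectors rather than real network outputs.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """L(A, P, N) = max(d(A, P) - d(A, N) + alpha, 0).

    alpha is the margin; 0.2 is an assumed value for illustration.
    """
    return max(squared_distance(f_a, f_p) - squared_distance(f_a, f_n) + alpha, 0.0)


def cost(triplets, alpha=0.2):
    """J = sum of the triplet losses over all (A, P, N) triplets."""
    return sum(triplet_loss(a, p, n, alpha) for a, p, n in triplets)


# An "easy" triplet: the negative is already much farther than the positive.
easy = ([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])
# A "hard" triplet: positive and negative are equally far from the anchor.
hard = ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])

print(triplet_loss(*easy))  # zero loss, the margin is already satisfied
print(triplet_loss(*hard))  # positive loss, equal to the margin here
print(cost([easy, hard]))
```

Only the hard triplet contributes to J, which is why the choice of triplets matters during training: triplets that already satisfy the margin produce no gradient.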
Implementing face recognition using Siamese Neural Network
Description:
At LatentView, we developed and tested a Siamese neural network and measured its performance in recognizing different images of individuals. We implemented the Siamese Neural Network in two different ways.
Face Verification
We built a **face verification** system that grants access only to people on a list of those who live or work at a given location. For example, to enter office premises, each person swipes their ID card (identification card) at the entrance to identify themselves. The face verification system then checks that they are who they claim to be.
Face Recognition
We have also implemented a face recognition system that takes an image as an input, and figures out if it is one of the authorized persons (and if so, who). Unlike the previous face verification system, we will no longer get a person’s name as one of the inputs.
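The difference between the two systems can be sketched in Python. This is a hedged illustration and not the code we ran: the database of reference encodings, the function names verify and recognize, the threshold of 0.7, and all the numbers are assumptions made for the example. In a real system the encodings would come from the trained network.

```python
import math


def distance(u, v):
    """Euclidean distance between two encodings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def verify(claimed_name, encoding, database, threshold=0.7):
    """Face verification: is this the person named on the ID card?"""
    if claimed_name not in database:
        return False
    return distance(encoding, database[claimed_name]) < threshold


def recognize(encoding, database, threshold=0.7):
    """Face recognition: find the closest known person, with no name given.

    Returns None when even the closest match is not within the threshold.
    """
    best_name, best_dist = None, float("inf")
    for name, reference in database.items():
        d = distance(encoding, reference)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else None


# One made-up reference encoding per authorized person.
database = {
    "chandler": [0.1, 0.9, 0.3],
    "ross": [0.8, 0.2, 0.5],
    "joey": [0.4, 0.4, 0.9],
}

probe = [0.12, 0.88, 0.31]   # toy encoding of a new photo
stranger = [5.0, 5.0, 5.0]   # toy encoding far from everyone

print(verify("chandler", probe, database))
print(verify("ross", probe, database))
print(recognize(probe, database))
print(recognize(stranger, database))
```

Verification is a single one-to-one comparison, while recognition compares the new encoding against every stored identity, so its cost grows with the number of authorized people.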
Data Set Used :
We tested a typical use case of face verification and face recognition: we trained the Siamese Neural Network with single images of characters from the famous American TV show “Friends” and later tried to verify them by feeding in different images of the same characters. Let’s see how it went!
Feeding Input images to the model:
We gave the above images as inputs to our model for it to extract features.
Architecture:
First, we load the images into memory, preprocess them, and feed them to the model. To build this model we used an Inception network.
Testing:
Once the model had processed these single input images of Chandler, Ross, and Joey, we fed it different sets of images to check whether it could verify who the person in each image was.
Test 1: Chandler
Test 2: Ross
Test 3: Joey
Our model was able to verify who these characters were after seeing each of them just once. We also tested a dissimilar image, to check whether our model would recognize it.
Test 4: Gaurav
Well, that says it all: Gaurav was not a character in the TV show Friends!
With the above example, we could establish that a Siamese neural network with one-shot learning brings AI closer to human ability. The value of this technique should not be underestimated: it has already found its way onto social media platforms such as Facebook, and into face recognition systems such as FaceNet.
Apart from social media platforms, facial recognition is used in the various other fields mentioned above, which gives scope to improve facial recognition using Siamese neural networks. Face recognition with one-shot learning, when adopted by businesses, can not only help those businesses but also make life easier for ordinary people.
Conclusion
The Siamese neural network is a state-of-the-art algorithm that is quick and efficient in identifying dissimilarities in images. Its ability to learn key features quickly and to differentiate effectively between them opens up new opportunities, especially in video surveillance, facial recognition, and self-learning tasks. During our implementation, we found that the Siamese neural network is computationally expensive and time consuming, i.e. training the model requires both time and hardware resources. A more fundamental drawback is that it tries to fit each face into one of the given identities/images: if a new face appears on the screen, the system will assign it one identity or the other. This problem can be mitigated by carefully picking a threshold value, so that a face whose distance to every known identity exceeds the threshold is rejected rather than matched.
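One simple way to pick such a threshold is to try candidate values on a held-out set of labeled pairs and keep the one that separates same-person pairs from different-person pairs best. The sketch below is only an illustration of that idea: the function name best_threshold and the distance values are made up, and accuracy is just one possible selection criterion.

```python
def best_threshold(same_distances, diff_distances):
    """Pick the threshold that best separates the two groups of distances.

    same_distances: distances between pairs showing the same person.
    diff_distances: distances between pairs showing different people.
    A pair is predicted "same person" when its distance is below the threshold.
    Returns (threshold, accuracy).
    """
    values = sorted(set(same_distances) | set(diff_distances))
    # Candidate thresholds: midpoints between neighbouring distances.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    total = len(same_distances) + len(diff_distances)

    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum(d < t for d in same_distances)
        correct += sum(d >= t for d in diff_distances)
        accuracy = correct / total
        if accuracy > best_acc:
            best_t, best_acc = t, accuracy
    return best_t, best_acc


# Made-up validation distances for illustration.
same = [0.2, 0.3, 0.5]
diff = [0.6, 0.9, 1.1]

threshold, accuracy = best_threshold(same, diff)
print(threshold, accuracy)
```

In a security setting, wrongly accepting a stranger is usually worse than wrongly rejecting a known person, so a stricter (smaller) threshold than the accuracy-optimal one may be preferable.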