Credit Card Type Detection with Machine Learning
In this article, we will use a machine learning algorithm called Support Vector Machine (SVM) to predict whether a credit card is a Visa, MasterCard or UnionPay card, using a model trained on credit card images.
We will be using Python with the scikit-learn library for the analysis. Source code for this article is available on GitHub.
We will be using credit card images to train our model. The prepared data is available here, but I will also explain how I obtained it.
To obtain credit card images, one option is to scrape Google search results, but we cannot guarantee that every result is actually an image of a credit card showing its card type logo.
I decided to scrape the data from a credit card comparison site called MoneyHero. In fact, if you inspect the Chrome developer console, you will discover that the site has an API, so we can get the data without much scraping effort. This is beyond the scope of this article, so I will not go into the details.
The code for scraping data can be found here.
After we get all the images, we also have to manually label the card type of each image. Some card types appear rarely, so I will only deal with three types: Visa, MasterCard and UnionPay. We got 127 images of these in total and resized them all to the same dimensions (92x56).
Before feeding the data into our machine learning model, we have to do more preprocessing. First, we split the data 80:20 into training and test sets, as is usual practice.
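To make the shape of the input concrete, here is a minimal sketch of how resized images could be flattened into one feature row per image. It is not the article's actual loading code: the helper `images_to_matrix`, the random stand-in images and the scaling of pixel values to [0, 1] are all assumptions for illustration.

```python
import numpy as np

WIDTH, HEIGHT, CHANNELS = 92, 56, 3  # dimensions stated in the article


def images_to_matrix(images):
    """Flatten (HEIGHT, WIDTH, CHANNELS) arrays into one row per image."""
    rows = []
    for img in images:
        arr = np.asarray(img, dtype=np.float64)
        if arr.shape != (HEIGHT, WIDTH, CHANNELS):
            raise ValueError(f"unexpected image shape {arr.shape}")
        # Assumption: scale 8-bit pixel values to the range [0, 1].
        rows.append(arr.reshape(-1) / 255.0)
    return np.vstack(rows)


# Hypothetical stand-in data: 127 random "images" instead of real card scans.
rng = np.random.default_rng(0)
fake_images = [
    rng.integers(0, 256, size=(HEIGHT, WIDTH, CHANNELS)) for _ in range(127)
]
card_image_matrix = images_to_matrix(fake_images)
print(card_image_matrix.shape)  # (127, 15456)
```

Each row then has 92 x 56 x 3 = 15456 values, one per pixel and color channel.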
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        card_image_matrix, df.card_type, test_size=0.2, random_state=812)
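As a quick sanity check on what this split produces, the sketch below runs the same call on placeholder data with the same number of rows. The zero matrix and the made-up labels are assumptions, and the `stratify=y` argument is an optional addition that is not in the original call; it keeps the class proportions similar in both sets, which can help when some classes have few images.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 127 rows of 15456 zeros and made-up labels.
X = np.zeros((127, 15456), dtype=np.float32)
y = np.array(["Visa", "MasterCard", "UnionPay"] * 42 + ["Visa"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=812, stratify=y
)
print(X_train.shape, X_test.shape)  # (101, 15456) (26, 15456)
```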
At this point, we notice that we have 15456 features (width x height x RGB channels = 92 x 56 x 3) for each image but only ~100 training images, which may cause overfitting. Therefore, we have to either collect more data or reduce redundant features.
We used a technique called Principal Component Analysis (PCA) to reduce the dimension to 50. These 50 components already explain 95% of the variation across images, because some information, such as the background, is redundant or the same across all the images.
    import numpy as np
    from sklearn.decomposition import PCA

    pca = PCA(n_components=50)
    pca.fit(X_train)  # fit on the training set only
    print(np.cumsum(pca.explained_variance_ratio_))
    # We see that 50 components already explain 95% of the variability
    X_train_reduced = pca.transform(X_train)
    X_test_reduced = pca.transform(X_test)
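Instead of picking the number of components by hand and then checking the cumulative variance, scikit-learn's PCA also accepts a fraction between 0 and 1 as `n_components` and keeps just enough components to explain that share of the variance. The sketch below shows this on synthetic low-rank data, which is an assumption standing in for the real image matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 100 samples, 500 features, built from 10 latent
# factors plus a little noise, so most variance lives in a few directions.
latent = rng.normal(size=(100, 10))
mixing = rng.normal(size=(10, 500))
X_train = latent @ mixing + 0.1 * rng.normal(size=(100, 500))

# A float n_components asks PCA to keep enough components for 95% variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_train_reduced = pca.fit_transform(X_train)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

On the real data, the number of components selected this way may differ from the 50 used in this article.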
Support Vector Machine
Now we can fit the data to our machine learning model, a Support Vector Machine (SVM). An SVM is a simple and quick machine learning algorithm that aims to find the optimal separating hyperplane, the one with the largest margin, between different classes of data. You can easily find resources on the mathematics behind it.
    from sklearn.svm import SVC

    clf = SVC(kernel='linear', max_iter=10000)
    clf.fit(X_train_reduced, y_train)
    clf.score(X_train_reduced, y_train)
    # It predicts perfectly on the training set!
    # 1
    clf.score(X_test_reduced, y_test)
    # With a good test score.
    # 0.92307692307692313
The model fit our training set perfectly. Meanwhile, it correctly classified 92.3% of the images it had never seen before!
The most amazing thing is that the model can predict the card type even though it doesn’t know what Visa, MasterCard and UnionPay are or what their logos look like! It just looks for similarities among cards of the same type.
In fact, the wrong predictions may be due to very low resolution and an unusual MasterCard color.
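A test set of roughly 25 images gives a fairly noisy accuracy estimate, so one possible follow-up is k-fold cross-validation. The sketch below is not part of the original analysis: it wraps PCA and the SVM in a scikit-learn pipeline so that PCA is refit on each training fold, and it uses synthetic data from `make_classification` as a stand-in for the card images.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the image matrix: 127 samples, 3 classes.
X, y = make_classification(
    n_samples=127, n_features=200, n_informative=20,
    n_classes=3, random_state=0,
)

# PCA sits inside the pipeline, so each fold fits it on its own training part.
model = make_pipeline(
    PCA(n_components=50),
    SVC(kernel="linear", max_iter=10000),
)
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```

The mean and spread of the five fold scores give a rough sense of how much a single test score could vary.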
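To see which card types get confused with each other, a confusion matrix is a handy tool. The sketch below uses made-up labels purely for illustration; they are not the article's results.

```python
from sklearn.metrics import confusion_matrix

labels = ["MasterCard", "UnionPay", "Visa"]
# Made-up labels for illustration only; not the article's results.
y_true = ["Visa", "Visa", "MasterCard", "MasterCard", "UnionPay", "UnionPay"]
y_pred = ["Visa", "Visa", "MasterCard", "Visa", "UnionPay", "UnionPay"]

# Rows are the true card types, columns are the predicted card types.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[1 0 1]
#  [0 2 0]
#  [0 0 2]]
```

In this toy example, the off-diagonal entry in the first row shows one MasterCard that was predicted as Visa.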
We have seen how we can teach our machine to recognize card types. However, please be aware that the support vector machine can predict well with such a small training set only because the card structure is consistent:
- Every logo is at the bottom right
- The cards are not rotated
- The cards have the same size
Without these assumptions, we may need more training data or a more powerful machine learning algorithm.
I initially planned to create a web app for you to test with your own card, but I don’t want you to upload your credit card number, so I stopped. :) But you can always fork my code and try it on your own! Next time we will deal with a more complex recognition task. Stay tuned!
Source code is available on GitHub.