Kvalifika KYC
AI-based classification of Georgian documents, including ID cards, passports, and driving licenses. The REST interface is built with Flask, and PyTorch is used to create the image classification model. The project supports Kvalifika's KYC (Know Your Customer) services.
Initial Setup and Data Preparation
The foundation of this project is a Convolutional Neural Network (CNN) built with Keras, a high-level neural networks API. The network classifies images into distinct categories. The training and testing images are preprocessed with the ImageDataGenerator class: they are normalized, and they are augmented to diversify the dataset and reduce overfitting. Directory paths are configured for the training and validation images. Every image is resized to 224x224 pixels so that the model always receives input of the same shape.
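To make the preprocessing step concrete, here is a minimal pure-Python sketch of what normalization and one augmentation do to pixel values. It does not use ImageDataGenerator itself. The helper names `rescale` and `horizontal_flip`, the tiny 2x3 image, and the choice of a horizontal flip as the augmentation are all assumptions for illustration, since the text does not say which augmentations the project applies.

```python
def rescale(image, max_value=255.0):
    """Map raw pixel values from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right (one possible augmentation)."""
    return [row[::-1] for row in image]


# A made-up 2x3 grayscale image; real inputs would be 224x224 pixels.
tiny_image = [[0, 128, 255],
              [255, 128, 0]]

normalized = rescale(tiny_image)
flipped = horizontal_flip(normalized)
```

The flipped copy has the same label as the original, which is how augmentation adds variety to the training set without new labeling work.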
Building the Convolutional Neural Network
The CNN is built as a sequential model, a linear stack of layers that includes Conv2D, MaxPooling2D, Flatten, and Dense. Conv2D layers apply learned convolution filters to the input images, which are treated as 2-dimensional matrices of pixel values. MaxPooling2D layers downsample the resulting feature maps, Flatten converts them into a single vector, and Dense layers perform the final classification. The 'relu' and 'sigmoid' activation functions add non-linearity to the network. The 'relu' function returns its input if the input is positive and zero otherwise. The 'sigmoid' function is used in the output layer, where it maps each output to a value between 0 and 1 that is read as the probability for that category. Dropout is also applied to reduce overfitting by randomly ignoring some neurons during training.
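The two activation functions and dropout can be sketched in a few lines of plain Python. This is an illustration of the math, not the project's code. The `dropout` helper is hypothetical and uses the "inverted" form, which rescales the surviving values. The 0.5 rate and the input values are assumptions.

```python
import math
import random


def relu(x):
    """Return x for positive inputs and zero otherwise."""
    return x if x > 0 else 0.0


def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [relu(v) for v in [-2.0, 0.5, 3.0, -0.1]]  # negatives become 0.0
dropped = dropout(hidden, rate=0.5, rng=rng)         # applied during training only
prob = sigmoid(0.0)                                  # exactly 0.5
```

At prediction time dropout is switched off, so every neuron contributes to the output.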
Training the Model
The model is trained on the preprocessed dataset. The 'rmsprop' optimizer, which adapts the step size for each weight as training progresses, updates the weights, and 'sparse_categorical_crossentropy', which expects integer class labels, is the loss function for this multi-class classification problem. The model is configured to save the best weights during training. The training data is fed to the model by a Python generator that yields batches of augmented image data. This augmentation effectively expands the training dataset, which helps the model perform better on images it has not seen.
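For a single image, sparse categorical cross-entropy reduces to the negative log of the probability assigned to the true class. The sketch below shows that calculation by hand. It assumes the predicted probabilities sum to 1, and both the function and the example probabilities are illustrative rather than taken from the project.

```python
import math


def sparse_categorical_crossentropy(probabilities, true_index):
    """Loss for one sample: -log of the probability given to the true class.

    `true_index` is an integer label, which is what "sparse" refers to.
    """
    return -math.log(probabilities[true_index])


# Made-up predictions over three classes; the true class is index 0 in both cases.
confident = sparse_categorical_crossentropy([0.9, 0.05, 0.05], 0)
wrong = sparse_categorical_crossentropy([0.1, 0.2, 0.7], 0)
```

The loss is small when the model is confident and correct, and it grows quickly as the probability of the true class approaches zero.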
The model is trained for 200 epochs, and its performance is evaluated against the validation dataset during training. These settings aim to improve the model's ability to generalize to unseen data.
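Saving the best weights amounts to remembering the epoch with the best validation score. The sketch below assumes that validation loss is the monitored metric, which the text does not state. The `best_epoch` helper and the loss values are made up for illustration.

```python
def best_epoch(val_losses):
    """Return (epoch, loss) for the lowest validation loss, with epochs counted from 1."""
    best_index, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # In a real training loop, this is where the weights would be saved.
            best_index, best_loss = epoch, loss
    return best_index, best_loss


# Hypothetical validation losses for six epochs (not real results).
history = [0.92, 0.61, 0.48, 0.52, 0.47, 0.55]
epoch, loss = best_epoch(history)
```

Here the weights from epoch 5 would be kept, even though training continued afterwards with a worse validation loss.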
Evaluating and Using the Model
Once training is complete, the model with the best weights is loaded for evaluation and prediction. It is evaluated on the validation dataset, and the resulting performance is printed. The trained model is then used to classify new, unseen images: its predict method returns a probability for each class, and the class with the highest probability is selected as the final class.
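Choosing the final class is an argmax over the predicted probabilities. The sketch below shows that step in plain Python. The `CLASS_NAMES` list and its order are assumptions (the real mapping depends on how the training data is labeled), and the probabilities are made up.

```python
# Assumed label order; the actual index-to-class mapping may differ.
CLASS_NAMES = ["id_card", "passport", "driving_license"]


def pick_class(probabilities, class_names=CLASS_NAMES):
    """Return the name and probability of the highest-scoring class."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


# Hypothetical output of the model's predict method for one image.
label, confidence = pick_class([0.08, 0.85, 0.07])
```

Returning the probability alongside the label lets a caller reject low-confidence predictions instead of accepting every result.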