In recent years, the effectiveness of CNNs has been demonstrated in a wide range of tasks such as image classification, object detection and image recognition. In the Earth sciences, CNNs have been applied with considerable success, notably to remote sensing image classification and segmentation, seismic data interpretation, and mineral and lithology recognition from microscope images (Baykan and Yilmaz, 2010, Jiang, 2017, Zhang et al., 2019). Since rocks are a basic component of the Earth, their classification plays an important role in understanding the Earth system. Several attempts have been made at automatic rock type classification, mostly relying on feature selection and dimensionality reduction techniques such as principal component analysis and genetic algorithms. Chatterjee (2008), Chatterjee et al. (2013) and Patel et al. (2016, 2017) proposed methods for classifying rock types using features extracted from the original rock images. The selected features were then reduced to a lower-dimensional space before being fed into a multiclass support vector machine (SVM) that classified rock types with a one-versus-all approach. Similarly, Guo and Liu (2014) used image processing techniques to extract features from rock images and then fed them into a neural network for rock type identification. More recently, taking advantage of transfer learning with the GoogLeNet Inception V3 CNN architecture, Zhang et al. (2018) achieved an overall accuracy of 85% in classifying granite, phyllite and breccia. Wang et al. (2018) used MobileNets, a lightweight CNN compression model, to identify rock images in the field with an accuracy of 95%. Ran et al. (2019) proposed a new CNN architecture named RTCNN, which identifies six common rock types with an overall classification accuracy of 97%. However, in all of these works the training and test images were drawn from the same distribution. As a result, the reported overall accuracies are high, ranging from 85% to more than 95%, but they reflect performance within a single data distribution and say little about generalization to images acquired under different conditions. Furthermore, the number of rock classes is limited, which may not meet the needs of rock classification in practice.
In this study, building on recent breakthroughs in deep CNNs, we develop a deep-learning-based classification process that helps geologists identify rock types more efficiently and accurately from rock images taken during geological surveys, and that also provides a helpful tool for students and junior geologists practicing rock type classification. Usually, rock samples are collected during field surveys and then analyzed in the laboratory for lithological classification, which is a time-consuming and costly process. Furthermore, a preliminary identification of rock types in the field is necessary to sketch the geological map and to make a first interpretation of the structure and geological history of an area. Therefore, a mobile system that could automatically and accurately identify rock types would bring multiple benefits to geologists. In general, rocks are classified based on their physical features such as color, texture and grain size (Anderson, 1996). It is therefore reasonable to expect that a CNN can be trained to recognize these physical features and used to classify rocks. In this study, we select the ResNet architecture because of its strong performance in image classification (He et al., 2015). Thanks to its great depth, ResNet can approximate complex nonlinear functions, and its skip connections mitigate the vanishing gradient problem. The following sections describe the dataset and the structure of the ResNet-50 model used in this study in detail.
2.1 Data preparation
The dataset consists of more than 1000 images in 10 rock classes. Each class contains roughly 100 RGB images of varying size, which are resized to 256 × 256 × 3 (height × width × RGB channels) for model input (Fig. 1). These expert-labelled images were gathered from different open sources, including university archives and special collections. Unlike previous related works that used well-controlled images as input (Pires de Lima et al., 2019, Shu et al., 2017), the dataset used in this study is highly variable, consisting of images with different resolutions, backgrounds and lighting conditions (indoor, outdoor, solid background, etc.). In other words, the images were collected without any standardization, which may lower the overall accuracy but raise the generalization ability of the network compared with other works. To train and evaluate the model, we randomly split the dataset into training, validation and test sets containing 70%, 20% and 10% of the images, respectively. The validation set is crucial for tuning the model's hyperparameters, since it shows how the model behaves on samples not seen during training. Comparing the training and validation loss curves indicates whether the number of epochs is sufficient and whether overfitting occurs. Finally, the test set is used to assess how well the model generalizes to unseen data.
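A random 70/20/10 split like the one described above can be sketched in a few lines. This is a minimal illustration assuming each image is identified by a file path; the `split_dataset` helper and the `img_XXXX.jpg` names are hypothetical, and the fixed seed only makes the example reproducible.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=42):
    """Randomly split `items` into train/validation/test lists (70/20/10 by default).

    The test set receives whatever remains after the train and validation portions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for roughly 1000 labelled rock images.
paths = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))  # 700 200 100
```

A purely random split only approximately preserves the roughly 100-images-per-class balance; splitting each class separately (a stratified split) would keep the class proportions identical across the three sets.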
To improve the generalization of the model and avoid overfitting, we apply data augmentation to the training set, including rotation, position and brightness shifts, and horizontal and vertical flips, among others, while keeping the validation and test sets unchanged (Fig. 2). This technique creates more diverse training data without the need to collect more images. Compared with a model trained without data augmentation, the model trained with augmentation achieved a significantly higher accuracy.
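The augmentations listed above can be illustrated on a toy grayscale image stored as a list of rows with pixel values in [0, 1]. This is a sketch, not the authors' pipeline: the helper names, the 50% flip probabilities, the ±0.2 brightness range and the shift limit of about 10% of the width are all assumptions chosen for illustration. In practice an image library or the deep-learning framework's own augmentation utilities would be used, and only training images would pass through `augment`.

```python
import random


def hflip(img):
    """Mirror the image left-to-right (horizontal flip)."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top-to-bottom (vertical flip)."""
    return img[::-1]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def shift_right(img, k):
    """Shift the image k pixels to the right, filling vacated columns with 0 (black)."""
    width = len(img[0])
    return [[0.0] * k + row[:width - k] for row in img]


def shift_brightness(img, delta):
    """Add `delta` to every pixel and clip the result to the valid range [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]


def augment(img, rng):
    """Apply a random combination of the transforms above to one training image."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    for _ in range(rng.randrange(4)):  # rotate by 0, 90, 180 or 270 degrees
        img = rot90(img)
    img = shift_right(img, rng.randrange(len(img[0]) // 10 + 1))  # up to ~10% of width
    return shift_brightness(img, rng.uniform(-0.2, 0.2))


toy = [[0.1, 0.2], [0.3, 0.4]]
print(hflip(toy))  # [[0.2, 0.1], [0.4, 0.3]]
print(rot90(toy))  # [[0.3, 0.1], [0.4, 0.2]]
```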
2.2 Convolutional neural network for rock type classification
The model used in this study is inspired by ResNet-50. The major breakthrough of ResNet is that it makes it possible to successfully train extremely deep neural networks (150 or more layers) (Ng, 2019). Before ResNet, training very deep networks was severely hampered by the vanishing gradient problem. ResNet addresses this with skip (shortcut) connections, which feed the output of one layer to a deeper layer, bypassing several layers in between, where it is added to that layer's output (Fig. 3).
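The effect of a skip connection on gradient flow can be seen in a deliberately simplified scalar model. This sketch assumes each layer is a scalar function with a small local derivative (0.01, an arbitrary toy value) and omits the ReLU, batch normalization and convolutions of a real ResNet block. By the chain rule, a plain layer y = f(x) contributes a factor f'(x) to the gradient, whereas a residual layer y = f(x) + x contributes f'(x) + 1, so the identity path keeps the product from collapsing toward zero.

```python
def residual_block(x, f):
    """Return f(x) + x: the residual branch f(x) plus the identity shortcut."""
    return [fi + xi for fi, xi in zip(f(x), x)]


def zero_residual(v):
    """A residual branch whose weights are all zero (outputs zeros)."""
    return [0.0 for _ in v]


def gradient_through_stack(layer_derivs, residual):
    """Multiply per-layer derivatives along a chain of scalar layers (chain rule).

    Plain layer:    y = f(x)      -> dy/dx = f'(x)
    Residual layer: y = f(x) + x  -> dy/dx = f'(x) + 1
    """
    g = 1.0
    for d in layer_derivs:
        g *= (d + 1.0) if residual else d
    return g


# With a zero residual branch, the block reduces to the identity mapping.
print(residual_block([1.5, -2.0], zero_residual))  # [1.5, -2.0]

# 50 toy layers whose local derivative is small.
derivs = [0.01] * 50
print(gradient_through_stack(derivs, residual=False))  # about 1e-100: the gradient vanishes
print(gradient_through_stack(derivs, residual=True))   # about 1.64: the gradient is preserved
```

The first print also reflects a common intuition from He et al. (2015): since a residual block can represent the identity simply by driving its residual branch to zero, adding blocks should not make the network harder to optimize.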
The ResNet-50 model is divided into 5 stages (Fig. 4). Apart from the first stage, each stage consists of one convolution block followed by identity blocks. Both block types contain 3 convolution layers performing 1×1, 3×3 and 1×1 convolutions: the first 1×1 convolution reduces the number of channels and the last one expands it again, a design known as the bottleneck architecture. The only difference between the two blocks lies in the skip connection path. In the identity block, the shortcut passes the block's input unchanged to the addition step, whereas in the convolution block the shortcut also applies a convolution so that its dimensions match those of the main path before the addition. At the end of ResNet-50, a global average pooling layer and fully connected layers with softmax activation are added so that the network outputs a 1D array of size 10, one entry per rock class. Each unit of the output array gives the predicted probability that the input image belongs to the corresponding rock class, so the outputs always sum to 1.0 (100%). The predicted class is the one whose output unit has the highest probability, ideally close to 1.0. Overall, the ResNet-50 model has more than 25 million trainable parameters.
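Two parts of the paragraph above can be checked with small computations. The first shows where the "50" in ResNet-50 comes from, using the block counts of the standard architecture (3, 4, 6 and 3 bottleneck blocks in stages 2 to 5) and a single final fully connected layer. The model in this study may use more than one fully connected layer, so this count describes the original design. The second shows how a softmax turns 10 raw scores into probabilities that sum to 1. The logits are made-up numbers, not outputs of the trained model.

```python
import math

# Bottleneck blocks per stage in the standard ResNet-50 (He et al., 2015).
# Each stage starts with one convolution block, followed by identity blocks.
BLOCKS_PER_STAGE = {"stage2": 3, "stage3": 4, "stage4": 6, "stage5": 3}
CONVS_PER_BLOCK = 3  # 1x1 (reduce) -> 3x3 -> 1x1 (expand): the bottleneck design


def count_weighted_layers(blocks_per_stage):
    """Stem 7x7 convolution + 3 convolutions per bottleneck block + final fully connected layer.

    By convention, the projection convolutions on the shortcut paths are not counted.
    """
    return 1 + CONVS_PER_BLOCK * sum(blocks_per_stage.values()) + 1


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(count_weighted_layers(BLOCKS_PER_STAGE))  # 50

# Hypothetical raw scores for the 10 rock classes (indices 0 to 9).
logits = [0.2, 1.1, -0.5, 3.0, 0.0, 0.7, -1.2, 0.4, 0.9, 0.1]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(round(sum(probs), 6), predicted_class)  # 1.0 3
```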
The model was trained for 200 epochs with batch sizes of 16 and 32. In our experiments, a batch size of 16 gave better generalization performance, which is consistent with previous work (LeCun et al., 1998). We used the Adam optimizer with a learning rate that decreases as training progresses: the initial learning rate is 0.001 and is divided by 10 after epochs 80, 120 and 150. Although the model is trained for 200 epochs, the weights of the best model, monitored by the lowest validation loss and highest validation accuracy, are saved regardless of the epoch at which that model is obtained. This ensures that if the best model is reached at, for example, epoch 100, 150 or 175, those particular weights are saved and then evaluated on the test set.
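The step-decay schedule and best-weights selection described above can be sketched as follows. Two assumptions are worth stating: epochs are counted from 0, with each decay taking effect from the milestone epoch onward, and the text monitors both validation loss and accuracy, which this sketch combines by choosing the lowest validation loss and breaking ties by the higher accuracy. The validation history is made up for illustration.

```python
def step_decay_lr(epoch, initial_lr=1e-3, milestones=(80, 120, 150), factor=10.0):
    """Learning rate for a 0-indexed epoch: divide by `factor` at each milestone reached."""
    n_decays = sum(1 for m in milestones if epoch >= m)
    return initial_lr / factor ** n_decays


def best_epoch(history):
    """Index of the epoch whose weights would be kept.

    `history` holds one (val_loss, val_acc) pair per epoch. The lowest validation loss
    wins, and ties are broken by the higher validation accuracy (an assumption).
    """
    return min(range(len(history)), key=lambda i: (history[i][0], -history[i][1]))


for e in (0, 79, 80, 120, 150, 199):
    print(e, step_decay_lr(e))

# Made-up validation history, for illustration only.
toy_history = [(1.20, 0.55), (0.80, 0.70), (0.62, 0.81), (0.62, 0.84), (0.70, 0.83)]
print(best_epoch(toy_history))  # 3
```

Deep-learning frameworks provide built-in equivalents for saving the best weights, for example Keras' `ModelCheckpoint` callback with `save_best_only=True`.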
To evaluate the model, we mainly use the accuracy metric, referred to as overall accuracy in this study, which is simply the fraction of correct predictions among all predictions. However, accuracy alone cannot give a comprehensive picture of model performance, especially on class-imbalanced datasets (Forman and Scholz, 2010). We therefore also consider precision, recall and F1-score.
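The metrics named above follow directly from counts of true positives (TP), false positives (FP) and false negatives (FN) per class: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them from lists of true and predicted labels. The helper name and the three-class toy labels are hypothetical. Libraries such as scikit-learn offer the same computation (for example `sklearn.metrics.precision_recall_fscore_support`).

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Overall accuracy plus per-class (precision, recall, F1) from label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the image belongs to another class
            fn[t] += 1  # an image of true class t was missed
    per_class = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, per_class


# Toy labels for three hypothetical classes, not the study's data.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
acc, per_class = classification_metrics(y_true, y_pred)
macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(per_class)
print(round(acc, 3))  # 0.667
```

In this toy case class "A" has perfect precision but only 2/3 recall, which the single accuracy figure does not reveal.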
The overall accuracy and loss on the training and validation sets are shown in Fig. 5. The model achieves its best performance at the 140th epoch, with an overall accuracy of approximately 85% on the validation set. Both the training and validation losses begin to converge after roughly 150 epochs. At the same time, the training and validation accuracies remain stable: the training accuracy approaches roughly 93% and the validation accuracy is around 85%. Further training is therefore likely to bring little or no improvement. The gap between training and validation accuracy can be explained intuitively by the small size of the dataset relative to the data requirements of a deep neural network such as ResNet-50. Furthermore, since rock features can vary greatly even within the same class, the training set may not contain all the features present in the validation samples. However, as long as the validation loss does not start to rise again, this gap does not indicate harmful overfitting. With more data, the gap between training and validation accuracy is expected to narrow, and our experiments with the network support this: Fig. 5 illustrates the difference between training the network with 70 images per class and with 100 images per class.
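The criterion "the validation loss does not start to rise again" can be made explicit with a simple check of the kind used for early stopping. This is a hypothetical diagnostic, not part of the authors' training procedure (which ran for a fixed 200 epochs). The `patience` and `tolerance` values and both loss curves are made up.

```python
def validation_loss_rising(val_losses, patience=10, tolerance=0.0):
    """Return True if the validation loss has gone back up.

    "Back up" means the best (lowest) loss is at least `patience` epochs old and the
    latest loss exceeds it by more than `tolerance`.
    """
    best = min(val_losses)
    epochs_since_best = len(val_losses) - 1 - val_losses.index(best)
    return epochs_since_best >= patience and val_losses[-1] > best + tolerance


# Made-up loss curves, for illustration only.
plateau = [1.00, 0.80, 0.60, 0.55, 0.55, 0.56, 0.55]
rising = [1.00, 0.80, 0.60, 0.55, 0.60, 0.70, 0.80]
print(validation_loss_rising(plateau, patience=3, tolerance=0.02))  # False
print(validation_loss_rising(rising, patience=3, tolerance=0.02))   # True
```

A flat validation loss alongside a still-improving training loss produces a train/validation gap without tripping this check; only a sustained rise does.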
One of the most straightforward ways to check the accuracy of the model is to count how many correct predictions it makes on the test set. On the 116 test images, the model achieves an overall accuracy of 84%. Precision, recall and F1-score are also used to assess the performance of the classifier. In general, the model predicts most classes well, with average precision, recall and F1-score above 80% (Table 1). For diorite, gabbro and gneiss, the test results are slightly lower than for the other classes. For gabbro, the precision and recall are 77% and 71%, respectively. This means that 77% of the images classified as gabbro are actually gabbro, while the remaining 23% are other rocks mistaken for gabbro; it also means that 71% of all gabbro images in the test set, i.e., 10 of 14, were correctly classified. Several example predictions and their predicted probabilities are shown in Fig. 6.
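The gabbro figures can be checked with a short calculation. From the text, TP = 10 and FN = 14 − 10 = 4. The number of false positives is not stated; FP = 3 is an inference from the reported 77% precision (10 / 13 ≈ 0.77). Under that assumption the F1-score works out to about 0.74. Table 1 remains the authoritative source for the reported value.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from raw counts for a single class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # algebraically equal to 2PR / (P + R)
    return precision, recall, f1


# Gabbro: 10 of 14 test images correct (from the text), so FN = 4.
# FP = 3 is inferred from the rounded 77% precision, not stated in the text.
p, r, f1 = precision_recall_f1(tp=10, fp=3, fn=4)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # 0.77 0.71 0.74
```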
The feature maps produced by applying the learned filters to an input image provide insight into the model's internal representation of that input at a given point in the network. In the first convolution layers, the activations retain almost all of the information present in the original image (Fig. 7). In deeper layers, the outputs become increasingly abstract and less visually interpretable, but carry more information about the class of the image (Fig. 8).
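How a single filter produces a feature map can be shown on a tiny example. The sketch below slides a 3×3 kernel over a toy image without padding (stride 1) and applies a ReLU. Like most CNN libraries, it computes cross-correlation (the kernel is not flipped). The hand-made vertical-edge kernel is an assumption for illustration; edge-like filters are typical of early layers, but the trained model's filters are learned, not set by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu_map(fmap):
    """Zero out negative activations, as a ReLU after a convolution layer does."""
    return [[max(0.0, v) for v in row] for row in fmap]


# Toy 4x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(4)]
# Hand-made vertical-edge detector.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
for row in relu_map(conv2d_valid(image, kernel)):
    print(row)
```

The activations are high only where the kernel window straddles the dark-to-bright boundary and zero in the uniform region, which is why early feature maps still look like an outline of the input image.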
The confusion matrix is a useful tool for visualizing the accuracy of a classification model by tabulating true labels against predicted labels (Fig. 9). By examining the confusion matrix, we can see where particular misclassifications were made, i.e., which classes are mistaken for which.
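A confusion matrix and its largest off-diagonal cells can be computed as follows. Rows are true classes and columns are predicted classes, so the diagonal holds correct predictions and each off-diagonal cell counts one kind of mistake. The class names "A", "B", "C" and the labels are toy values, not the study's results.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def most_confused(m, classes, top=3):
    """Largest off-diagonal cells as (count, true_class, predicted_class), biggest first."""
    pairs = [
        (m[i][j], classes[i], classes[j])
        for i in range(len(classes))
        for j in range(len(classes))
        if i != j and m[i][j] > 0
    ]
    return sorted(pairs, reverse=True)[:top]


# Toy labels for three hypothetical classes.
classes = ["A", "B", "C"]
y_true = ["A", "A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "A", "C", "C"]
m = confusion_matrix(y_true, y_pred, classes)
print(m)                           # [[1, 2, 0], [1, 1, 0], [0, 0, 2]]
print(most_confused(m, classes))   # [(2, 'A', 'B'), (1, 'B', 'A')]
```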
In general, similar rocks such as “Fine-Grained Basalt” and “Gabbro”, or “Diorite” and “Granite”, are easily confused, since they look almost the same even to the naked eye. Additionally, most of the wrong predictions involve low-resolution images. It is important to note that rocks are classified by their major physical properties as captured in the photograph. Thus, if those features are not captured, in other words if the image resolution is not sufficient to resolve them, it becomes impossible to distinguish between two look-alike rock types. Another major problem is that the number of samples is small and cannot meet the needs of generalization. Moreover, texture, color and grain size can vary significantly within the same type of rock. The dataset may therefore need to be expanded so that the CNN can reasonably capture the complexity of rock classification.
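The observation that most wrong predictions involve low-resolution images can be quantified by grouping test images by their original size and comparing error rates. This is a hypothetical analysis sketch: the `min_side` threshold of 256 pixels is an assumption (images smaller than the 256 × 256 input must be upsampled and cannot gain detail), and the sample tuples are made up.

```python
def error_rate_by_resolution(samples, min_side=256):
    """Error rate of low- and high-resolution test images.

    `samples` holds (width, height, correct) tuples, where width and height are the
    original image size before resizing and `correct` says whether the prediction was right.
    """
    groups = {"low": [0, 0], "high": [0, 0]}  # [errors, total] per group
    for width, height, correct in samples:
        key = "low" if min(width, height) < min_side else "high"
        groups[key][1] += 1
        if not correct:
            groups[key][0] += 1
    return {k: (errors / total if total else None) for k, (errors, total) in groups.items()}


# Made-up test results, for illustration only.
samples = [
    (120, 90, False), (200, 150, True), (180, 180, False),
    (640, 480, True), (1024, 768, True), (800, 600, False),
]
print(error_rate_by_resolution(samples))  # low ≈ 0.67, high ≈ 0.33
```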
In this study, we have shown that a CNN can be successfully applied to classify rock types from images, with an overall accuracy of 84% on the 116-image test set. Additionally, the CNN model performs well on images with different backgrounds and with interference such as coins, scale bars or text overlapping the rock. However, the model does not yet meet all the needs of rock classification in the field, partly because of its accuracy and the limited number of classes. Future work should therefore extend the model to more rock types. To further improve the accuracy of the model, we will also focus on tuning the model architecture and on increasing the size and quality of the dataset. As more diverse field images of rock samples are collected, we expect the range of applicable cases to widen and the accuracy to improve. The trained deep learning model can also be developed into a mobile application to assist geologists in classifying rocks during fieldwork.