Original Article

Tunnel and Underground Space. October 2020. 462-472









In recent years, the effectiveness of convolutional neural networks (CNNs) has been demonstrated in many tasks such as image classification, object detection and image recognition. In the Earth sciences, CNNs have been applied with considerable success, notably to remote sensing image classification and segmentation, seismic data interpretation, and mineral and lithology recognition based on microscope images (Baykan and Yilmaz, 2010, Jiang, 2017, Zhang et al., 2019). Since rock is a basic component of the Earth, rock classification plays an important role in understanding the Earth system. Several attempts have been made at automatic rock type classification, mostly based on feature selection algorithms such as principal component analysis and genetic algorithms. Chatterjee et al. (2008), Chatterjee (2013) and Patel et al. (2016, 2017) proposed methods for classifying rock types using features extracted from the original rock images. The selected features were then reduced to a lower-dimensional space before being fed into a multiclass support vector machine that classifies rock types with a one-versus-all approach. Similarly, Guo and Liu (2014) used image processing techniques to extract features from rock images and then fed them to a neural network for rock type identification. More recently, by taking advantage of transfer learning with the GoogLeNet Inception V3 CNN architecture, Zhang et al. (2018) achieved an overall accuracy of 85% in classifying granite, phyllite and breccia. Wang et al. (2018) used MobileNets, a lightweight CNN model, to identify rock images in the field with an accuracy of 95%. Ran et al. (2019) proposed a new CNN architecture named RTCNN, which can identify six common rock types with an overall classification accuracy of 97%. However, a common feature of these works is that the training and test data come from the same distribution. Therefore, the reported overall accuracy is quite high, ranging from 85% to more than 95% within the same data distribution, but generalization performance is lacking. Furthermore, the number of rock classes is limited, which may not meet the needs of rock classification in practice.

In this study, building on recent breakthroughs in deep CNNs, we develop a deep learning based classification process that helps geologists identify rock types more efficiently and accurately from rock images taken during geological surveys, and that also provides a useful tool for students and junior geologists practicing rock type classification. Usually, rock samples are collected during a field survey and then analyzed in the laboratory for lithological classification, which is a time-consuming and costly process. Furthermore, preliminary identification of rock types in the field is necessary to sketch the geological map and to make a basic interpretation of the structure and geological history of an area. Therefore, a mobile system that could automatically and accurately identify rock types would bring multiple benefits to geologists. In general, rocks are classified based on physical features such as color, texture and grain size (Anderson, 1996). It is therefore reasonable to expect that a CNN can be trained to recognize those physical features and used to classify rocks. In this study, a CNN architecture named ResNet is selected because of its strong performance in image classification (He et al., 2015). Owing to its great depth combined with the skip connection technique, ResNet is capable of approximating complex nonlinear functions while also mitigating the vanishing gradient problem. In the following, the dataset and the structure of ResNet-50 used in this study are discussed in detail.


2.1 Data preparation

The dataset is composed of more than 1000 images in 10 different classes of rock. Each class contains roughly 100 RGB images of different sizes, which are resized to 256 × 256 × 3 pixels for model input (Fig. 1). These expert-labelled samples were gathered from different open sources, including university archives and special collections. Unlike previous related works that take well-controlled images as input (Pires de Lima et al., 2019, Shu et al., 2017), the dataset used in this study displays high variability, as it consists of images with different resolutions, backgrounds and lighting conditions (indoor, outdoor, solid background, etc.). In other words, the images were collected without any standardization, which may result in a lower overall accuracy but a higher generalization ability of the network compared to other works. To train and evaluate the model, we randomly split the dataset into portions of 70%, 20% and 10% for the training, validation and test sets, respectively. The validation set is crucial for tuning the model's hyperparameters, as it gives an idea of how the model behaves on unseen samples during the training process. The relationship between training loss and validation loss indicates whether the number of epochs is sufficient and whether overfitting occurs. Finally, the test set is used to assess how well the model generalizes to truly unseen data.
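The 70/20/10 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_dataset`, the fixed seed and the placeholder file names are assumptions, and the sketch uses a plain random shuffle (the text does not say whether the split was stratified by class).

```python
import random

def split_dataset(items, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items and cut them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(round(fractions[0] * len(shuffled)))
    n_val = int(round(fractions[1] * len(shuffled)))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Hypothetical file names standing in for the real image paths.
image_paths = [f"rock_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(image_paths)
print(len(train), len(val), len(test))
```

Because the three lists are slices of one shuffled list, no image can appear in more than one subset.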

Fig. 1.

Examples of the data used in this study

In order to improve the generalization of the model and avoid overfitting, we perform data augmentation on the training set, including rotation, position and brightness shifting, and horizontal and vertical flipping, while keeping the validation and test sets unchanged (Fig. 2). This technique allows us to create more diverse data for training the model without actually having to collect more data. Compared to the model trained without data augmentation, the model trained with data augmentation shows a significant improvement in accuracy.

Fig. 2.

Example of data augmentation on granite rock image
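A few of the augmentation operations named above can be sketched on a tiny grayscale image stored as nested lists. This is an assumption-laden toy: the function names, the 50% application probabilities and the ±30 brightness range are hypothetical, position shifting is omitted, rotation is limited to 90 degrees, and a real pipeline would operate on RGB arrays.

```python
import random

def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror the image top to bottom."""
    return img[::-1]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def shift_brightness(img, delta):
    """Add delta to every pixel and clip to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def augment(img, rng):
    """Apply a random combination of the transforms to one training image."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    if rng.random() < 0.5:
        img = rotate90(img)
    return shift_brightness(img, rng.randint(-30, 30))

tiny = [[10, 20], [30, 40]]
augmented = augment(tiny, random.Random(0))
```

Only training images would pass through `augment`; validation and test images are left untouched so that the reported metrics describe unmodified data.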

2.2 Convolutional neural network for rock type classification

The model used in this study is inspired by ResNet-50. The major breakthrough of ResNet is that it allows extremely deep neural networks (150 or more layers) to be trained successfully (Ng, 2019). Before ResNet, training very deep neural networks was impractical because of the vanishing gradient problem. ResNet's solution is to feed the output of one layer to a deeper layer, skipping several layers in between (Fig. 3).

Fig. 3.

Skip connection: (a) Identity block, (b) Convolutional block
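The two kinds of skip connection can be illustrated with a toy sketch in which small matrix multiplications stand in for convolution layers. Everything here is a simplification for illustration: the function names and weights are hypothetical, only two layers are stacked on the main path, and batch normalization is left out.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def linear(v, weights):
    """Multiply vector v by a weight matrix (stand-in for a conv layer)."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def identity_block(x, w1, w2):
    """y = relu(F(x) + x): the input is added back unchanged."""
    fx = linear(relu(linear(x, w1)), w2)
    return relu([a + b for a, b in zip(fx, x)])

def conv_block(x, w1, w2, w_skip):
    """y = relu(F(x) + W_skip x): the shortcut is projected to match F(x)."""
    fx = linear(relu(linear(x, w1)), w2)
    shortcut = linear(x, w_skip)
    return relu([a + b for a, b in zip(fx, shortcut)])

x = [1.0, 2.0]
zeros_2x2 = [[0.0, 0.0], [0.0, 0.0]]
zeros_3x2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
w_skip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 2 inputs to 3 outputs

same_size = identity_block(x, zeros_2x2, zeros_2x2)
resized = conv_block(x, zeros_2x2, zeros_3x2, w_skip)
```

With all main-path weights set to zero, the identity block simply returns its input, which shows why the shortcut gives the signal (and its gradient) a direct route through the network. The projected shortcut in `conv_block` is what allows the addition when the main path changes the output size.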

The ResNet-50 model is generally divided into 5 stages (Fig. 4). Except for the first stage, all stages are composed of convolution and identity blocks. Both the convolution block and the identity block have 3 convolution layers performing 1×1 and 3×3 convolutions, a design known as the bottleneck architecture. The only difference between the two lies in the skip connection path. While the skip connection path within an identity block simply passes the output of a layer to a deeper layer, within a convolution block it also performs a convolution operation to ensure that the dimensions match for the later addition step. At the end of ResNet-50, a global average pooling layer and fully connected layers with softmax activation are added so that the output of the network is a 1D array of size 10, representing the 10 rock classes. Each unit of the output array returns the predicted probability that the input image belongs to the corresponding rock class, so the sum of all output units is always equal to 1.0 (or 100%). The predicted class corresponds to the output unit with the highest probability, which is ideally close to 1.0. Overall, ResNet-50 has more than 25 million trainable parameters.

Fig. 4.

ResNet-50 architecture
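The output head described above (global average pooling followed by softmax over 10 classes) can be sketched in a few lines. The logits below are made-up numbers used only to show that the probabilities sum to 1 and that the prediction is the index of the largest one; the fully connected layer that would produce them is omitted.

```python
import math

def global_average_pool(feature_maps):
    """Reduce each 2-D feature map to its mean value."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 10 rock classes.
logits = [0.2, 2.5, -1.0, 0.0, 0.3, 0.1, -0.5, 1.0, 0.4, -0.2]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)

pooled = global_average_pool([[[1, 3], [5, 7]]])  # one 2x2 feature map
```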

The model was trained for 200 epochs with batch sizes of 16 and 32. Our experiments show that a batch size of 16 gives better generalization performance, which is consistent with previous work (LeCun et al., 1998). The Adam optimizer was used with a learning rate that decreases as training progresses: the initial learning rate is 0.001 and is reduced by a factor of 10 after epochs 80, 120 and 150. Although the model is trained for 200 epochs, the best weights, as monitored by the lowest validation loss and the highest validation accuracy, are saved regardless of the epoch in which they are obtained. This ensures that whether the best model is reached at epoch 100, 150 or 175, those particular weights are saved and later evaluated on the test set.
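The step schedule and the "keep the best weights" rule can be sketched as below. The function names are hypothetical, and two details are assumptions: the drop is taken to apply from the milestone epoch onward (the text does not say whether epoch 80 itself uses the old or the new rate), and only the validation loss is used to pick the best epoch.

```python
def learning_rate(epoch, initial=0.001, milestones=(80, 120, 150), factor=0.1):
    """Step schedule: multiply the rate by `factor` for each milestone reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return initial * factor ** passed

def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

for epoch in (0, 79, 80, 120, 150, 199):
    print(epoch, learning_rate(epoch))

# Made-up validation losses for a 5-epoch run.
print(best_epoch([0.9, 0.6, 0.5, 0.55, 0.7]))
```

In a training loop, the weights would be written to disk whenever the current epoch becomes the new `best_epoch`, so the final checkpoint does not depend on where training stops.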

To evaluate the model, we use the accuracy metric, referred to as overall accuracy in this study, which is simply the fraction of correct predictions over the total number of predictions. However, accuracy alone cannot give a comprehensive understanding of model performance, especially with a class-imbalanced dataset (Forman and Scholz, 2010). Several other metrics, including precision, recall and F1-score, are therefore also considered.
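The four metrics can be computed directly from true and predicted labels, as in the following sketch. The helper names and the five-sample toy data are illustrative assumptions, not the study's results.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1-score for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels for illustration only.
y_true = ["granite", "granite", "gabbro", "gabbro", "gabbro"]
y_pred = ["granite", "gabbro", "gabbro", "gabbro", "granite"]

accuracy = overall_accuracy(y_true, y_pred)
precision, recall, f1 = per_class_metrics(y_true, y_pred, "gabbro")
```

Precision answers "of the images predicted as this class, how many were right", recall answers "of the images that truly belong to this class, how many were found", and the F1-score is their harmonic mean.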


The overall accuracy and loss on the training and validation sets are shown in Fig. 5. The model achieves its best performance at the 140th epoch, with an overall accuracy of approximately 85% on the validation set. Both the training and validation losses start to converge after roughly 150 epochs. At the same time, the training and validation accuracies tend to remain stable: the accuracy on the training set approaches roughly 93% and the accuracy on the validation set is around 85%. Any further training is likely to provide little or no improvement in model performance. The gap between training and validation accuracy can be explained intuitively by the fact that the dataset used in this study is quite small compared to what a deep neural network like ResNet-50 requires. Furthermore, since rock features can be highly variable even within the same class, the training set may not reflect all features of the samples in the validation set. However, as long as the validation loss does not rise again, the model can still be considered to perform well. With a larger amount of data, the gap between training and validation performance is expected to narrow, which was observed during our experiments with the network. Fig. 5 clarifies the difference between training the network with a dataset of 70 images per class and 100 images per class.

Fig. 5.

Learning curve: (a) loss and (b) overall accuracy

One of the most straightforward ways to check the accuracy of the model is to measure how many correct predictions it makes on the test set. The test result shows that the model achieves an overall accuracy of 84% on the 116 test images. In addition, other evaluation metrics for classifiers, including precision, recall and F1-score, are used to assess the performance of the model. In general, the model predicts well on most classes, with average precision, recall and F1-score higher than 80% (Table 1). For diorite, gabbro and gneiss, the test results, recall in particular, are lower than for the other classes. For gabbro, the precision and recall obtained by the model are 77% and 71%, respectively. This means that 77% of the rocks classified as gabbro are actually gabbro, while the remaining 23% were other rocks mistaken for gabbro; and 71% of all gabbro images in the test set were correctly classified, i.e., 10 out of 14 test images. Several prediction examples and their predicted probabilities are shown in Fig. 6.

Table 1.

Test result using the best weights of the model

Rock type            Precision  Recall  F1-score  Number of test images
Fine-grained basalt  0.93       1.00    0.96      13
Flint                1.00       0.92    0.96      12
Diorite              1.00       0.76    0.87      17
Vesicular basalt     0.90       0.82    0.86      11
Granite              0.85       0.85    0.85      13
Conglomerate         0.78       0.88    0.82      8
Andesite             0.71       0.92    0.80      13
Gneiss               0.88       0.70    0.78      10
Sandstone            0.63       1.00    0.77      5
Gabbro               0.77       0.71    0.74      14
Average              0.84       0.86    0.84

Fig. 6.

Example of test results
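As a rough consistency check, the overall accuracy can be reconstructed from the per-class recall values and test-image counts in Table 1. The sketch assumes that each rounded recall corresponds to a whole number of correctly classified images, so the per-class counts it derives are inferred values, not figures reported in the study (apart from gabbro, for which 10 of 14 is stated in the text).

```python
# (recall, number of test images) per class, copied from Table 1.
table = {
    "Fine-grained basalt": (1.00, 13),
    "Flint": (0.92, 12),
    "Diorite": (0.76, 17),
    "Vesicular basalt": (0.82, 11),
    "Granite": (0.85, 13),
    "Conglomerate": (0.88, 8),
    "Andesite": (0.92, 13),
    "Gneiss": (0.70, 10),
    "Sandstone": (1.00, 5),
    "Gabbro": (0.71, 14),
}

# Inferred number of correctly classified images per class.
correct = {name: round(recall * n) for name, (recall, n) in table.items()}

total_images = sum(n for _, n in table.values())
total_correct = sum(correct.values())
accuracy = total_correct / total_images

print(total_images, total_correct, round(accuracy, 2))
```

Under this assumption the counts add up to 116 test images, and the reconstructed accuracy rounds to the reported 84%.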

The feature maps that result from applying filters to the input images are shown below. They provide insight into the internal representation that the model has of a specific input at a given point in the network. In the first convolution layers, the activations retain almost all of the information present in the original image (Fig. 7). Deeper in the network, the outputs of the layers become increasingly abstract and less visually interpretable, but carry more information about the class of the image (Fig. 8).

Fig. 7.

Example of feature maps in the 1st convolution layer

Fig. 8.

Example of feature maps in the 30th layer
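To make the notion of a feature map concrete, the sketch below slides a single hand-written 2×2 filter over a tiny image. It is only an illustration: the image and the vertical-edge filter are made up, whereas the filters in a trained network are learned, and the helper name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A 4x4 image with a dark left half and a bright right half.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]

# Hand-written filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The resulting map is non-zero only in the column where the brightness changes, which is the kind of low-level, still image-like response seen in early layers.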


The confusion matrix is a useful tool to visualize the accuracy of a classification model by simply comparing the true label and predicted label (Fig. 9). By examining the confusion matrix, we can inspect where a particular misclassification has been made.

Fig. 9.

Confusion matrix: (a) on validation set and (b) on test set
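A confusion matrix can be built by counting (true label, predicted label) pairs, as in the minimal sketch below. The function name and the two-class toy labels are assumptions for illustration; rows are taken to be true classes and columns predicted classes.

```python
def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] counts samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Toy labels for illustration only.
labels = ["diorite", "granite"]
y_true = ["diorite", "diorite", "granite", "granite", "granite"]
y_pred = ["diorite", "granite", "granite", "granite", "diorite"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Diagonal entries are correct predictions; each off-diagonal entry shows which pair of classes was confused and in which direction.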

In general, similar rocks such as “Fine-Grained Basalt” and “Gabbro”, or “Diorite” and “Granite”, are easily confused, since they look almost the same even to the naked eye. Additionally, most of the wrong predictions involve low-resolution images. It is important to note that rocks are classified by the major physical properties captured in the photograph. Thus, if those features are not captured, or in other words the image resolution is not sufficient to resolve them, it becomes impossible to distinguish between two look-alike rock types. Furthermore, one of the major problems is that the number of samples is small and cannot meet the needs of generalization performance. Even within the same type of rock, texture, color and grain size can vary significantly. Therefore, an expansion of the dataset may be required so that the CNN can reasonably capture the complexity of rock classification.


In this study, we have shown that a CNN can be successfully applied to classify rock types from images, with an overall accuracy of 84% on the 116 images of the test set. Additionally, the CNN model performs well on images with different backgrounds and with interfering objects such as a coin, scale bar or text overlapping the rock. However, the needs of rock classification in the field remain partly unfulfilled because of the model accuracy and the limited number of classes. Thus, expansion to more rock types should be included in future work. In addition, to further improve the accuracy of the model, we will focus on tuning the model architecture and on increasing the size and quality of the dataset. As more diverse images of rock samples are collected in the field, the range of applicable cases is expected to widen and the accuracy to improve. The trained deep learning model can also be further developed into a mobile application to assist geologists in classifying rocks during fieldwork.


This research was supported by the research project “Development of environmental simulator and advanced construction technologies over TRL6 in extreme conditions” funded by KICT.


Anderson, Don L., 1996, Petrology: The Study of Igneous, Sedimentary and Metamorphic Rocks, American Scientist, 84, 398+.
Ng, A., 2019, Machine Learning course.
Baykan, N.A. and Yilmaz N., 2010, Mineral identification using color spaces and artificial neural networks, Computers and Geosciences, 36, 91-97. 10.1016/j.cageo.2009.04.009
Chatterjee, S., 2013, Vision-based rock-type classification of limestone using multi-class support vector machine, Appl. Intell, 39, 14-27. 10.1007/s10489-012-0391-7
Chatterjee, S., Bhattacherjee, A., Samanta, B. and Pal, S.K., 2008, Rock-type classification of an iron ore deposit using digital image analysis technique, Int. J. Min. Miner. Eng, 1, 22. 10.1504/IJMME.2008.020455
Forman, G. and Scholz, M., 2010, Apples-to-apples in cross-validation studies: Pitfalls in classifier performance measurement, ACM Sigkdd Explor. Newsl, 12, 49-57. 10.1145/1882471.1882479
Guo, C. and Liu, Y., 2014, Recognition of rock images based on multiple color spaces, Sci. Technol. Eng, 14, 247-251 and 255.
He, K., Zhang, X., Ren, S. and Sun, J., 2015, Deep residual learning for image recognition, arXiv preprint arXiv:1512.03385. 10.1109/CVPR.2016.90
Jiang, Y., 2017, Detecting Geological Structures in Seismic Volumes Using Deep Convolutional Neural Networks, Master Thesis.
LeCun, Y., Bottou, L., Orr, G.B. and Müller, K.R., 1998, Efficient BackProp. In: Orr G.B., Müller KR. (eds) Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, 1524. 10.1007/3-540-49430-8_2
Patel, A.K., Chatterjee, S., Gorai, A.K., 2017, Development of online machine vision system using support vector regression (SVR) algorithm for grade prediction of iron ores, 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA). IEEE, 149-152. 10.23919/MVA.2017.7986823
Patel, A.K., Gorai, A.K., Chatterjee, S., 2016, Development of Machine Vision-based System for Iron Ore Grade Prediction using Gaussian Process Regression (GPR), Pattern Recognit. Inf. Process, 45-48.
Pires de Lima, R., Bonar, A., Coronado, D.D., Marfurt, K. and Nicholson, C., 2019, Deep convolutional neural networks as a geological image classification tool, The Sedimentary Record, 17(2), 4-9. 10.2110/sedred.2019.2.4
Ran, X., Xue, L., Zhang, Y., Liu, Z., Sang, X. and He, X., 2019, Rock Classification from Field Image Patches Analyzed Using a Deep Convolutional Neural Network, Mathematics, 7, 755. 10.3390/math7080755
Shu, L., McIsaac, K., Osinski, G.R. and Francis, R., 2017, Unsupervised feature learning for autonomous rock image classification, Computers and Geosciences, 106, 10-17. 10.1016/j.cageo.2017.05.010
Wang, C., Li, Y., Fan, G., Chen, F., and Wang, W., 2018, Quick recognition of rock images for mobile applications, J. Eng. Sci. Technol. Rev, 11, 111-117. 10.25103/jestr.114.14
Zhang, Y., Li, M. and Han, S., 2018, Automatic identification and classification in lithology based on deep learning in rock images, Acta Petrol. Sin, 34, 333-342.
Zhang, Y., Li, M., Han, S., Ren, Q. and Shi, J., 2019, Intelligent Identification for Rock-Mineral Microscopic Images Using Ensemble Machine Learning Algorithms, Sensors (Basel, Switzerland), 19(18), 3914. 10.3390/s19183914