ResNet–U-Net Based Convolutional Neural Network for Detecting Deforestation Caused by Oil Palm Plantation Expansion Using Satellite Imagery


 

Muh. Yamin¹

¹Department of Informatics Engineering, Faculty of Engineering, Halu Oleo University, Kendari, Indonesia

Corresponding author:

Muh. Yamin

Email: muh_yamin@uho.ac.id

 

 

Abstract

Deforestation caused by oil palm plantation expansion has become one of the most significant environmental issues in tropical countries, particularly Indonesia. Continuous land conversion threatens biodiversity, accelerates carbon emissions, and reduces forest ecosystem sustainability. Conventional monitoring methods are often constrained by limited spatial coverage, high operational costs, and time-consuming field surveys. This study proposes a deep learning framework that combines the ResNet-50 and U-Net architectures in a single Convolutional Neural Network (CNN) for semantic segmentation of satellite imagery to detect deforestation caused by oil palm expansion. ResNet-50 serves as the encoder for extracting high-level spatial features, while a U-Net decoder with skip connections performs pixel-level segmentation. Satellite images were annotated, normalized, augmented, and partitioned into subsets before model training. Performance was evaluated using Precision, Recall, F1-score, the Confusion Matrix, and Mean Intersection over Union (mIoU). The proposed model achieved an average F1-score of 0.8221 and an mIoU of 0.6894, with the oil palm plantation class producing the highest segmentation accuracy among all land-cover classes. These findings indicate that integrating ResNet-50 and U-Net is effective for semantic segmentation in deforestation monitoring using satellite imagery. The proposed framework can support environmental agencies and policymakers by providing accurate, automated, and scalable land-cover monitoring for sustainable forest management.

Keywords: Deep Learning, Convolutional Neural Network, ResNet-50, U-Net, Deforestation Detection, Satellite Imagery, Semantic Segmentation.

 

INTRODUCTION

Indonesia possesses one of the world's largest tropical forest ecosystems, providing essential ecological services including biodiversity conservation, climate regulation, carbon sequestration, and water resource protection. However, rapid expansion of oil palm plantations has substantially contributed to forest degradation over recent decades. Large-scale conversion of natural forests into agricultural land has accelerated deforestation rates and generated considerable environmental challenges including biodiversity loss, ecosystem fragmentation, greenhouse gas emissions, and increased vulnerability to natural disasters.

Oil palm cultivation represents one of Indonesia's most valuable agricultural commodities, contributing significantly to national economic development. Nevertheless, its rapid expansion has created conflicts between economic growth and environmental sustainability. Previous reports indicate that a considerable proportion of forest loss in Indonesia is directly associated with land conversion for oil palm plantations, emphasizing the need for efficient monitoring systems capable of identifying land-cover changes over large geographical regions.

Traditional approaches for monitoring deforestation rely heavily on field observations and manual interpretation of satellite imagery. Although these methods can produce reliable results, they require substantial human expertise, financial resources, and processing time. Moreover, manual interpretation becomes increasingly impractical when dealing with multi-temporal satellite datasets covering extensive forest landscapes.

Recent advances in deep learning have significantly improved automatic image interpretation, particularly in remote sensing applications. Convolutional Neural Networks (CNNs) have demonstrated remarkable performance in object detection, image classification, and semantic segmentation. Compared with conventional machine learning algorithms, CNN-based models automatically learn hierarchical image representations from raw data, eliminating the need for handcrafted feature extraction.

Among various semantic segmentation architectures, U-Net has become one of the most widely adopted models due to its encoder-decoder structure and skip connections that preserve spatial information during image reconstruction. However, the standard U-Net architecture exhibits limitations in extracting complex semantic features from high-resolution satellite imagery. Recent studies have shown that integrating residual learning through ResNet significantly enhances feature representation while mitigating degradation problems in deeper neural networks.

Several previous studies have applied machine learning and CNN-based approaches to land-cover classification and deforestation mapping. Random Forest and Support Vector Machine classifiers have achieved satisfactory performance in certain applications but remain limited in capturing complex spatial patterns. Likewise, standalone CNN or U-Net models often have difficulty distinguishing heterogeneous land-cover boundaries in tropical landscapes. Combining ResNet with U-Net has shown promising improvements in remote sensing segmentation tasks, although its application to detecting deforestation driven by oil palm expansion in Indonesia remains relatively limited.

This research therefore proposes a semantic segmentation framework that combines ResNet-50 and U-Net within a CNN architecture for automated detection of deforestation caused by oil palm plantation expansion using multi-class satellite imagery. The proposed framework classifies land cover into forest, oil palm plantation, open land, and background classes. Model performance is evaluated using Precision, Recall, F1-score, Confusion Matrix, and Mean Intersection over Union (mIoU). The proposed approach aims to provide an accurate, scalable, and efficient monitoring system capable of supporting sustainable forest management and environmental policy development.

The main contributions of this study are summarized as follows:

  1. Development of a ResNet–U-Net semantic segmentation model for satellite-based deforestation detection.
  2. Integration of deep feature extraction and pixel-wise segmentation for multi-class land-cover mapping.
  3. Comprehensive evaluation using multiple segmentation metrics.
  4. Demonstration of the applicability of the proposed framework for automated monitoring of oil palm-driven deforestation in Indonesia.

 

RELATED WORKS

Automatic land-cover classification using satellite imagery has experienced rapid development with the emergence of deep learning techniques. Compared with conventional machine learning algorithms, deep learning enables hierarchical feature extraction directly from image pixels, resulting in higher accuracy for complex remote sensing applications.

Several studies have investigated deforestation detection using machine learning approaches. Iskandar and Hanafi (2022) applied Random Forest to detect tropical forest loss and achieved an overall accuracy exceeding 95%. Although Random Forest demonstrated strong classification capability, its performance relied heavily on handcrafted features and experienced limitations in representing complex spatial patterns.

CNN-based approaches have recently become the dominant solution for semantic segmentation of satellite imagery. Magdalena et al. (2021) employed CNN for multi-class land-cover classification using SPOT-6 imagery and reported an accuracy of 95.45%, demonstrating the superiority of convolutional feature extraction over traditional image processing techniques.

The U-Net architecture has become one of the most influential semantic segmentation models. Originally designed for biomedical image segmentation, U-Net has been successfully adapted to remote sensing applications due to its encoder-decoder architecture and skip connections, which preserve fine spatial information throughout the segmentation process.

However, several researchers have reported that the conventional U-Net architecture suffers from limited feature representation when processing highly heterogeneous satellite imagery. The semantic information extracted by its relatively shallow encoder is often insufficient for distinguishing visually similar land-cover classes such as secondary forests, plantations, and open land.

Residual learning, introduced through ResNet, addresses this limitation by allowing significantly deeper neural networks while mitigating the vanishing-gradient and degradation problems. ResNet enables robust extraction of high-level semantic features without substantially increasing optimization complexity.

Recent studies combining ResNet and U-Net have shown promising improvements in semantic segmentation. Vasavi et al. (2023) proposed an ensemble ResNet-U-Net architecture for building extraction from very high-resolution satellite imagery and demonstrated improved segmentation performance compared with conventional CNN models. Similar improvements have also been observed in vegetation mapping and geological interpretation.

Despite these advances, applications of integrated ResNet-U-Net architectures for detecting oil palm-induced deforestation in Indonesia remain limited. Most previous studies focused either on land-cover classification or general change detection without integrating semantic segmentation and temporal analysis within a single framework.

Therefore, this research contributes by developing a ResNet-50-based U-Net architecture specifically designed for semantic segmentation of tropical forest conversion associated with oil palm plantation expansion using multi-class satellite imagery.

 

MATERIALS AND METHODS

The proposed framework was evaluated using satellite imagery collected from Kolaka Regency, Southeast Sulawesi, Indonesia, an area experiencing continuous expansion of oil palm plantations during the last decade. Multi-temporal satellite imagery covering the period 2009–2024 was employed to analyze land-cover dynamics and deforestation trends.

Dataset Preparation

Satellite images were manually annotated into four semantic classes: forest, oil palm plantation, open land, and background.

Image annotation was performed using Roboflow, producing corresponding segmentation masks for supervised learning.

Prior to training, all images underwent preprocessing consisting of image resizing, pixel normalization, data augmentation, and dataset partitioning.

The dataset was divided into training, validation, and testing subsets so that overfitting could be detected during training and model generalization could be evaluated on unseen data.
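The preprocessing steps described above can be illustrated with a minimal sketch. It uses plain Python lists for portability; a real pipeline would more likely rely on NumPy or TensorFlow. The 70/15/15 split ratio, the random seed, the horizontal flip as the augmentation, and all function names are assumptions made for illustration, since the paper does not state these details.

```python
import random

def normalize(image, max_value=255.0):
    """Scale integer pixel values to the range [0, 1]."""
    return [[[channel / max_value for channel in pixel] for pixel in row]
            for row in image]

def horizontal_flip(image, mask):
    """Flip an image and its mask together so labels stay aligned with pixels."""
    return [row[::-1] for row in image], [row[::-1] for row in mask]

def split_dataset(items, train_pct=70, val_pct=15, seed=42):
    """Shuffle deterministically, then partition into train/validation/test.

    The 70/15/15 ratio is an assumed example, not the ratio used in the study.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Toy 2x2 RGB image and its class-index mask (hypothetical values).
image = [[[0, 128, 255], [255, 255, 255]],
         [[64, 64, 64], [0, 0, 0]]]
mask = [[0, 1],
        [2, 3]]

norm = normalize(image)
flipped_image, flipped_mask = horizontal_flip(norm, mask)
train_ids, val_ids, test_ids = split_dataset(range(100))
print(len(train_ids), len(val_ids), len(test_ids))
```

Geometric augmentations must be applied to the image and its mask together, as in `horizontal_flip`, otherwise the pixel labels no longer match the pixels they describe.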

Proposed CNN Architecture

The proposed model integrates ResNet-50 as the encoder and U-Net as the decoder.

Encoder: ResNet-50 extracts hierarchical semantic features using residual learning. Residual blocks facilitate deeper representation learning while mitigating vanishing gradient problems.

Decoder: The decoder reconstructs pixel-level segmentation maps through successive upsampling operations. Skip connections transfer low-level spatial information from encoder layers to decoder layers, improving boundary localization.
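To make the encoder-decoder data flow concrete, the following sketch traces feature-map shapes through a ResNet-50 encoder and a U-Net-style decoder. The ResNet-50 stage strides and channel counts follow the original architecture (He et al., 2016). The 256 x 256 input size, the decoder filter counts, and the function name are assumptions chosen for illustration, not values reported in this study. The input size is assumed to be a multiple of 32.

```python
# ResNet-50 stage outputs as (downsampling factor, channels).
RESNET50_STAGES = [(2, 64), (4, 256), (8, 512), (16, 1024), (32, 2048)]

def trace_shapes(input_size=256, decoder_filters=(256, 128, 64, 32), num_classes=4):
    """Return feature-map shapes for the encoder, decoder, and output layer."""
    encoder = [(input_size // factor, input_size // factor, channels)
               for factor, channels in RESNET50_STAGES]
    # The deepest encoder output is the bottleneck; the others serve as skips.
    height, width, channels = encoder[-1]
    skips = encoder[-2::-1]  # deepest skip first
    decoder = []
    for (skip_h, skip_w, skip_c), filters in zip(skips, decoder_filters):
        height, width = height * 2, width * 2        # 2x upsampling
        assert (height, width) == (skip_h, skip_w)   # skip must match spatially
        concat_channels = channels + skip_c          # channel-wise concatenation
        channels = filters                           # convolution reduces channels
        decoder.append({"size": (height, width),
                        "concat": concat_channels,
                        "out": filters})
    # A final 2x upsampling restores the input resolution before the classifier.
    output = (height * 2, width * 2, num_classes)
    return encoder, decoder, output

encoder, decoder, output = trace_shapes()
for stage in decoder:
    print(stage)
print("output:", output)
```

Each decoder stage doubles the spatial resolution and concatenates the encoder feature map of the same size, which is how low-level spatial detail reaches the later layers.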

The network predicts four semantic classes using a Softmax activation function. Categorical Cross-Entropy was adopted as the optimization loss function.
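As a minimal sketch of the output layer and loss for a single pixel, the code below applies softmax to four class scores and computes the categorical cross-entropy against a one-hot label. The class order, the logit values, and the function names are hypothetical; in the actual model these operations are handled by the deep learning framework over every pixel in a batch.

```python
import math

def softmax(logits):
    """Convert raw class scores for one pixel into probabilities summing to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Cross-entropy for one pixel: -sum(y_c * log(p_c))."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

# Assumed class order, for illustration only.
CLASSES = ["background", "forest", "oil_palm", "open_land"]

logits = [0.2, 1.0, 3.0, 0.1]   # hypothetical raw scores for one pixel
target = [0, 0, 1, 0]           # one-hot label: oil palm plantation

probs = softmax(logits)
loss = categorical_cross_entropy(target, probs)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, round(loss, 4))
```

The loss shrinks toward zero as the probability assigned to the true class approaches one, so minimizing it pushes each pixel's prediction toward its annotated class.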

 

RESULTS AND DISCUSSION

Model Training

Training was conducted using TensorFlow and Keras.

Important configurations included the Adam optimizer, categorical cross-entropy loss, ReLU activation, and a softmax output layer.

Early stopping and model checkpoint callbacks were applied to reduce overfitting.

Fine-tuning was subsequently performed by unfreezing deeper ResNet layers to improve feature adaptation for satellite imagery.
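The logic behind early stopping combined with checkpointing can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the Keras callback implementation used in the study. The patience value, the validation-loss history, and the function name are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. best_epoch is the epoch
    whose weights a best-only checkpoint would have kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation-loss curve that starts to rise after epoch 3.
history = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.60, 0.62]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)
```

Stopping at the point where validation loss stops improving, and keeping the weights from the best epoch, limits how far the model can drift toward memorizing the training set.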

Performance Evaluation

The performance of the proposed segmentation model was evaluated using several standard semantic segmentation metrics, namely Precision, Recall, F1-score, Intersection over Union (IoU), and Mean Intersection over Union (mIoU). These metrics are defined as follows.

Precision measures the proportion of correctly predicted positive pixels among all pixels predicted as positive.

Recall measures the proportion of correctly predicted positive pixels among all actual positive pixels.

The F1-score is the harmonic mean of Precision and Recall, providing a balanced measure of both metrics.

The Mean Intersection over Union (mIoU) is computed as the average IoU across all semantic classes.
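The verbal definitions above correspond to the standard per-class formulas below. The notation is an assumption introduced here: for class c, TP_c, FP_c, and FN_c denote the numbers of true-positive, false-positive, and false-negative pixels, and C is the number of classes (four in this study).

```latex
\begin{align}
\mathrm{Precision}_c &= \frac{TP_c}{TP_c + FP_c} \\
\mathrm{Recall}_c    &= \frac{TP_c}{TP_c + FN_c} \\
\mathrm{F1}_c        &= \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}
                             {\mathrm{Precision}_c + \mathrm{Recall}_c} \\
\mathrm{IoU}_c       &= \frac{TP_c}{TP_c + FP_c + FN_c} \\
\mathrm{mIoU}        &= \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_c
\end{align}
```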

 

In addition to the quantitative metrics above, a Confusion Matrix was employed to analyze class-wise classification performance. The confusion matrix provides detailed information on correctly and incorrectly classified pixels for each land-cover class, allowing identification of confusion patterns among classes.
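All of these metrics can be derived from the confusion matrix, as the following sketch shows. The label sequences are toy values, not data from this study, and the function names and class indices (0 to 3) are assumptions made for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix where matrix[i][j] counts true class i predicted as j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_metrics(matrix):
    """Derive Precision, Recall, F1-score, and IoU for every class."""
    n = len(matrix)
    results = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # column total minus TP
        fn = sum(matrix[c]) - tp                       # row total minus TP
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
        results.append({"precision": precision, "recall": recall,
                        "f1": f1, "iou": iou})
    return results

# Toy flattened ground-truth and predicted labels for ten pixels.
y_true = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
y_pred = [0, 1, 1, 1, 3, 2, 2, 3, 3, 1]

matrix = confusion_matrix(y_true, y_pred, num_classes=4)
metrics = per_class_metrics(matrix)
miou = sum(m["iou"] for m in metrics) / len(metrics)
print(matrix)
print(round(miou, 4))
```

Off-diagonal entries show which pairs of classes are confused with each other, which the aggregate scores alone do not reveal.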

Experimental Results

The proposed ResNet-U-Net framework demonstrated stable convergence during model training.

Training and validation losses consistently decreased, indicating effective optimization and satisfactory generalization capability.

The classification report yielded an average F1-score of 0.8221 and an mIoU of 0.6894 across the four land-cover classes.

The oil palm plantation class achieved the highest segmentation performance due to its relatively homogeneous spectral characteristics compared with forest and open land classes.

Confusion Matrix analysis indicated that most prediction errors occurred along the boundaries between forest and open land, which exhibit similar visual textures in medium-resolution satellite imagery.

The semantic segmentation results showed accurate delineation of plantation boundaries while preserving forest structures through pixel-level classification.

Qualitative visualization further demonstrated that the proposed model effectively distinguished fragmented forest regions from expanding oil palm plantations.

The experimental results indicate that integrating ResNet-50 with U-Net is effective for semantic segmentation in deforestation detection.

ResNet effectively extracts deep semantic representations from complex tropical landscapes, while U-Net reconstructs high-resolution segmentation masks through skip connections.

 

Compared with conventional CNN architectures reported in previous literature, the proposed framework provides better preservation of spatial details, particularly along irregular forest boundaries.

The obtained F1-score of 0.8221 indicates reliable classification performance across multiple land-cover categories.

Similarly, the mIoU value of 0.6894 confirms accurate overlap between predicted segmentation masks and manually annotated ground truth.

Although the proposed model performed well, several limitations remain.

First, segmentation accuracy decreases near transition zones where forest gradually converts into young oil palm plantations.

Second, atmospheric conditions, cloud cover, and seasonal variation influence spectral consistency within satellite imagery.

Third, annotation quality significantly affects supervised learning performance.

Future research should investigate attention mechanisms such as Attention U-Net, Transformer-based segmentation networks, or DeepLabV3+ to further enhance contextual representation.

Integration with Sentinel-2 multispectral imagery and LiDAR data may also improve discrimination between vegetation types.

 

CONCLUSION

This study proposed a semantic segmentation framework integrating ResNet-50 and U-Net for detecting deforestation caused by oil palm plantation expansion using satellite imagery.

Experimental evaluation demonstrated that the proposed architecture successfully classified four land-cover categories and achieved an F1-score of 0.8221 with an mIoU of 0.6894, indicating reliable segmentation performance.

The proposed framework provides an efficient and scalable solution for automated land-cover monitoring, supporting environmental agencies in identifying forest conversion and improving sustainable forest management.

Future work will focus on incorporating multi-temporal deep learning models, multispectral satellite data, and explainable artificial intelligence techniques to enhance prediction accuracy and model interpretability.

 

 

REFERENCES

  1. Abadi, M., Agarwal, A., Barham, P., et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from https://tensorflow.org.
  2. Audebert, N., Le Saux, B., & Lefèvre, S. (2018). Beyond RGB: Very high-resolution urban remote sensing with multimodal deep networks. ISPRS Journal of Photogrammetry and Remote Sensing, 140, 20–32.
  3. Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495.
  4. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
  5. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
  6. Iskandar, & Hanafi. (2022). Machine learning algorithm for tropical forest deforestation detection.
  7. Li, W., Fu, H., Yu, L., & Cracknell, A. (2017). Deep learning based oil palm tree detection. Remote Sensing, 9(1), 22.
  8. Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. CVPR, 3431–3440.
  9. Ma, L., Liu, Y., et al. (2019). Deep learning in remote sensing applications: A meta-analysis. ISPRS Journal of Photogrammetry and Remote Sensing, 152, 166–177.
  10. Minaee, S., Boykov, Y., Porikli, F., et al. (2021). Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  11. Paszke, A., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. NeurIPS, 8026–8037.
  12. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. MICCAI, 234–241.
  13. Thapa, R., et al. (2023). Deep learning applications for remote sensing image segmentation. Remote Sensing.
  14. Ulmas, P., & Liiv, I. (2020). Segmentation of Satellite Imagery Using U-Net Models for Land Cover Classification.
  15. Vasavi, S., et al. (2023). Classification of Buildings from VHR Satellite Images Using Ensemble of U-Net and ResNet.
  16. Zhu, X. X., et al. (2017). Deep learning in remote sensing: A comprehensive review. IEEE Geoscience and Remote Sensing Magazine, 5(4), 8–36.
  17. Google Earth. (2024). Google Earth Pro User Guide.