ResNet–U-Net-Based Convolutional Neural Network for Detecting Deforestation
Caused by Oil Palm Plantation Expansion Using Satellite Imagery
Muh. Yamin¹
¹Department of Informatics Engineering, Faculty of Engineering, Halu Oleo
University, Kendari, Indonesia
Corresponding author: Muh Yamin
Email: muh_yamin@uho.ac.id
Abstract
Deforestation caused by oil palm plantation expansion has become one of the most
significant environmental issues in tropical countries, particularly Indonesia.
Continuous land conversion threatens biodiversity, accelerates carbon emissions,
and reduces the sustainability of forest ecosystems. Conventional monitoring
methods are often constrained by limited spatial coverage, high operational
costs, and time-consuming field surveys. This study proposes a deep learning
framework that combines ResNet-50 and U-Net in a Convolutional Neural Network
(CNN) for semantic segmentation of satellite imagery to detect deforestation
caused by oil palm expansion. ResNet-50 serves as the encoder that extracts
high-level spatial features, while a U-Net decoder with skip connections
reconstructs pixel-level segmentation maps. Satellite images were annotated,
normalized, augmented, and partitioned into subsets before model training.
Performance was evaluated using Precision, Recall, F1-score, the Confusion
Matrix, and Mean Intersection over Union (mIoU). The proposed model achieved an
average F1-score of 0.8221 and an mIoU of 0.6894, and the oil palm plantation
class obtained the highest segmentation accuracy among all land-cover classes.
These findings indicate that integrating ResNet-50 with U-Net improves semantic
segmentation performance for deforestation monitoring using satellite imagery.
The proposed framework can support environmental agencies and policymakers by
providing accurate, automated, and scalable land-cover monitoring for
sustainable forest management.
Keywords: Deep Learning, Convolutional Neural Network, ResNet-50, U-Net,
Deforestation Detection, Satellite Imagery, Semantic Segmentation.
INTRODUCTION
Indonesia possesses one
of the world's largest tropical forest ecosystems, providing essential
ecological services including biodiversity conservation, climate regulation,
carbon sequestration, and water resource protection. However, rapid expansion
of oil palm plantations has substantially contributed to forest degradation
over recent decades. Large-scale conversion of natural forests into
agricultural land has accelerated deforestation rates and generated
considerable environmental challenges including biodiversity loss, ecosystem
fragmentation, greenhouse gas emissions, and increased vulnerability to natural
disasters.
Oil palm cultivation
represents one of Indonesia's most valuable agricultural commodities,
contributing significantly to national economic development. Nevertheless, its
rapid expansion has created conflicts between economic growth and environmental
sustainability. Previous reports indicate that a considerable proportion of
forest loss in Indonesia is directly associated with land conversion for oil
palm plantations, emphasizing the need for efficient monitoring systems capable
of identifying land-cover changes over large geographical regions.
Traditional approaches
for monitoring deforestation rely heavily on field observations and manual
interpretation of satellite imagery. Although these methods can produce
reliable results, they require substantial human expertise, financial
resources, and processing time. Moreover, manual interpretation becomes
increasingly impractical when dealing with multi-temporal satellite datasets
covering extensive forest landscapes.
Recent advances in deep
learning have significantly improved automatic image interpretation,
particularly in remote sensing applications. Convolutional Neural Networks
(CNNs) have demonstrated remarkable performance in object detection, image
classification, and semantic segmentation. Compared with conventional machine
learning algorithms, CNN-based models automatically learn hierarchical image
representations from raw data, eliminating the need for handcrafted feature
extraction.
Among various semantic
segmentation architectures, U-Net has become one of the most widely adopted
models due to its encoder-decoder structure and skip connections that preserve
spatial information during image reconstruction. However, the standard U-Net
architecture exhibits limitations in extracting complex semantic features from
high-resolution satellite imagery. Recent studies have shown that integrating
residual learning through ResNet significantly enhances feature representation
while mitigating degradation problems in deeper neural networks.
Several previous studies have applied machine learning and CNN-based approaches
to land-cover classification and deforestation mapping. Random Forest and
Support Vector Machine classifiers have achieved satisfactory performance in
certain applications but remain limited in capturing complex spatial patterns.
Likewise, standalone CNN or U-Net models often have difficulty distinguishing
heterogeneous land-cover boundaries in tropical landscapes. Combining ResNet
with U-Net has produced promising improvements in remote sensing segmentation
tasks, although its application to detecting deforestation driven by oil palm
expansion in Indonesia remains relatively limited.
This research therefore proposes a semantic segmentation framework that
combines a ResNet-50 encoder with a U-Net decoder for automated detection of
deforestation caused by oil palm plantation expansion through multi-class
segmentation of satellite imagery. The framework classifies each pixel into one
of four classes: forest, oil palm plantation, open land, and background. Model
performance is evaluated using Precision, Recall, F1-score, the Confusion
Matrix, and Mean Intersection over Union (mIoU). The approach aims to provide an
accurate, scalable, and efficient monitoring system that supports sustainable
forest management and environmental policy development.
The main contributions
of this study are summarized as follows:
- Development of a ResNet–U-Net semantic
segmentation model for satellite-based deforestation detection.
- Integration of deep feature extraction and
pixel-wise segmentation for multi-class land-cover mapping.
- Comprehensive evaluation using multiple
segmentation metrics.
- Demonstration of the applicability of the
proposed framework for automated monitoring of oil palm-driven
deforestation in Indonesia.
RELATED WORK
Automatic land-cover
classification using satellite imagery has experienced rapid development with
the emergence of deep learning techniques. Compared with conventional machine
learning algorithms, deep learning enables hierarchical feature extraction directly
from image pixels, resulting in higher accuracy for complex remote sensing
applications.
Several studies have
investigated deforestation detection using machine learning approaches.
Iskandar and Hanafi (2022) applied Random Forest to detect tropical forest loss
and achieved an overall accuracy exceeding 95%. Although Random Forest
demonstrated strong classification capability, its performance relied heavily
on handcrafted features, and it was limited in representing complex spatial
patterns.
CNN-based approaches
have recently become the dominant solution for semantic segmentation of
satellite imagery. Magdalena et al. (2021) employed CNN for multi-class
land-cover classification using SPOT-6 imagery and reported an accuracy of
95.45%, demonstrating the superiority of convolutional feature extraction over
traditional image processing techniques.
The U-Net architecture
has become one of the most influential semantic segmentation models. Originally
designed for biomedical image segmentation, U-Net has been successfully adapted
to remote sensing applications due to its encoder-decoder architecture and skip
connections, which preserve fine spatial information throughout the
segmentation process.
However, several researchers have reported that the conventional U-Net
architecture has limited feature representation capability when processing
highly heterogeneous satellite imagery. The semantic information extracted by
its relatively shallow encoder is often insufficient for distinguishing visually
similar land-cover classes such as secondary forests, plantations, and open
land.
Residual learning, introduced in ResNet, addresses this limitation by enabling
the training of much deeper networks without the degradation problem, in which
the accuracy of plain networks deteriorates as more layers are added. ResNet
therefore enables robust extraction of high-level semantic features while
remaining straightforward to optimize.
Recent studies
combining ResNet and U-Net have shown promising improvements in semantic
segmentation. Vasavi et al. (2023) proposed an ensemble ResNet-U-Net
architecture for building extraction from very high-resolution satellite
imagery and demonstrated improved segmentation performance compared with
conventional CNN models. Similar improvements have also been observed in
vegetation mapping and geological interpretation.
Despite these advances,
applications of integrated ResNet-U-Net architectures for detecting oil
palm-induced deforestation in Indonesia remain limited. Most previous studies
focused either on land-cover classification or general change detection without
integrating semantic segmentation and temporal analysis within a single
framework.
Therefore, this research contributes a U-Net architecture with a ResNet-50
encoder, specifically designed for multi-class semantic segmentation of
satellite imagery to detect tropical forest conversion associated with oil palm
plantation expansion.
MATERIALS AND METHODS
The proposed framework
was evaluated using satellite imagery collected from Kolaka Regency, Southeast
Sulawesi, Indonesia, an area experiencing continuous expansion of oil palm
plantations during the last decade. Multi-temporal satellite imagery covering
the period 2009–2024 was employed to analyze land-cover dynamics and
deforestation trends.
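One common way to turn per-date segmentation maps into a deforestation signal is post-classification comparison: segment co-registered images from two dates and flag pixels whose label changed from forest to oil palm plantation. The sketch below illustrates that idea only and is not necessarily the procedure used in this study; the class indices (0 = background, 1 = forest, 2 = oil palm plantation, 3 = open land) and the function name `forest_to_oil_palm` are hypothetical.

```python
import numpy as np

# Hypothetical class indices for the four semantic classes.
BACKGROUND, FOREST, OIL_PALM, OPEN_LAND = 0, 1, 2, 3

def forest_to_oil_palm(mask_t1: np.ndarray, mask_t2: np.ndarray) -> tuple[np.ndarray, int]:
    """Return a boolean change map and the number of forest (t1) -> oil palm (t2) pixels."""
    if mask_t1.shape != mask_t2.shape:
        raise ValueError("masks must be co-registered and have the same shape")
    change = (mask_t1 == FOREST) & (mask_t2 == OIL_PALM)
    return change, int(change.sum())

# Toy 3x3 label maps predicted for two acquisition dates.
t1 = np.array([[1, 1, 1],
               [1, 1, 3],
               [0, 1, 1]])
t2 = np.array([[2, 2, 1],
               [1, 2, 3],
               [0, 3, 1]])
change_map, n_changed = forest_to_oil_palm(t1, t2)
print(n_changed)  # 3
```

Multiplying the pixel count by the ground area covered by one pixel would convert it into a converted-forest area for each pair of dates.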
Dataset Preparation
Satellite images were
manually annotated into four semantic classes: forest, oil palm plantation, open
land, and background.
Image annotation was
performed using Roboflow, producing corresponding segmentation masks for
supervised learning.
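Because the network is trained with categorical cross-entropy, each integer-valued mask is typically converted to a one-hot array with one channel per class before training. A minimal NumPy sketch, assuming the exported masks store one integer label per pixel; the class order in `CLASS_NAMES` is a hypothetical mapping (Keras offers the same conversion through `tf.keras.utils.to_categorical`):

```python
import numpy as np

# Hypothetical index mapping for the four annotated classes.
CLASS_NAMES = ["background", "forest", "oil_palm", "open_land"]

def to_one_hot(mask: np.ndarray, num_classes: int = len(CLASS_NAMES)) -> np.ndarray:
    """Convert an (H, W) integer label mask into an (H, W, C) one-hot float32 array."""
    if mask.min() < 0 or mask.max() >= num_classes:
        raise ValueError("mask contains labels outside [0, num_classes)")
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[mask]

mask = np.array([[0, 1],
                 [2, 3]])
one_hot = to_one_hot(mask)
print(one_hot.shape)  # (2, 2, 4)
print(one_hot[1, 0])  # [0. 0. 1. 0.]
```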
Prior to training, all images underwent preprocessing consisting of image
resizing, pixel normalization, data augmentation, and dataset partitioning.
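The normalization and augmentation steps can be sketched in NumPy as follows. This assumes 8-bit RGB tiles (hence the division by 255), square tiles so that 90-degree rotations keep the shape, and flips plus rotations as the augmentation set; the paper does not list its exact transforms, so these choices are illustrative. The key point is that every geometric transform must be applied identically to the image and its mask. Resizing is left to an image library; masks should be resized with nearest-neighbour interpolation so that class labels are not blended.

```python
import numpy as np

def normalize(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0

def augment(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply identical random flips and a random 90-degree rotation to image and mask.

    Assumes square (H == W) tiles so that rotation preserves the tile shape.
    """
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)  # horizontal flip
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)  # vertical flip
    k = int(rng.integers(0, 4))                           # 0-3 quarter turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    return image, mask

rng = np.random.default_rng(seed=0)
tile = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # synthetic RGB tile
labels = rng.integers(0, 4, size=(256, 256))                      # synthetic label mask
x, y = augment(normalize(tile), labels, rng)
print(x.shape, x.dtype, y.shape)  # (256, 256, 3) float32 (256, 256)
```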
The dataset was divided into training, validation, and testing subsets so that
overfitting could be monitored on the validation subset and model generalization
evaluated on the held-out testing subset.
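A minimal sketch of a reproducible random split is shown below. The 70/15/15 ratios, the seed, and the function name `split_indices` are assumptions for illustration, since the split proportions are not reported. Because neighbouring tiles of satellite imagery are strongly correlated, splitting by geographic block rather than by individual tile is a safer design when tiles overlap or are adjacent.

```python
import numpy as np

def split_indices(n: int, train: float = 0.7, val: float = 0.15, seed: int = 42):
    """Shuffle sample indices and split them into train/validation/test subsets."""
    if not (0 < train < 1 and 0 <= val < 1 and train + val < 1):
        raise ValueError("invalid split fractions")
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    # The remaining indices form the held-out test subset.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```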
Proposed CNN Architecture
The proposed model
integrates ResNet-50 as the encoder and U-Net as the decoder.
Encoder: ResNet-50
extracts hierarchical semantic features using residual learning.
Residual blocks
facilitate deeper representation learning while mitigating vanishing gradient
problems.
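For reference, the residual formulation of He et al. (2016) can be written as below, where x and y are the input and output of a block and F(x, {W_i}) is the residual mapping learned by its stacked layers (in ResNet-50, a 1×1, 3×3, 1×1 bottleneck). When the dimensions of x and F differ, a linear projection W_s is applied to the shortcut. The second line shows why the identity shortcut eases optimization: the gradient reaching x always contains a term passed through the shortcut unchanged, in addition to the term through F.

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s\,\mathbf{x}

\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
    \left( I + \frac{\partial \mathcal{F}}{\partial \mathbf{x}} \right)
```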
Decoder: The decoder
reconstructs pixel-level segmentation maps through successive upsampling
operations.
Skip connections
transfer low-level spatial information from encoder layers to decoder layers,
improving boundary localization.
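To illustrate how a skip connection is wired, the NumPy sketch below performs one decoder step: the deeper feature map is upsampled by a factor of two and concatenated along the channel axis with the encoder feature map of the same spatial size. Nearest-neighbour upsampling is only one option (bilinear upsampling and transposed convolutions are also common), and the learned convolutions that normally follow the concatenation are omitted. The shapes correspond to the standard ResNet-50 stage outputs for an assumed 256×256 input (conv4_x: 1024 channels at stride 16; conv3_x: 512 channels at stride 8); the function names are hypothetical.

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decoder_step(deep: np.ndarray, skip: np.ndarray) -> np.ndarray:
    """Upsample the deeper features and concatenate the encoder skip features."""
    up = upsample2x(deep)
    if up.shape[:2] != skip.shape[:2]:
        raise ValueError(f"spatial mismatch: {up.shape[:2]} vs {skip.shape[:2]}")
    # A real decoder would apply convolutions here to fuse the two sources.
    return np.concatenate([up, skip], axis=-1)

# ResNet-50 stage outputs for an assumed 256x256 input.
conv4 = np.zeros((16, 16, 1024), dtype=np.float32)  # stride 16
conv3 = np.zeros((32, 32, 512), dtype=np.float32)   # stride 8
fused = decoder_step(conv4, conv3)
print(fused.shape)  # (32, 32, 1536)
```

Repeating this step at strides 8, 4, and 2 and finishing with a final upsampling brings the output back to the input resolution.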
The output layer applies a Softmax activation to produce, for every pixel, a
probability distribution over the four semantic classes. Categorical
Cross-Entropy was adopted as the loss function for optimization.
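The two quantities can be sketched per pixel in NumPy as follows: softmax turns the four output scores of each pixel into probabilities, and categorical cross-entropy averages, over all pixels, the negative log-probability assigned to the true class. The small epsilon clip, which guards against log(0), and the toy values are assumptions for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last (class) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true: np.ndarray, probs: np.ndarray, eps: float = 1e-7) -> float:
    """Mean over pixels of -sum_c y_true[c] * log(p[c]) for one-hot y_true of shape (H, W, C)."""
    probs = np.clip(probs, eps, 1.0)
    return float(-(y_true * np.log(probs)).sum(axis=-1).mean())

# A 1x2 "image" with 4 classes: one confident correct pixel, one uninformed pixel.
logits = np.array([[[4.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0]]])
y_true = np.array([[[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]]])
p = softmax(logits)
loss = categorical_cross_entropy(y_true, p)
```

The uninformed pixel receives probability 0.25 for every class and therefore contributes log 4 to the sum, far more than the confident, correct pixel.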
RESULTS AND DISCUSSION
Model Training
Training was conducted
using TensorFlow and Keras.
The main training configuration was as follows:
- Optimizer: Adam
- Loss function: categorical cross-entropy
- Activation function: ReLU
- Output activation: Softmax
Early stopping and model checkpoint callbacks
were applied to reduce overfitting.
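In Keras this behaviour is provided by the `EarlyStopping` callback (with parameters such as `monitor`, `patience`, `min_delta`, and `restore_best_weights`) together with `ModelCheckpoint`. The monitored quantity and patience used here are not reported, so the framework-independent sketch below assumes validation loss and a patience of 3 purely to show the logic: training halts once validation loss has failed to improve for `patience` consecutive epochs, and the best epoch is the one a best-only checkpoint would keep. The function name and loss values are hypothetical.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Epochs are 0-indexed; stop_epoch is the epoch after which training halts.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: checkpoint here
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improvement stalls after epoch 3.
losses = [0.90, 0.70, 0.62, 0.58, 0.59, 0.60, 0.61, 0.57]
print(early_stopping(losses))  # (6, 3)
```

Note that the lower loss at epoch 7 is never reached: a larger patience tolerates longer plateaus at the cost of more training time.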
Fine-tuning was
subsequently performed by unfreezing deeper ResNet layers to improve feature
adaptation for satellite imagery.
Performance Evaluation
The performance of the proposed segmentation model was evaluated using several
standard semantic segmentation metrics, namely Precision, Recall, F1-score,
Intersection over Union (IoU), and Mean Intersection over Union (mIoU), which
are defined as follows.
Precision measures the
proportion of correctly predicted positive pixels among all pixels predicted as
positive.
Recall measures the
proportion of correctly predicted positive pixels among all actual positive
pixels.
The F1-score is the
harmonic mean of Precision and Recall, providing a balanced measure of both
metrics.
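Intersection over Union (IoU) for a class is the number of pixels in the
intersection of the predicted and ground-truth regions of that class divided by
the number of pixels in their union. The Mean Intersection over Union (mIoU) is
computed as the average IoU across all semantic classes.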
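In the usual per-class form, these metrics can be written with pixel counts of true positives, false positives, and false negatives for class c (the notation TP_c, FP_c, FN_c and the number of classes C are introduced here for illustration):

```latex
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}

F1_c = \frac{2\,\mathrm{Precision}_c\,\mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}
     = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}

\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c}, \qquad
\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C} \mathrm{IoU}_c
```

Comparing the two middle expressions shows that, for the same class, IoU never exceeds F1, because the TP term is counted once rather than twice in both numerator and denominator.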
In addition to the
quantitative metrics above, a Confusion Matrix was employed to analyze
class-wise classification performance. The confusion matrix provides detailed
information on correctly and incorrectly classified pixels for each land-cover
class, allowing identification of confusion patterns among classes.
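As a sketch of how these quantities relate, the NumPy code below builds a pixel-level confusion matrix (rows are ground truth, columns are predictions) and derives per-class Precision, Recall, F1, IoU, and mIoU from it. The toy label maps and class indices are hypothetical; scikit-learn's `sklearn.metrics.confusion_matrix` produces the same matrix layout.

```python
import numpy as np

def confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int) -> np.ndarray:
    """Pixel-level confusion matrix: rows are ground-truth classes, columns predictions."""
    idx = num_classes * y_true.ravel().astype(np.int64) + y_pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(cm: np.ndarray):
    """Per-class Precision, Recall, F1 and IoU, plus mIoU, from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # pixels predicted as class c that belong to another class
    fn = cm.sum(axis=1) - tp  # pixels of class c predicted as another class
    with np.errstate(divide="ignore", invalid="ignore"):  # absent classes give NaN
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)
        iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou, float(np.nanmean(iou))

# Toy 3x3 ground truth and prediction with 4 classes.
y_true = np.array([[0, 0, 1], [1, 2, 2], [3, 3, 3]])
y_pred = np.array([[0, 1, 1], [1, 2, 0], [3, 3, 2]])
cm = confusion_matrix(y_true, y_pred, num_classes=4)
precision, recall, f1, iou, miou = segmentation_metrics(cm)
print(cm)
print(np.round(f1, 2), round(miou, 2))  # [0.5 0.8 0.5 0.8] 0.5
```

Off-diagonal entries of the matrix show which class pairs are confused, for example forest pixels predicted as open land.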
Experimental Results
The proposed
ResNet-U-Net framework demonstrated stable convergence during model training.
Training and validation
losses consistently decreased, indicating effective optimization and
satisfactory generalization capability.
The classification
report yielded:
The oil palm plantation
class achieved the highest segmentation performance due to its relatively
homogeneous spectral characteristics compared with forest and open land
classes.
Confusion Matrix
analysis indicated that most prediction errors occurred between forest and open
land boundaries, which exhibit similar visual textures in medium-resolution
satellite imagery.
The semantic
segmentation results showed accurate delineation of plantation boundaries while
preserving forest structures through pixel-level classification.
Qualitative
visualization further demonstrated that the proposed model effectively
distinguished fragmented forest regions from expanding oil palm plantations.
The experimental
results demonstrate that integrating ResNet-50 with U-Net substantially
improves semantic segmentation performance for deforestation detection.
ResNet effectively
extracts deep semantic representations from complex tropical landscapes, while
U-Net reconstructs high-resolution segmentation masks through skip connections.
Compared with
conventional CNN architectures reported in previous literature, the proposed
framework provides better preservation of spatial details, particularly along
irregular forest boundaries.
The obtained F1-score
of 0.8221 indicates reliable classification performance across multiple
land-cover categories.
Similarly, the mIoU value of 0.6894 indicates substantial overlap between the
predicted segmentation masks and the manually annotated ground truth.
Although the proposed model performed well, several limitations remain.
First, segmentation
accuracy decreases near transition zones where forest gradually converts into
young oil palm plantations.
Second, atmospheric
conditions, cloud cover, and seasonal variation influence spectral consistency
within satellite imagery.
Third, annotation
quality significantly affects supervised learning performance.
Future research should investigate alternative architectures such as Attention
U-Net, Transformer-based segmentation networks, or DeepLabV3+ to further
enhance contextual representation.
Integration with
Sentinel-2 multispectral imagery and LiDAR data may also improve discrimination
between vegetation types.
CONCLUSION
This study proposed a
semantic segmentation framework integrating ResNet-50 and U-Net for detecting
deforestation caused by oil palm plantation expansion using satellite imagery.
Experimental evaluation
demonstrated that the proposed architecture successfully classified four
land-cover categories and achieved an F1-score of 0.8221 with an mIoU of
0.6894, indicating reliable segmentation performance.
The proposed framework
provides an efficient and scalable solution for automated land-cover
monitoring, supporting environmental agencies in identifying forest conversion
and improving sustainable forest management.
Future work will focus
on incorporating multi-temporal deep learning models, multispectral satellite
data, and explainable artificial intelligence techniques to enhance prediction
accuracy and model interpretability.
REFERENCES
1. Abadi, M., Agarwal, A., Barham, P., et al. (2016). TensorFlow: Large-scale
machine learning on heterogeneous systems. Software available from
https://tensorflow.org.
2. Audebert, N., Le Saux, B., & Lefèvre, S. (2018). Beyond RGB: Very
high-resolution urban remote sensing with multimodal deep networks. ISPRS
Journal of Photogrammetry and Remote Sensing, 140, 20–32.
3. Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep
convolutional encoder-decoder architecture for image segmentation. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495.
4. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
5. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for
image recognition. Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 770–778.
6. Iskandar, & Hanafi. (2022). Machine learning algorithm for tropical forest
deforestation detection.
7. Li, W., Fu, H., Yu, L., & Cracknell, A. (2017). Deep learning based oil palm
tree detection. Remote Sensing, 9(1), 22.
8. Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks
for semantic segmentation. CVPR, 3431–3440.
9. Ma, L., Liu, Y., et al. (2019). Deep learning in remote sensing applications:
A meta-analysis. ISPRS Journal of Photogrammetry and Remote Sensing, 152,
166–177.
10. Minaee, S., Boykov, Y., Porikli, F., et al. (2021). Image segmentation using
deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine
Intelligence.
11. Paszke, A., et al. (2019). PyTorch: An imperative style, high-performance
deep learning library. NeurIPS, 8026–8037.
12. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional
networks for biomedical image segmentation. MICCAI, 234–241.
13. Thapa, R., et al. (2023). Deep learning applications for remote sensing
image segmentation. Remote Sensing.
14. Ulmas, P., & Liiv, I. (2020). Segmentation of Satellite Imagery Using U-Net
Models for Land Cover Classification.
15. Vasavi, S., et al. (2023). Classification of Buildings from VHR Satellite
Images Using Ensemble of U-Net and ResNet.
16. Zhu, X. X., et al. (2017). Deep learning in remote sensing: A comprehensive
review. IEEE Geoscience and Remote Sensing Magazine, 5(4), 8–36.
17. Google Earth. (2024). Google Earth Pro User Guide.