|
1. INTRODUCTION

In aquaculture, excessive stocking density and improper cage layout degrade water quality, which is not conducive to sustainable development [1]. At the same time, relying solely on manual measurement consumes considerable manpower and material resources. Therefore, using remote sensing images to accurately segment the target farm areas is of great significance [2].

Early image segmentation methods were mainly based on edge detection [3], region growing [4], thresholding [5] and energy functionals [6]. Although these traditional non-learning methods can segment a specific target in a specific scene, their performance is easily affected by the environment and the imaging source. SAR images, which contain heavy noise and lack texture and color information, are particularly affected, so traditional methods segment them poorly [7].

With the emergence of deep learning networks, their powerful feature extraction capabilities have been verified in many fields. FCN [8] is one of the earliest deep learning methods applied to image segmentation. It achieved excellent results in image segmentation tasks and promoted the application of deep learning in this field. Subsequently, many deep learning segmentation methods such as U-Net [9], LinkNet [10] and UNet++ [11] have been proposed, continuously improving segmentation accuracy across different scenes and image sources.

To improve the segmentation accuracy of deep learning methods, an effective feature extraction network structure is crucial [12]. Simonyan et al. propose VGG16 and VGG19 [13], which use several consecutive 3×3 convolution kernels to replace the larger convolution kernels in AlexNet. However, these networks use fully connected layers, which require a large amount of computation. He et al. propose ResNet [14], which uses a residual structure to address the vanishing or exploding gradients caused by stacking network layers. To further improve the feature extraction ability of the network, Hu et al. introduce an attention mechanism into ResNet and propose SE-Net, which helps the network focus on learning important feature information [15]. In addition to these feature extraction networks, Xception [16], Inception-v4 [17] and others have also achieved good feature extraction results on some public data sets.

Considering the powerful learning ability of deep learning networks and the importance of feature extraction networks, we propose a precise segmentation scheme for offshore farms in high-resolution SAR images based on an improved UNet++. The contributions of this research are as follows:
The rest of this paper is arranged as follows. Section 2 introduces the proposed scheme in detail. Section 3 describes the experiments and experimental results. Section 4 summarizes the paper.

2. METHOD

2.1 The proposed scheme

To achieve precise segmentation of offshore farms in high-resolution SAR images, we propose a segmentation scheme based on an improved UNet++. The overall segmentation flowchart is shown in Figure 1. As Figure 1 shows, the scheme is divided into model training and model application. In the model training stage, the training samples are first resized to 256×256 pixels. Second, data augmentation is performed on the samples to expand the sample set and reduce overfitting of the trained model [18]. Then, the augmented data are input into the improved UNet++, and the simulated annealing strategy is used to schedule the learning rate of the network. Finally, the best trained model is chosen using the validation samples. In the model application stage, the test set is resized to 256×256 pixels and input into the trained model. The segmentation output is then resized to the size of the original image, which gives the required binary segmentation image.

2.2 Simulated annealing strategy

Network training easily falls into a local optimum. Therefore, we introduce a simulated annealing strategy [19] for updating the learning rate, as shown in equations (1) and (2), where η_t denotes the current learning rate, η_max and η_min denote the initial and minimum learning rates, respectively, T_0 is the epoch of the first restart, β is a constant that gives the multiple by which each restart interval grows relative to the previous one, T_cur is the number of epochs since the last restart, and T_i is the number of epochs between two restarts. Figure 2 shows how the learning rate changes with epochs when η_max = 0.0003, η_min = 0.00001, T_0 = 3 and β = 2.
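The schedule behind Figure 2 can be sketched in a few lines. This is a minimal sketch, assuming that equations (1) and (2) take the standard warm-restart form of reference 19, η_t = η_min + ½(η_max − η_min)(1 + cos(π·T_cur/T_i)) with T_i = β·T_(i−1). The function name and the 0-based epoch convention are illustrative choices, not taken from the original implementation.

```python
import math


def sgdr_learning_rate(epoch, eta_max=3e-4, eta_min=1e-5, t_0=3, beta=2):
    """Learning rate at a 0-based epoch under cosine annealing with warm restarts.

    Assumed form: eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)),
    where each restart interval T_i is beta times the previous one.
    """
    t_i = t_0      # length of the current restart interval, in epochs
    t_cur = epoch  # epochs elapsed since the last restart
    # Walk through the completed intervals until the epoch falls inside one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= beta
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Settings quoted for Figure 2: eta_max = 0.0003, eta_min = 0.00001, T_0 = 3, beta = 2.
schedule = [sgdr_learning_rate(e) for e in range(21)]
# Epochs at which the learning rate is reset to eta_max.
restarts = [e for e, lr in enumerate(schedule) if math.isclose(lr, 3e-4)]
print(restarts)  # [0, 3, 9]
```

With these settings the restart intervals are 3, 6 and 12 epochs, so the following restart would fall at epoch 21. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` with `T_0=3`, `T_mult=2` and `eta_min=1e-5` should give a comparable schedule; whether the authors used it is not stated.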
The learning rate follows a rippled curve, and the number of epochs between successive restarts grows by a constant multiple each time. When the weights fall into a local optimum during training, resetting the learning rate to its initial value helps them jump out of it. Because the interval between restarts keeps growing, the network has a sufficient number of iterations for the weights to converge to a minimum.

2.3 Improved UNet++

To enhance the feature extraction capability of the original UNet++ network, we integrate SE-Net into UNet++ as the feature extraction network and propose an improved UNet++. As shown in Figure 3, the improved UNet++ uses SE-ResNet101 (SE-Net) as the feature extraction module and replaces the encoder of the original UNet++ with the feature maps output by the five stages of SE-ResNet101. SE-ResNet101 is formed by adding to the ResNet101 structure an SE attention module that assigns a weight to each channel, which lets the network pay more attention to the feature information of the target. The rest of the proposed network follows UNet++, which is itself an improvement on U-Net, performs well in segmentation tasks and is widely used.

2.4 Loss function and loss optimization

For this task, using the binary cross-entropy loss alone can yield a low loss value while the segmentation is still not accurate enough. Therefore, this research combines binary cross-entropy loss and Dice loss [20], as expressed in equation (3), where y_{n,c} ∈ Y and p_{n,c} ∈ P denote the target label and the predicted probability of the c-th class at the n-th pixel in the batch, respectively, Y and P denote the ground truth and the prediction result of the image, respectively, and C and N denote the number of classes and the number of pixels, respectively.
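To illustrate why the two terms are combined, the sketch below implements one common form of the combined loss for the binary case: mean binary cross-entropy plus a soft Dice loss. The exact expression of equation (3) may differ, for example in how the Dice term is normalized. The function name, the `eps` smoothing constant and the flat-list input format are assumptions made for this example.

```python
import math


def bce_dice_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy plus soft Dice loss over a flat list of pixels.

    y_true: ground-truth labels (0 or 1), one per pixel.
    y_prob: predicted foreground probabilities in [0, 1], one per pixel.
    """
    n = len(y_true)
    # Clamp probabilities so that log() never receives 0.
    clipped = [min(max(p, eps), 1.0 - eps) for p in y_prob]
    bce = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, clipped)
    ) / n
    # Soft Dice coefficient: overlap between prediction and ground truth.
    intersection = sum(y * p for y, p in zip(y_true, y_prob))
    dice = (2.0 * intersection + eps) / (sum(y_true) + sum(y_prob) + eps)
    return bce + (1.0 - dice)


# Hypothetical imbalanced case: 1 farm pixel among 100, model predicts 0.01 everywhere.
labels = [1] + [0] * 99
probs = [0.01] * 100
# The cross-entropy part stays small because background dominates,
# while the Dice part is close to 1 because the farm pixel is missed.
print(bce_dice_loss(labels, probs))
```

In this made-up case the cross-entropy term alone is small even though the single target pixel is missed, which is the situation the Dice term is meant to penalize. For a perfect prediction both terms go to approximately 0.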
Combining the cross-entropy loss and the Dice loss promotes rapid convergence of the model. During network training, ε(Y, P) is continuously updated and gradually approaches 0.

3. EXPERIMENTS

3.1 Data set

The high-resolution SAR offshore farm data set is derived from SAR data of Haisi No. 1 and Gaofen No. 3 with a resolution of 1-3 m. The data set contains 4000 images with sizes ranging from 512 to 2048 pixels; most of the samples are 512×512 pixels. The scenes cover the coastal areas of southeastern China, and the targets include large-scale seafood farms commonly found in China's coastal waters. Figure 4 shows some of the sample images and their annotations. The SAR images contain heavy noise, and their features are not distinct.

3.2 Experimental environment and parameter settings

The experiments run on the Ubuntu 18.04 operating system with an RTX 2080Ti GPU (11 GB). Training runs for 100 epochs with a batch size of 8. The learning rate is 0.0003, the momentum is 0.9, and the weight decay is 0.0005. The initial restart period of the simulated annealing strategy is 3 epochs, and the number of epochs between two restarts increases by a constant multiple. Optimizing the weights of a deep learning network requires a large number of training samples. In the experiments, we randomly apply rotation and flip transformations to augment the training samples and improve the robustness of the trained model.

For the local test set, we choose FWIoU [21] as the evaluation metric, as shown in equation (4), where P_ii denotes the pixels that belong to the i-th class and are predicted as the i-th class, P_ij denotes the pixels that belong to the i-th class but are predicted as the j-th class, P_ji denotes the pixels that belong to the j-th class but are predicted as the i-th class, and k denotes the total number of classes.
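The metric can be computed from a confusion matrix. The sketch below assumes that equation (4) is the usual frequency weighted IoU, FWIoU = (1 / Σ_i Σ_j P_ij) · Σ_i [(Σ_j P_ij) · P_ii / (Σ_j P_ij + Σ_j P_ji − P_ii)], with P_ij counting the pixels of class i predicted as class j. The function name, argument names and the toy labels are illustrative only.

```python
def fwiou(y_true, y_pred, num_classes):
    """Frequency weighted IoU from two non-empty flat lists of integer class labels."""
    # confusion[i][j] counts pixels of true class i predicted as class j (P_ij).
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    total = sum(sum(row) for row in confusion)
    score = 0.0
    for i in range(num_classes):
        true_i = sum(confusion[i])                                  # sum_j P_ij
        pred_i = sum(confusion[j][i] for j in range(num_classes))   # sum_j P_ji
        union = true_i + pred_i - confusion[i][i]
        if union > 0:
            # Per-class IoU, weighted by how often the class occurs in the labels.
            score += true_i * confusion[i][i] / union
    return score / total


# Toy example with 4 pixels: class 0 = background, class 1 = farm.
print(fwiou([0, 0, 0, 1], [0, 0, 1, 1], num_classes=2))  # 0.625
```

In the toy example the background class has IoU 2/3 with weight 3/4 and the farm class has IoU 1/2 with weight 1/4, giving 0.5 + 0.125 = 0.625.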
3.3 Effect verification of simulated annealing strategy

To verify the effectiveness of the simulated annealing strategy, we test it with both U-Net and UNet++. The experimental results are shown in Table 1. All images in this experiment are resized to 256×256 pixels.

Table 1. Comparison of segmentation accuracy with and without the simulated annealing strategy under the U-Net and UNet++ networks.
Table 1 shows that the simulated annealing strategy significantly improves the segmentation performance of both U-Net and UNet++, whose FWIoU increases by 0.0224 and 0.0191, respectively. The results also show that UNet++ segments better than U-Net.

3.4 Effect verification of resize strategy

For the high-resolution SAR offshore farm data set, we expect that uniformly resizing the images to 256×256 pixels for training gives better segmentation performance than resizing them to 512×512 pixels. To verify this, we apply both resize strategies to U-Net and UNet++ and compare the final segmentation performance. The experimental results are shown in Table 2. The simulated annealing strategy is used in these experiments.

Table 2. The impact of different resize strategies on segmentation accuracy under the U-Net and UNet++ networks.
Note: “512” denotes resizing the image to 512×512 pixels, and “256” denotes resizing the image to 256×256 pixels.

Table 2 shows that, for both U-Net and UNet++, resizing to 256×256 pixels gives better segmentation accuracy than resizing to 512×512 pixels: FWIoU increases by 0.0034 and 0.0031, respectively. On the one hand, resizing the image to 256×256 pixels makes the receptive field of the convolutions larger relative to the image. On the other hand, the target areas in the data set are all block-shaped with relatively flat edges, so a smaller input makes the model more robust to small disturbances in the input image.

In addition to segmentation accuracy, we also compare the test time. As shown in Table 3, resizing the image to 256×256 pixels takes less time. The test time of U-Net decreases by 20% (from 0.02 to 0.016), and that of UNet++ decreases by 33% (from 0.027 to 0.018).

Table 3. The impact of different resize strategies on test time under the U-Net and UNet++ networks.
Note: “512” denotes resizing the image to 512×512 pixels, and “256” denotes resizing the image to 256×256 pixels.

3.5 Effect verification of improved UNet++

To verify the segmentation performance of the proposed improved UNet++, we compare it with multiple methods. The experimental results are shown in Table 4.

Table 4. Comparison of different segmentation methods.
Table 4 shows that the proposed improved UNet++ achieves the best segmentation accuracy of the compared methods. Compared with the original UNet++, the proposed network improves FWIoU by 0.0034. To show the segmentation effect more intuitively, the masks generated by segmentation are drawn on the original images, as shown in Figure 5. Although a small mis-segmented area appears between two adjacent farm areas in some images, the segmentation boundaries in most images are relatively accurate, which shows that the overall segmentation effect of the proposed scheme is good.

4. CONCLUSION

Using high-resolution SAR images to accurately segment offshore farm areas can assist in compiling statistics on farming density and in planning the farming layout. This paper proposes a precise segmentation scheme for offshore farms in high-resolution SAR images based on an improved UNet++. First, we adopt the simulated annealing strategy to update the learning rate during network training, which makes the network less likely to fall into a local optimum and promotes beneficial updates of the network weights. Second, we verify that, for the data set studied, resizing images to 256×256 pixels gives better segmentation than resizing them to 512×512 pixels. This operation relatively increases the receptive field of each layer of the network and increases the model’s robustness to disturbances such as noise. Finally, we compare the segmentation effect of several feature extraction networks used to replace the encoder of UNet++ and construct an improved UNet++ that uses SE-Net as the feature extraction network. The experimental results show that, on the high-resolution SAR offshore farm data set, the proposed improved UNet++ scheme reaches an FWIoU of 0.9853.
ACKNOWLEDGMENTS

This research is supported by the Innovation Project of Equipment Development Department--Information Perception Technology under Grant no. E01Z040601.

REFERENCES

[1] Zhou, W. Z., Zhang, Y. Y., Tu, W. L., Wang, H. Y., Cao, J. G., Wu, H. L., Lv, W. W. and Tan, Y. S., “Effect of the density of cage culture on the growth performance of Whitmania pigra,” Jiangsu Agricultural Science, 46(24), 194–19 (2018).
[2] Yu, C., Hu, Z. H., Li, R. Q., Xia, X., Zhao, Y. C., Fan, X. and Bai, Y., “Segmentation and density statistics of mariculture cages from remote sensing images using mask R-CNN,” Information Processing in Agriculture (2021).
[3] Arbelaez, P., Maire, M., Fowlkes, C. and Malik, J., “Contour detection and hierarchical image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898–916 (2011). https://doi.org/10.1109/TPAMI.2010.161
[4] Wang, Y., Wang, L. C., Zheng, Y. F. and Lei, T., “Aurora image segmentation method based on region growing,” Computer Engineering and Applications, 52(23), 190–195 (2016).
[5] Wang, H. W., Liang, Y. Y. and Wang, Z. H., “Otsu image threshold segmentation method based on new genetic algorithm,” Laser Technology, 38(3), 364–367 (2014).
[6] Cui, W. C., Gong, G. Q., Lu, K., Sun, S. F. and Dong, F. M., “Convex-relaxed active contour model based on localised kernel mapping,” IET Image Processing, 11(11), 976–985 (2017). https://doi.org/10.1049/ipr2.v11.11
[7] An, Y., Long, J. W. and Mabu, S., “A segmentation network with multiattention and its application to SAR image analysis,” IEEJ Transactions on Electrical and Electronic Engineering, 15(4), 570–576 (2020). https://doi.org/10.1002/tee.v15.4
[8] Shelhamer, E., Long, J. and Darrell, T., “Fully convolutional networks for semantic segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640–651 (2017). https://doi.org/10.1109/TPAMI.2016.2572683
[9] Ronneberger, O., Fischer, P. and Brox, T., “U-Net: Convolutional networks for biomedical image segmentation,” in 18th Inter. Conf. on Medical Image Computing and Computer-Assisted Intervention, 234–241 (2015).
[10] Chaurasia, A. and Culurciello, E., “LinkNet: Exploiting encoder representations for efficient semantic segmentation,” 2017 IEEE Visual Communications and Image Processing, 1–4 (2017).
[11] Zhou, Z. W., Siddiquee, M. M. R., Tajbakhsh, N. and Liang, J. M., “UNet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Transactions on Medical Imaging, 39(6), 1856–1867 (2019). https://doi.org/10.1109/TMI.42
[12] Krizhevsky, A., Sutskever, I. and Hinton, G., “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, 25(2) (2012).
[13] Simonyan, K. and Zisserman, A., “Very deep convolutional networks for large-scale image recognition,” Computer Science (2014).
[14] He, K. M., Zhang, X. Y., Ren, S. Q. and Sun, J., “Deep residual learning for image recognition,” 29th IEEE Conf. on Computer Vision and Pattern Recognition, 770–778 (2015).
[15] Hu, J., Shen, L. and Sun, G., “Squeeze-and-excitation networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 99 (2017).
[16] Chollet, F., “Xception: Deep learning with depthwise separable convolutions,” in 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1800–1807 (2017).
[17] Szegedy, C., Ioffe, S., Vanhoucke, V. and Alemi, A., “Inception-v4, inception-ResNet and the impact of residual connections on learning,” in 31st AAAI Conf. on Artificial Intelligence, 4278–4284 (2016).
[18] Zoph, B., Cubuk, E. D., Ghiasi, G., Lin, T. Y., Shlens, J. and Le, Q. V., “Learning data augmentation strategies for object detection,” Lecture Notes in Computer Science, 566–583 (2020). https://doi.org/10.1007/978-3-030-58583-9
[19] Loshchilov, I. and Hutter, F., “SGDR: Stochastic gradient descent with warm restarts,” in 5th Inter. Conf. on Learning Representations (2016).
[20] Milletari, F., Navab, N. and Ahmadi, S. A., “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in Proc. of 4th Inter. Conf. on 3D Vision, 565–571 (2016).
[21] Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V. and Garcia-Rodriguez, J., “A review on deep learning techniques applied to semantic segmentation,” Comput. Vis. Pattern Recognit, 48(5), 644–654 (2017).
|