Citation:
Ruolan Zhang, Xingchen Ji, Jinichi Koue, et al. A Comparative Analysis of Deep Learning Approaches for Visual Perception Benchmarks in Ship Navigation[J]. Journal of Marine Science and Application, 2026, (2): 600-616. [doi:10.1007/s11804-025-00703-7]

A Comparative Analysis of Deep Learning Approaches for Visual Perception Benchmarks in Ship Navigation

Info

Title:: A Comparative Analysis of Deep Learning Approaches for Visual Perception Benchmarks in Ship Navigation

Author(s):: Ruolan Zhang¹; Xingchen Ji¹,²; Jinichi Koue²; Katsutoshi Hirayama²

Affiliations:

Keywords:: Long-range perception; Visual navigation; Dataset; Multiscale detection; Vision benchmark

DOI:: 10.1007/s11804-025-00703-7

Abstract:: A reliable benchmark for evaluating model performance is critical for advancing deep learning (DL), including its application to recognizing the ship navigation environment. Although object detection models have progressed steadily across many tasks, maritime navigation poses distinct challenges: long perception distances, diverse object types, a wide range of object scales, and the local conditions and characteristics of individual water areas. Improving DL approaches for this domain therefore remains a significant challenge. Using a widely applicable offshore image dataset captured from the ship bridge, we evaluated state-of-the-art object detection models from three perspectives: average precision, multiscale feature calculation, and intersection-over-union (IoU) design. We also explored factors that may affect the model performance evaluation benchmark, namely data quality, scale calculation, feature quantification, and object association. Our experiments demonstrate that comprehensive performance evaluation benchmarks are essential for object detection in complex water-surface traffic scenes, and that such benchmarks must assess models along multiple dimensions.
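The evaluation perspectives named in the abstract (average precision, object scale, IoU, and object association) rest on standard detection metrics. As a hedged illustration only, and not the authors' evaluation code, the sketch below computes box IoU, greedily associates detections with ground truth at an IoU threshold (COCO-style: highest score first, each ground-truth box matched at most once), computes average precision with all-point interpolation, and assigns boxes to COCO-style scale buckets (area below 32² is small, below 96² is medium, otherwise large). The function names `iou`, `scale_bucket`, and `average_precision`, the `(x1, y1, x2, y2)` pixel box format, and the COCO area thresholds are assumptions for illustration; the paper's benchmark may define scale and matching differently.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels (hypothetical, for illustration).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_bucket(box: Box) -> str:
    """COCO-style size bucket by pixel area (assumed thresholds 32^2 and 96^2)."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def average_precision(
    detections: Sequence[Tuple[Box, float]],
    ground_truth: Sequence[Box],
    iou_threshold: float = 0.5,
) -> float:
    """Single-image, single-class AP with all-point interpolation.

    Detections are processed in descending score order; each one is associated
    with the best-overlapping unmatched ground-truth box, and counts as a true
    positive only if that overlap reaches iou_threshold.
    """
    if not ground_truth:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    matched = [False] * len(ground_truth)
    hits: List[int] = []
    for box, _score in ranked:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold:
            matched[best_j] = True
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    # Cumulative precision and recall along the ranked detection list.
    precisions: List[float] = []
    recalls: List[float] = []
    cum_tp = 0
    for k, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / k)
        recalls.append(cum_tp / len(ground_truth))
    # All-point interpolation: make precision non-increasing from the right,
    # then sum precision over each recall increment.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((21, 21, 30, 30), 0.7)]
    print(round(average_precision(dets, gt), 4))  # 0.8333
    print([scale_bucket(b) for b in gt])  # ['small', 'small']
```

This is deliberately simplified. Benchmark tools such as the COCO evaluator pool detections across all images per class, use 101-point recall sampling, and average AP over IoU thresholds from 0.50 to 0.95, so the same detections can score differently under different protocols, which is one reason a single AP figure can be an incomplete benchmark for small, distant maritime targets.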

Memo

Memo:: Received date: 2025-01-08; Accepted date: 2025-03-07.
Foundation item: This work was partially funded by the International Association of Maritime Universities (IAMU) and The Nippon Foundation in Japan. The authors would like to acknowledge the support of the International Association of Maritime Universities (Research Project Number 20240201). The authors also gratefully acknowledge the support from the China Scholarship Council (Grant No. CXXM2209260070).
Corresponding author: Katsutoshi Hirayama, E-mail: hirayama@maritime.kobe-u.ac.jp

Last Update: 2026-06-08