Abstract:
Traditional object detection methods struggle with three problems when applied to the diverse quality inspection tasks of coal-carrying trains: interference among multiple defect targets, imbalanced sample distributions, and difficulty detecting small targets. These tasks include detection of complex cracks, defect identification, and verification of missing components. As a result, current railway freight loading monitoring systems lack sufficient precision and efficiency for comprehensive appearance quality assessment. To address these issues, this paper proposes a detection method that integrates multi-modal visual recognition algorithms. First, a lightweight object detection network based on the YOLOX-s framework is developed to improve recognition efficiency. Second, a path aggregation feature pyramid network (PAFPN) module with multi-feature fusion is constructed so that multiple feature-recognition tasks can be performed simultaneously. Finally, an improved intersection over union (IoU) loss function with balanced parameters is designed to increase detection accuracy. Experimental results show that the improved detection algorithm achieves 91% higher inference efficiency than the baseline model while raising mean average precision (mAP) by 5%. Furthermore, integrating object detection, image classification, and text recognition algorithms into a multi-modal collaborative detection framework improves the system's overall recognition accuracy by 16%, at a cost of only 7 ms of additional processing time. The proposed method addresses the challenge of simultaneously recognizing multiple feature states of coal wagons. Its mechanism for jointly optimizing detection efficiency and accuracy provides a new solution for appearance quality monitoring of large moving equipment and has significant application value.