Abstract:
Coal-carrying trains require complex quality inspection, including crack detection, defect identification, and verification of missing components. In these scenarios, traditional object detection methods struggle with interference among multiple defect targets, imbalanced sample distributions, and small targets. As a result, current railway freight loading monitoring systems lack the precision and efficiency needed for comprehensive appearance quality assessment. To address these issues, this study proposes a detection method that integrates multi-modal visual algorithms. First, a lightweight object detection network based on the YOLOX-s framework is developed to improve recognition efficiency. Second, a path aggregation feature pyramid network (PAFPN) module with multi-feature fusion is constructed so that several feature-recognition tasks can be performed simultaneously. Finally, an improved Intersection over Union (IoU) loss function with balanced parameters is designed to increase detection accuracy. Experimental results show that the improved detection algorithm achieves 91% higher inference efficiency than the baseline model while increasing mean Average Precision (mAP) by 5%. Furthermore, combining object detection, image classification, and text recognition in a multi-modal collaborative detection framework raises the system's overall recognition accuracy by 13% at the cost of only 7 ms of additional processing time. The proposed method addresses the challenge of simultaneously recognizing multiple feature states of coal wagons. Its joint optimization of detection efficiency and accuracy offers a practical solution for appearance quality monitoring of large moving equipment.