Abstract:
High-quality data are the cornerstone of intelligent operation, maintenance, and fault diagnosis for reciprocating multi-cylinder pumps. However, existing data cleaning methods rely predominantly on the statistical distribution or energy features of the signal. These approaches often struggle under strong noise interference and complex operating conditions, and they fail to exploit the physical mechanisms of the equipment. To address these challenges, this paper proposes a physics-constrained data cleaning method based on pump stroke consistency, which leverages the “single-crankshaft rigid drive” structure of multi-cylinder pumps: because all cylinders are driven by one crankshaft, every measurement channel should exhibit the same stroke frequency. The proposed method uses a time-domain autocorrelation algorithm to overcome frequency resolution limitations, achieving precise extraction of the pump stroke frequency even in high-noise environments. Furthermore, a frequency consistency index is constructed based on the physical homology of the multi-channel signals. A statistical-physical dual evaluation system is then introduced to quantitatively analyze threshold sensitivity, yielding a robust optimal threshold interval for the precise elimination of abnormal data. Validation on a large-scale industrial field dataset spanning nearly four months and exceeding 70 GB in total volume shows that this strategy achieves an anomaly alignment F1-score above 80%, more than a threefold improvement over mainstream machine learning and traditional statistical methods. These results not only validate the effectiveness of physical consistency constraints under complex operating conditions but also provide a robust, mechanism-based solution for constructing high-quality, large-scale real-world datasets for industrial large models.
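The two physical steps summarized above (per-channel stroke frequency extraction by time-domain autocorrelation, followed by a cross-channel frequency consistency check) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the search band `f_min`/`f_max`, the 90% peak-selection rule, the parabolic peak interpolation, the index definition (largest relative deviation of any channel from the median frequency), and the threshold `tau` are all hypothetical choices. The paper determines its own threshold interval through its statistical-physical evaluation.

```python
import numpy as np


def stroke_frequency_acf(x, fs, f_min, f_max):
    """Estimate one channel's stroke frequency (Hz) by time-domain autocorrelation.

    Assumption: the stroke frequency lies in [f_min, f_max] (hypothetical inputs).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    # Linear (biased) autocorrelation via zero-padded FFT; the implicit (n - k)/n
    # taper makes the lag-T peak slightly taller than the 2T, 3T ... peaks.
    nfft = 1 << (2 * n - 1).bit_length()
    spec = np.fft.rfft(x, nfft)
    acf = np.fft.irfft(spec * np.conj(spec), nfft)[:n]
    if acf[0] <= 0:
        raise ValueError("constant signal has no periodicity")
    acf = acf / acf[0]

    lag_min = max(1, int(np.floor(fs / f_max)))
    lag_max = min(n - 2, int(np.ceil(fs / f_min)))
    # Skip the zero-lag main lobe: search no earlier than the first negative ACF value.
    neg = np.flatnonzero(acf[1:] < 0)
    start = lag_min if neg.size == 0 else max(lag_min, int(neg[0]) + 1)
    seg = acf[start:lag_max + 1]

    # Interior local maxima of the ACF inside the search band.
    peaks = np.flatnonzero((seg[1:-1] > seg[:-2]) & (seg[1:-1] >= seg[2:])) + 1
    if peaks.size == 0:
        raise ValueError("no autocorrelation peak in the search band")
    vals = seg[peaks]
    # Shortest-lag peak within 90% of the tallest one (guards against picking 2T, 3T ...).
    k0 = start + int(peaks[np.argmax(vals >= 0.9 * vals.max())])

    # Refine to the tallest sample of that lobe, then fit a parabola through the
    # three samples around it to get a sub-sample lag estimate.
    w = max(1, k0 // 4)
    lo, hi = max(1, k0 - w), min(n - 2, k0 + w)
    k = lo + int(np.argmax(acf[lo:hi + 1]))
    y0, y1, y2 = acf[k - 1], acf[k], acf[k + 1]
    denom = y0 - 2.0 * y1 + y2
    delta = 0.5 * (y0 - y2) / denom if denom < 0 else 0.0
    return fs / (k + delta)


def frequency_consistency_index(freqs):
    """Hypothetical index: largest relative deviation of any channel from the median."""
    f = np.asarray(freqs, dtype=float)
    f_ref = np.median(f)
    return float(np.max(np.abs(f - f_ref)) / f_ref)


# Illustrative usage on synthetic data (not field data): three channels sharing a 2.5 Hz stroke.
rng = np.random.default_rng(0)
fs = 200.0
t = np.arange(12000) / fs
channels = [np.sin(2 * np.pi * 2.5 * t + p) + 0.2 * rng.standard_normal(t.size)
            for p in (0.0, 2.1, 4.2)]
freqs = [stroke_frequency_acf(c, fs, f_min=1.0, f_max=5.0) for c in channels]
index = frequency_consistency_index(freqs)
tau = 0.05  # hypothetical threshold, standing in for the paper's optimal interval
is_anomalous = index > tau
```

Because every cylinder is driven by the same crankshaft, a clean segment should yield near-identical per-channel frequencies and an index close to zero. A channel whose estimate drifts away from the others raises the index, and the segment is flagged once the index exceeds the chosen threshold.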