A Data Cleaning Method for Reciprocating Multi-cylinder Pumps under Pump Stroke Consistency Constraints

Abstract: High-quality data is the foundation of intelligent operation and maintenance and of fault diagnosis for reciprocating multi-cylinder pumps. However, existing data cleaning methods rely mainly on the statistical distribution or energy features of signals. They struggle under strong noise interference and complex operating conditions, and they make little use of the physical mechanisms of the equipment. To address this, a physics-constrained data cleaning method based on pump stroke consistency is proposed, exploiting the "single-crankshaft rigid drive" structure of multi-cylinder pumps. The method uses a time-domain autocorrelation algorithm to overcome the resolution limit of frequency-domain analysis, obtaining high-precision pump stroke frequency estimates even under strong noise. Because every cylinder is driven by the same crankshaft, all channels should reflect the same stroke frequency; a frequency consistency index is constructed on this physical homology of the multi-channel signals. A statistical-physical dual evaluation system is then introduced, which quantitatively analyzes threshold sensitivity to identify an optimal decision interval that is robust in engineering practice, enabling precise removal of abnormal data. Validation on a large-scale field dataset measured at exploration drilling sites, spanning nearly four months with a total volume exceeding 70 GB, shows that the strategy achieves an anomaly alignment F1-score above 80%, a more than threefold performance advantage over traditional statistical and machine learning methods. These results validate the effectiveness of physical consistency constraints under complex operating conditions and provide a robust, mechanism-based solution for constructing high-quality, large-scale field-measured datasets for industrial large models.
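The abstract does not give the exact estimator, so the following is only a rough illustration of how a time-domain approach can sidestep the FFT bin spacing (fs/N for an N-sample window). It estimates a dominant periodicity from the biased autocorrelation and refines the peak lag with three-point parabolic interpolation. The function name `estimate_stroke_frequency`, the search band `f_min`/`f_max`, and the interpolation step are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np


def estimate_stroke_frequency(x, fs, f_min, f_max):
    """Estimate the dominant periodic frequency (Hz) of a 1-D signal.

    Hypothetical sketch: picks the largest autocorrelation peak among lags
    that correspond to the band [f_min, f_max], then refines it to
    sub-sample precision with a three-point parabolic fit.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    # Linear (non-circular) autocorrelation via a zero-padded FFT.
    nfft = 1 << (2 * n - 1).bit_length()
    spec = np.fft.rfft(x, nfft)
    acf = np.fft.irfft(np.abs(spec) ** 2, nfft)[:n]
    if acf[0] <= 0:
        return float("nan")  # constant signal: no periodicity to find
    acf /= acf[0]  # biased estimate, normalized to 1 at lag 0
    # Lags that map to the allowed stroke-frequency band.
    lag_min = max(1, int(np.floor(fs / f_max)))
    lag_max = min(n - 2, int(np.ceil(fs / f_min)))
    k = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    # Parabolic interpolation around the peak gives a fractional lag.
    y0, y1, y2 = acf[k - 1], acf[k], acf[k + 1]
    denom = y0 - 2.0 * y1 + y2
    delta = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    delta = float(np.clip(delta, -0.5, 0.5))
    return fs / (k + delta)


if __name__ == "__main__":
    fs = 1000.0                    # sampling rate in Hz (illustrative)
    t = np.arange(10_000) / fs     # 10 s window, so FFT bin spacing is 0.1 Hz
    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * 2.53 * t) + 0.05 * rng.normal(size=t.size)
    print("autocorrelation estimate (Hz):", estimate_stroke_frequency(x, fs, 0.5, 10.0))
    print("FFT bin spacing (Hz):", fs / t.size)
```

The biased autocorrelation also peaks at multiples of the period and pump signals can carry harmonics, so in this sketch the search band should bracket only the expected stroke rate; the biased normalization favors the shortest matching lag, at the cost of a small downward lag bias.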
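The abstract says the frequency consistency index rests on the physical homology of the channels and that the decision threshold comes from a threshold-sensitivity analysis, but it gives no formulas. The sketch below assumes one plausible definition: the largest relative deviation of any channel's stroke-frequency estimate from the per-window median, with a window flagged when this exceeds a threshold `tau`. The names `consistency_index` and `rejection_curve` and the sample numbers are hypothetical, and the rejection-rate sweep only mimics the statistical side of the paper's statistical-physical evaluation.

```python
import numpy as np


def consistency_index(freqs):
    """Per-window spread of channel-wise stroke frequencies (assumed definition).

    freqs: array of shape (n_windows, n_channels) in Hz.
    Returns max_i |f_i - median(f)| / median(f) for each window. Windows whose
    median is not positive (or is NaN) get +inf so they are always flagged.
    """
    freqs = np.asarray(freqs, dtype=float)
    med = np.median(freqs, axis=1)
    dev = np.max(np.abs(freqs - med[:, None]), axis=1)
    index = np.full(dev.shape, np.inf)
    np.divide(dev, med, out=index, where=med > 0)
    return index


def rejection_curve(index, thresholds):
    """Fraction of windows that would be discarded at each candidate threshold."""
    index = np.asarray(index, dtype=float)
    return np.array([np.mean(index > tau) for tau in thresholds])


if __name__ == "__main__":
    # Hypothetical per-window estimates for three channels (Hz).
    freqs = np.array([
        [2.50, 2.51, 2.49],   # channels agree closely
        [2.50, 2.50, 2.50],   # channels agree exactly
        [2.50, 1.20, 2.50],   # one channel disagrees: suspect window
    ])
    idx = consistency_index(freqs)
    taus = np.linspace(0.0, 0.6, 7)
    for tau, rate in zip(taus, rejection_curve(idx, taus)):
        print(f"tau={tau:.2f}  rejected={rate:.2f}")
```

Under this assumption, a range of `tau` over which the rejection rate barely changes is a natural candidate for a robust setting; the paper's actual selection criterion may differ.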