A prediction model for dyke-dam piping based on data augmentation and interpretable ensemble learning
2025 (English) In: Engineering Failure Analysis, ISSN 1350-6307, E-ISSN 1873-1961, Vol. 182, article id 110174
Article in journal (Refereed) Published
Abstract [en]
Piping is one of the most common and hazardous issues in dyke and dam engineering, posing challenges for dyke and dam stability and risk assessments. In this study, an interpretable ensemble learning prediction model of dyke and dam piping was proposed based on the Synthetic Minority Over-sampling Technique (SMOTE) and Ensemble Learning (EL) algorithms, using a dataset collected from the Yangtze River. Initially, the piping dataset was visualized using violin plots, and SMOTE was adopted to augment the imbalanced dataset. Then, t-distributed Stochastic Neighbor Embedding (t-SNE) and the Pearson correlation coefficient were used to assess the similarity between the newly generated samples and the original samples, verifying the effectiveness of the data augmentation. Subsequently, based on the augmented dataset, six EL algorithms were employed to establish regression prediction models of piping. In a comprehensive comparison, the SMOTE-Categorical Boosting (SMOTE-CatBoost) model exhibited superior prediction accuracy and lower computational cost, with a goodness of fit (R²) of 0.9886 and a Root Mean Square Error (RMSE) of 0.05334, making it the ideal prediction model for dyke and dam piping. Additionally, an Explainable Artificial Intelligence (XAI) model of piping was developed, and it was found that the overburden thickness of the weak permeable layer (H), void ratio (e), water level height difference (Δh), and compression coefficient (av) are the four primary influencing factors of piping. The research offers a valuable reference for the advance monitoring of dyke and dam piping risk, and contributes to the sustainable maintenance of dyke and dam engineering structures.
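The augmentation and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: `smote_like` is a hypothetical helper showing only the core SMOTE idea (a synthetic row is interpolated between an existing row and one of its k nearest neighbours) without any minority-class selection, the `samples` values are toy numbers rather than the Yangtze River data, and the ensemble model itself (CatBoost) is not reproduced. The `pearson`, `rmse`, and `r2` functions follow the standard textbook definitions of the metrics the abstract reports.

```python
import math
import random


def smote_like(samples, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly
    chosen row and one of its k nearest neighbours (core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest neighbours of the base row (Euclidean), excluding itself
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        other = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (o - b) for b, o in zip(base, other)])
    return synthetic


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy two-feature rows (illustrative values only, not from the paper)
samples = [[1.0, 0.50], [1.2, 0.60], [0.9, 0.55], [1.1, 0.70]]
new_rows = smote_like(samples, n_new=5)
```

Because each synthetic row is a convex combination of two existing rows, it stays inside the range of the original data, which is why similarity checks such as the Pearson coefficient are a natural way to confirm that the augmented samples resemble the originals.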
Place, publisher, year, edition, pages
Elsevier, 2025. Vol. 182, article id 110174
Keywords [en]
Data augmentation, Dyke and dam piping, Ensemble learning, Imbalanced dataset, Interpretable machine learning
National Category
Water Engineering
Identifiers
URN: urn:nbn:se:umu:diva-245498
DOI: 10.1016/j.engfailanal.2025.110174
ISI: 001589022300001
Scopus ID: 2-s2.0-105017850995
OAI: oai:DiVA.org:umu-245498
DiVA, id: diva2:2007786
2025-10-21 Bibliographically approved