A comparison of local explanation methods for high-dimensional industrial data: a simulation study
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2022 (English)
In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 207, article id 117918
Article in journal (Refereed) Published
Abstract [en]

Prediction methods can be augmented by local explanation methods (LEMs) to perform root cause analysis for individual observations. But while most recent research on LEMs focuses on low-dimensional problems, real-world datasets commonly have hundreds or thousands of variables. Here, we investigate how LEMs perform for high-dimensional industrial applications. Seven prediction methods (penalized logistic regression, LASSO, gradient boosting, random forest and support vector machines) and three LEMs (TreeExplainer, Kernel SHAP, and the conditional normal sampling importance (CNSI)) were combined into twelve explanation approaches. These approaches were used to compute explanations for simulated data, and for real-world industrial data with simulated responses. The approaches were ranked by how well they predicted the contributions according to the true models. For the simulation experiment, the generalized linear methods provided the best explanations, while gradient boosting with either TreeExplainer or CNSI, or random forest with CNSI, were robust for all relationships. For the real-world experiment, TreeExplainer performed similarly, while the explanations from CNSI were significantly worse. The generalized linear models were fastest, followed by TreeExplainer, while CNSI and Kernel SHAP required several orders of magnitude more computation time. In conclusion, local explanations can be computed for high-dimensional data, but the choice of statistical tools is crucial.
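As an illustration of ranking explanation approaches against a known true model, the following minimal Python sketch may help. It is not the authors' procedure: the linear true model, the mean absolute error score, the noise levels, and all names (`true_contributions`, `approach_a`, `approach_b`) are assumptions made for the example, since the abstract does not state the metric used.

```python
import random


def true_contributions(x, beta, mean):
    """Per-variable contribution to a linear predictor, relative to the mean observation."""
    return [b * (xi - m) for xi, b, m in zip(x, beta, mean)]


def mean_abs_error(estimated, truth):
    """Average absolute deviation between estimated and true contributions."""
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / len(truth)


random.seed(0)
p = 200                                            # number of variables (assumed)
beta = [1.0 if j < 5 else 0.0 for j in range(p)]   # assumed: only 5 active variables
mean = [0.0] * p
x = [random.gauss(0, 1) for _ in range(p)]         # one simulated observation
truth = true_contributions(x, beta, mean)

# Two hypothetical explanation approaches, mimicked here as truth plus noise.
explanations = {
    "approach_a": [t + random.gauss(0, 0.01) for t in truth],
    "approach_b": [t + random.gauss(0, 0.5) for t in truth],
}

# Rank approaches from best (smallest error) to worst.
ranking = sorted(explanations, key=lambda name: mean_abs_error(explanations[name], truth))
print(ranking)
```

In a real comparison, the noisy lists would be replaced by the output of an actual explanation method applied to a fitted prediction model, and the score would be averaged over many observations.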

Place, publisher, year, edition, pages
Elsevier, 2022. Vol. 207, article id 117918
Keywords [en]
Interpretable model, Local Explanations, Shapley values, Simulation, Statistical process control
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-197996
DOI: 10.1016/j.eswa.2022.117918
ISI: 000827577500009
Scopus ID: 2-s2.0-85133195266
OAI: oai:DiVA.org:umu-197996
DiVA, id: diva2:1682685
Funder
Vinnova, 2015-03706
Available from: 2022-07-11 Created: 2022-07-11 Last updated: 2023-09-05 Bibliographically approved
In thesis
1. Data-driven quality management using explainable machine learning and adaptive control limits
2023 (English)
Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Data-driven kvalitetskontroll genom förklarlig maskininlärning och adaptiva styrgränser
Abstract [en]

In industrial applications, the objective of statistical quality management is to achieve quality guarantees through the efficient and effective application of statistical methods. Historically, quality management has been characterized by a systematic monitoring of critical quality characteristics, accompanied by manual and experience-based root cause analysis in case of an observed decline in quality. Machine learning researchers have suggested that recent improvements in digitization, including sensor technology, computational power, and algorithmic developments, should enable more systematic approaches to root cause analysis.

In this thesis, we explore the potential of data-driven approaches to quality management. This exploration is performed with consideration to an envisioned end product which consists of an automated data collection and curation system, a predictive and explanatory model trained on historical process and quality data, and an automated alarm system that predicts a decline in quality and suggests worthwhile interventions. The research questions investigated in this thesis relate to which statistical methods are relevant for the implementation of the product, how their reliability can be assessed, and whether there are knowledge gaps that prevent this implementation.

This thesis consists of four papers: In Paper I, we simulated various types of process-like data in order to investigate how several dataset properties affect the choice of methods for quality prediction. These properties include the number of predictors, their distribution and correlation structure, and their relationships with the response. In Paper II, we reused the simulation method from Paper I to simulate multiple types of datasets, and used them to compare local explanation methods by evaluating them against a ground truth.

In Paper III, we outlined a framework for an automated process adjustment system based on a predictive and explanatory model trained on historical data. Next, given a relative cost between reduced quality and process adjustments, we described a method for searching for a worthwhile adjustment policy. Several simulation experiments were performed to demonstrate how to evaluate such a policy.

In Paper IV, we described three ways to evaluate local explanation methods on real-world data, where no ground truth is available for comparison. Additionally, we described four methods for decorrelation and dimension reduction, along with their respective tradeoffs. These methods were evaluated on real-world process and quality data from the paint shop of the Volvo Trucks cab factory in Umeå, Sweden.

During the work on this thesis, two significant knowledge gaps were identified: The first gap is a lack of best practices for data collection and quality control, preprocessing, and model selection. The other gap is that although there are many promising leads for how to explain the predictions of machine learning models, there is still an absence of generally accepted definitions for what constitutes an explanation, and a lack of methods for evaluating the reliability of such explanations.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2023. p. 24
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 74/23
Keywords
quality management, machine learning, local explanation methods, process adjustment policies, simulation
National Category
Other Engineering and Technologies not elsewhere specified; Probability Theory and Statistics; Computer Sciences
Research subject
Mathematical Statistics; data science
Identifiers
urn:nbn:se:umu:diva-208105 (URN)
978-91-8070-095-5 (ISBN)
978-91-8070-096-2 (ISBN)
Public defence
2023-06-02, UB.A.210, Lindellhallen 1, Umeå, 13:00 (English)
Available from: 2023-05-12 Created: 2023-05-08 Last updated: 2023-05-09 Bibliographically approved

Open Access in DiVA

fulltext (1547 kB), 142 downloads
File information
File name: FULLTEXT01.pdf
File size: 1547 kB
Checksum (SHA-512): ea20549bb167efd163cc145fb9eae03230701d60fddfdc4226cf65b074fd729dde389eb9b17120119e716887f55d3b52a1b49798a6f70c96130f70df1daac1ad
Type: fulltext
Mimetype: application/pdf

Authority records

Fries, Niklas; Rydén, Patrik