Data-driven quality management using explainable machine learning and adaptive control limits
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. ORCID iD: 0000-0001-6184-8951
2023 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Data-driven kvalitetskontroll genom förklarlig maskininlärning och adaptiva styrgränser (Swedish)
Abstract [en]

In industrial applications, the objective of statistical quality management is to achieve quality guarantees through the efficient and effective application of statistical methods. Historically, quality management has been characterized by a systematic monitoring of critical quality characteristics, accompanied by manual and experience-based root cause analysis in case of an observed decline in quality. Machine learning researchers have suggested that recent improvements in digitization, including sensor technology, computational power, and algorithmic developments, should enable more systematic approaches to root cause analysis.
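The systematic monitoring described above is classically implemented as a control chart. As a minimal illustration (not part of the thesis; all numbers are invented), a Shewhart-style chart estimates control limits at the mean plus or minus three standard deviations of an in-control reference sample, and raises an alarm for new observations outside those limits:

```python
# Hedged sketch of the classical monitoring step: Shewhart-style control
# limits estimated from an in-control reference sample. All values are
# illustrative, not drawn from the thesis.
import numpy as np

rng = np.random.default_rng(2)
reference = rng.normal(loc=50.0, scale=2.0, size=200)  # in-control history

center = reference.mean()
sigma = reference.std(ddof=1)
lcl, ucl = center - 3 * sigma, center + 3 * sigma      # control limits

# New measurements; the third one simulates a decline in quality.
new_obs = np.array([49.8, 51.2, 58.9, 50.3])
alarms = (new_obs < lcl) | (new_obs > ucl)
print("alarms:", alarms)
```

In the experience-based workflow the thesis contrasts itself with, each such alarm would trigger a manual root cause analysis; the data-driven approaches explored here aim to automate that step.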

In this thesis, we explore the potential of data-driven approaches to quality management. This exploration is performed with consideration to an envisioned end product which consists of an automated data collection and curation system, a predictive and explanatory model trained on historical process and quality data, and an automated alarm system that predicts a decline in quality and suggests worthwhile interventions. The research questions investigated in this thesis relate to which statistical methods are relevant for the implementation of the product, how their reliability can be assessed, and whether there are knowledge gaps that prevent this implementation.

This thesis consists of four papers: In Paper I, we simulated various types of process-like data in order to investigate how several dataset properties affect the choice of methods for quality prediction. These properties include the number of predictors, their distribution and correlation structure, and their relationships with the response. In Paper II, we reused the simulation method from Paper I to simulate multiple types of datasets, and used them to compare local explanation methods by evaluating them against a ground truth.
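The evaluation idea in Papers I and II can be sketched in a few lines (this is an illustrative reduction, not the papers' simulation framework): simulate data with a known linear ground truth, fit a model, and score its local explanations against the true per-observation contributions, which for a centered linear model are simply beta_j * x_ij.

```python
# Minimal sketch (not the thesis code): simulate process-like data with a
# known linear ground truth, fit a model, and score the fitted model's
# local explanations against the true per-observation contributions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 10
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 0.5]          # only three active predictors

X = rng.normal(size=(n, p))           # predictors are centered at zero
y = X @ beta + rng.normal(scale=0.1, size=n)

# Least-squares fit; for a centered linear model the exact contribution
# of feature j to observation i is beta_j * x_ij.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
true_contrib = X * beta
est_contrib = X * beta_hat

# Rank explanation quality by agreement with the ground truth.
score = np.corrcoef(true_contrib.ravel(), est_contrib.ravel())[0, 1]
print(f"agreement with ground truth: {score:.3f}")
```

The papers apply the same ground-truth comparison to a much wider range of data-generating mechanisms, prediction methods, and explanation methods.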

In Paper III, we outlined a framework for an automated process adjustment system based on a predictive and explanatory model trained on historical data. Next, given a relative cost between reduced quality and process adjustments, we described a method for searching for a worthwhile adjustment policy. Several simulation experiments were performed to demonstrate how to evaluate such a policy.

In Paper IV, we described three ways to evaluate local explanation methods on real-world data, where no ground truth is available for comparison. Additionally, we described four methods for decorrelation and dimension reduction, and discussed their respective tradeoffs. These methods were evaluated on real-world process and quality data from the paint shop of the Volvo Trucks cab factory in Umeå, Sweden.

During the work on this thesis, two significant knowledge gaps were identified: The first gap is a lack of best practices for data collection and quality control, preprocessing, and model selection. The other gap is that although there are many promising leads for how to explain the predictions of machine learning models, there is still an absence of generally accepted definitions for what constitutes an explanation, and a lack of methods for evaluating the reliability of such explanations.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2023, p. 24
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 74/23
Keywords [en]
quality management, machine learning, local explanation methods, process adjustment policies, simulation
National Category
Other Engineering and Technologies; Probability Theory and Statistics; Computer Sciences
Research subject
Mathematical Statistics; data science
Identifiers
URN: urn:nbn:se:umu:diva-208105
ISBN: 978-91-8070-095-5 (print)
ISBN: 978-91-8070-096-2 (electronic)
OAI: oai:DiVA.org:umu-208105
DiVA, id: diva2:1755612
Public defence
2023-06-02, UB.A.210, Lindellhallen 1, Umeå, 13:00 (English)
Available from: 2023-05-12 Created: 2023-05-08 Last updated: 2025-02-10 Bibliographically approved
List of papers
1. A simulation framework for evaluating statistical methods for quality control in manufacturing
(English) Manuscript (preprint) (Other academic)
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-208102 (URN)
Available from: 2023-05-08 Created: 2023-05-08 Last updated: 2023-05-09
2. A comparison of local explanation methods for high-dimensional industrial data: a simulation study
2022 (English) In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 207, article id 117918. Article in journal (Refereed) Published
Abstract [en]

Prediction methods can be augmented by local explanation methods (LEMs) to perform root cause analysis for individual observations. But while most recent research on LEMs focuses on low-dimensional problems, real-world datasets commonly have hundreds or thousands of variables. Here, we investigate how LEMs perform for high-dimensional industrial applications. Seven prediction methods (penalized logistic regression, LASSO, gradient boosting, random forest and support vector machines) and three LEMs (TreeExplainer, Kernel SHAP, and the conditional normal sampling importance (CNSI)) were combined into twelve explanation approaches. These approaches were used to compute explanations for simulated data, and for real-world industrial data with simulated responses. The approaches were ranked by how well they predicted the contributions according to the true models. For the simulation experiment, the generalized linear methods provided the best explanations, while gradient boosting with either TreeExplainer or CNSI, or random forest with CNSI, were robust for all relationships. For the real-world experiment, TreeExplainer performed similarly, while the explanations from CNSI were significantly worse. The generalized linear models were fastest, followed by TreeExplainer, while CNSI and Kernel SHAP required several orders of magnitude more computation time. In conclusion, local explanations can be computed for high-dimensional data, but the choice of statistical tools is crucial.

Place, publisher, year, edition, pages
Elsevier, 2022
Keywords
Interpretable model, Local Explanations, Shapley values, Simulation, Statistical process control
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-197996 (URN)
10.1016/j.eswa.2022.117918 (DOI)
000827577500009 ()
2-s2.0-85133195266 (Scopus ID)
Funder
Vinnova, 2015-03706
Available from: 2022-07-11 Created: 2022-07-11 Last updated: 2023-09-05 Bibliographically approved
3. Data-driven process adjustment policies for quality improvement
2024 (English) In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 237, article id 121524. Article in journal (Refereed) Published
Abstract [en]

Common objectives in machine learning research are to predict the output quality of manufacturing processes, to perform root cause analysis in case of reduced quality, and to propose intervention strategies. The cost of reduced quality must be weighed against the cost of the interventions, which depends on required downtime, personnel costs, and material costs. Furthermore, there is a risk of false negatives, i.e., failure to identify the true root causes, or false positives, i.e., adjustments that further reduce the quality. A policy for process adjustments describes when and where to perform interventions, and we say that a policy is worthwhile if it reduces the expected operational cost. In this paper, we describe a data-driven alarm and root cause analysis framework that, given a predictive and explanatory model trained on high-dimensional process and quality data, can be used to search for a worthwhile adjustment policy. The framework was evaluated on large-scale simulated process and quality data. We find that worthwhile adjustment policies can be derived even for problems with a large number of explanatory variables. Interestingly, the performance of the adjustment policies is almost exclusively driven by the quality of the model fits. Based on these results, we discuss key areas of future research, and how worthwhile adjustment policies can be implemented in real-world applications.
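The "worthwhile" criterion above can be illustrated with a toy threshold policy (a deliberately simplified sketch, not the paper's framework; all costs and probabilities are made-up): adjust whenever the predicted defect risk exceeds a threshold, and call the policy worthwhile if its expected cost beats never adjusting.

```python
# Illustrative sketch (not the paper's framework): a threshold policy is
# worthwhile if it lowers the expected operational cost relative to never
# adjusting. Costs, risks, and outcomes below are invented numbers, and an
# adjustment is optimistically assumed to always prevent the defect.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
p_defect = rng.uniform(0.0, 0.3, size=n)   # model-predicted defect risk
defect = rng.random(n) < p_defect          # actual outcomes

COST_DEFECT = 100.0   # cost of one unit of reduced quality (assumed)
COST_ADJUST = 10.0    # cost of one intervention (assumed)

def expected_cost(threshold):
    """Mean cost per unit when adjusting whenever risk > threshold."""
    adjust = p_defect > threshold
    cost = COST_ADJUST * adjust + COST_DEFECT * (defect & ~adjust)
    return cost.mean()

baseline = expected_cost(1.0)              # never adjust
best_t = min(np.linspace(0.0, 1.0, 101), key=expected_cost)
print(f"baseline {baseline:.2f} vs policy {expected_cost(best_t):.2f} "
      f"at threshold {best_t:.2f}")
```

With perfectly calibrated risks, the cost-minimizing threshold sits near the cost ratio COST_ADJUST / COST_DEFECT; the paper searches over far richer policies, where model fit quality drives performance.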

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Process adjustment policy, Quality improvement, Cost reduction, Prediction, Local explanations, Simulation
National Category
Probability Theory and Statistics Reliability and Maintenance
Identifiers
urn:nbn:se:umu:diva-208103 (URN)
10.1016/j.eswa.2023.121524 (DOI)
001300577100001 ()
2-s2.0-85171612846 (Scopus ID)
Funder
Vinnova, 2015-03706; Umeå University
Note

Originally included in thesis in manuscript form.

Volume 237, Part B.

Available from: 2023-05-08 Created: 2023-05-08 Last updated: 2025-04-24 Bibliographically approved
4. Concordance and resampling for assessing the robustness of local explanation methods
(English)Manuscript (preprint) (Other academic)
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-208104 (URN)
Available from: 2023-05-08 Created: 2023-05-08 Last updated: 2023-05-09

Open Access in DiVA

fulltext (FULLTEXT01.pdf, 802 kB)
spikblad (SPIKBLAD01.pdf, 97 kB)

Authority records

Fries, Niklas

