On the definition of probabilistic metric spaces by means of fuzzy measures
Department of Management Science, Tamagawa University, Machida, Tokyo, Japan.
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-0368-8037
2023 (English)
In: Fuzzy Sets and Systems (Print), ISSN 0165-0114, E-ISSN 1872-6801, Vol. 465, article id 108528
Article in journal (Refereed) Published
Abstract [en]

Metric spaces are defined in terms of a set and a metric, or distance. Probabilistic metric spaces are a useful extension of metric spaces in which the distance between two points is a distribution instead of a number. In this way, uncertainty can be taken into account. In these spaces, the triangle inequality is replaced by a condition based on triangle functions on the distributions. In this paper we introduce F-spaces, a new type of probabilistic metric space based on fuzzy measures (also known as non-additive measures and capacities). We prove some properties that describe which families of fuzzy measures are compatible with which types of triangle functions. Then, we show how Sugeno and Choquet integrals, and in general any other fuzzy integral, can be used as a tool for building these spaces. We show how these results can be used to compute distances between functions. We illustrate the results with an example comparing three types of means applied to a set of databases. The example uses Sugeno λ-measures to illustrate the theoretical results presented in the paper.
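The abstract relies on a few standard building blocks: a Sugeno λ-measure built from singleton densities, and the discrete Choquet and Sugeno integrals. The Python sketch below implements those pieces. All function names (`solve_lambda`, `lambda_measure`, `choquet`, `sugeno`, `distance_distribution`) and the example numbers are hypothetical. `distance_distribution`, which maps t to μ({x : |f(x) - g(x)| < t}), is only an assumed illustration of how a fuzzy measure can turn pointwise differences into a nondecreasing distance distribution. It is not necessarily the F-space definition used in the paper.

```python
import math
from typing import Callable, Sequence


def solve_lambda(densities: Sequence[float], iters: int = 200) -> float:
    """Return the Sugeno lambda (> -1) with 1 + lam = prod(1 + lam * g_i).

    Assumes every density g_i lies strictly in (0, 1).
    """
    s = sum(densities)
    if math.isclose(s, 1.0):
        return 0.0  # densities already additive: an ordinary probability measure
    if len(densities) < 2:
        raise ValueError("need at least two densities when they do not sum to 1")

    def h(lam: float) -> float:
        # mu_lam(X) - 1; its limit at lam = 0 is sum(g) - 1
        if lam == 0.0:
            return s - 1.0
        return (math.prod(1.0 + lam * g for g in densities) - 1.0) / lam - 1.0

    if s > 1.0:
        lo, hi = -1.0, 0.0  # h(-1) < 0 < h(0)
    else:
        lo, hi = 0.0, 1.0  # h(0) < 0; grow hi until h(hi) >= 0
        while h(hi) < 0.0:
            hi *= 2.0
    for _ in range(iters):  # bisection keeping h(lo) < 0 <= h(hi)
        mid = (lo + hi) / 2.0
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lambda_measure(subset, densities: Sequence[float], lam: float) -> float:
    """mu(A) of a Sugeno lambda-measure; `subset` holds element indices."""
    if lam == 0.0:
        return sum(densities[i] for i in subset)
    return (math.prod(1.0 + lam * densities[i] for i in subset) - 1.0) / lam


def choquet(f: Sequence[float], mu: Callable[[frozenset], float]) -> float:
    """Discrete Choquet integral of a non-negative f over indices 0..n-1."""
    order = sorted(range(len(f)), key=lambda i: f[i])
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        upper = frozenset(order[k:])  # elements whose value is >= f[i]
        total += (f[i] - prev) * mu(upper)
        prev = f[i]
    return total


def sugeno(f: Sequence[float], mu: Callable[[frozenset], float]) -> float:
    """Discrete Sugeno integral of f with values in [0, 1]."""
    order = sorted(range(len(f)), key=lambda i: f[i])
    return max(min(f[i], mu(frozenset(order[k:]))) for k, i in enumerate(order))


def distance_distribution(f, g, mu):
    """Hypothetical: t -> mu({x : |f(x) - g(x)| < t}), nondecreasing in t."""
    diffs = [abs(a - b) for a, b in zip(f, g)]
    return lambda t: mu(frozenset(i for i, d in enumerate(diffs) if d < t))


# Usage: three databases with hypothetical importance densities.
dens = [0.2, 0.3, 0.4]  # sums to 0.9 < 1, so lambda > 0


def mu(A: frozenset) -> float:
    return lambda_measure(A, dens, lam)


lam = solve_lambda(dens)
f = [0.6, 0.9, 0.3]       # e.g. one mean computed on each database
g_vals = [0.5, 0.7, 0.35]  # e.g. a second mean on the same databases
print(choquet(f, mu), sugeno(f, mu))
F = distance_distribution(f, g_vals, mu)
print([round(F(t), 3) for t in (0.0, 0.06, 0.15, 0.3)])
```

When the densities sum to 1, λ = 0 and the Choquet integral reduces to a weighted mean. When they sum to less than 1, as in the usage lines, λ > 0 and the measure is superadditive, so interactions between databases change the aggregate.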

Place, publisher, year, edition, pages
Elsevier, 2023. Vol. 465, article id 108528
Keywords [en]
Fuzzy integrals, Fuzzy measures, Probabilistic metric spaces
National Category
Computer Sciences; Computer Systems; Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-207877
DOI: 10.1016/j.fss.2023.108528
ISI: 001012160000001
Scopus ID: 2-s2.0-85153802533
OAI: oai:DiVA.org:umu-207877
DiVA, id: diva2:1754812
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-05-04 Created: 2023-05-04 Last updated: 2025-04-30 Bibliographically approved
In thesis
1. Probabilistic metric space for machine learning: data and model spaces
2025 (English)
Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Probabilistic metric spaces for machine learning: data and model spaces
Abstract [en]

Machine learning models are inherently shaped by the data used to train them. Understanding the relationship between datasets and the models they generate is essential for tasks such as model selection, privacy metrics, and robustness evaluation. This thesis presents a rigorous mathematical framework for comparing machine learning models and algorithms by formalizing the interaction between two fundamental spaces: the database space, which captures possible datasets, and the model space, which contains the models or hypotheses derived from those datasets. A central motivation stems from the observation that different datasets can lead to the same or highly similar models. Such recurrent models—which arise frequently across diverse data sources—are particularly significant in privacy-sensitive applications. Their recurrence suggests reduced dependence on any specific data point or subgroup, thus offering inherent privacy and generalization benefits. By quantifying the relationship between models and their generating data, this work enables principled evaluation of a model’s robustness and disclosure risk.

To formalize relationships between the two spaces, the thesis develops a family of probabilistic metric space constructions tailored to different aspects of the data–model interaction. The first contribution models database evolution as a Markov process and defines probabilistic distances between models based on the likelihood of transitioning between their generating datasets. The second contribution introduces F-space, a framework based on fuzzy measures that captures richer structural properties of the data—such as redundancy, synergy, and overlap among subsets. Building on this, the third contribution applies the F-space theory in practical machine learning scenarios. It demonstrates how fuzzy measures can be used to compare different linear regression algorithms trained over structured subsets of real datasets. The final contribution further generalizes the framework through Generalized F-spaces, where the model space itself is endowed with probabilistic structure—allowing uncertainty in both the datasets and the model outputs to be captured simultaneously.
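The first contribution, modelling database evolution as a Markov process, can be illustrated with a small first-passage (hitting-time) computation. Purely for illustration, the sketch assumes a finite set of database states, a hypothetical row-stochastic transition matrix `P`, and a hypothetical list `model_of` giving the model each database generates. It then takes F(t) as the probability that a chain started at one database reaches some database that generates a given model within t steps. This is not necessarily the construction used in the thesis. In particular, it is not symmetric in general, so it would need adjustment before it satisfies the probabilistic metric space axioms.

```python
from typing import List, Sequence, Set


def hitting_cdf(P: Sequence[Sequence[float]], start: int,
                targets: Set[int], t_max: int) -> List[float]:
    """[F(0), ..., F(t_max)], F(t) = Pr(chain from `start` enters `targets` within t steps)."""
    if start in targets:
        return [1.0] * (t_max + 1)
    n = len(P)
    dist = [0.0] * n  # mass of paths that have not yet entered `targets`
    dist[start] = 1.0
    hit = 0.0  # accumulated mass that has entered `targets`
    cdf = [hit]
    for _ in range(t_max):
        new = [0.0] * n
        for i, p in enumerate(dist):
            if p:
                for j in range(n):
                    new[j] += p * P[i][j]
        for j in targets:  # mass arriving in a target is absorbed there
            hit += new[j]
            new[j] = 0.0
        dist = new
        cdf.append(hit)
    return cdf


def model_distance_cdf(P, model_of, start_db: int, target_model, t_max: int) -> List[float]:
    """Hypothetical distance distribution from database `start_db` to a model."""
    targets = {d for d, m in enumerate(model_of) if m == target_model}
    return hitting_cdf(P, start_db, targets, t_max)


# Usage: four databases on a line, evolving as a lazy random walk (hypothetical).
P = [[0.50, 0.50, 0.00, 0.00],
     [0.25, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.25],
     [0.00, 0.00, 0.50, 0.50]]
model_of = ["m1", "m1", "m2", "m2"]  # two databases generate each model
cdf = model_distance_cdf(P, model_of, 0, "m2", 10)
print([round(x, 4) for x in cdf])
```

Databases 0 and 1 both generate `m1`. This mirrors the abstract's observation that different datasets can lead to the same model, and the hitting set grows accordingly.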

Together, these constructions offer a principled alternative to traditional model comparison metrics. Rather than relying solely on pointwise loss or accuracy, the proposed framework incorporates the diversity, dynamics, and internal structure of the data that underlies each model—enabling more robust and privacy-aware assessments.

Abstract [sv]

Machine learning models are fundamentally shaped by the data they are trained on. Understanding the relationship between datasets and the models generated from them is essential for tasks such as model selection, privacy measurement, and robustness analysis. This thesis presents a rigorous mathematical framework for comparing machine learning models and algorithms by formalizing the interplay between two fundamental spaces: the database space, which represents possible datasets, and the model space, which contains the models or hypotheses derived from these datasets.

A central motivation is the observation that different datasets can lead to the same or very similar models. Such recurrent models, which often arise across varied data sources, are particularly significant in privacy-sensitive applications. Their recurrence suggests a reduced dependence on individual data points or subgroups, which brings benefits for both privacy and generalizability. By quantifying the relationship between models and their generating data, this work enables a principled evaluation of a model's robustness and disclosure risk.

To formalize the relationship between the two spaces, the thesis introduces a family of probabilistic metric spaces adapted to different aspects of the interplay between data and models. The first contribution models the evolution of databases as a Markov process and defines probabilistic distances between models based on the probability of transitioning between their generating datasets. The second contribution introduces F-spaces, a framework based on fuzzy measures that captures richer structural properties of data, such as redundancy, synergy, and overlap among subsets. The third contribution applies F-space theory in practical machine learning scenarios. It shows how fuzzy measures can be used to compare different linear regression algorithms trained on structured subsets of real datasets. The fourth and final contribution generalizes the framework further through Generalized F-spaces, in which the model space is also given a probabilistic structure, so that uncertainty in both the dataset and the model outputs can be captured simultaneously.

Together, these constructions offer a principled alternative to traditional metrics for comparing models. Instead of relying solely on pointwise error or accuracy, the proposed framework takes into account the diversity, dynamics, and internal structure of the data, enabling more robust and privacy-aware analyses.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025. p. 50
Series
Report / UMINF, ISSN 0348-0542 ; 25.05
Keywords
probabilistic metric space, space of data, space of models, fuzzy measures
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-238256 (URN)
978-91-8070-681-0 (ISBN)
978-91-8070-680-3 (ISBN)
Public defence
2025-05-23, Hörsal NAT.D. 360, Naturvetarhuset, Umeå, 09:00 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-04-30 Created: 2025-04-28 Last updated: 2025-04-28 Bibliographically approved

Authority records

Taha, Mariam; Torra, Vicenç
