A Comparative Analysis of Metadata Tools for use on Unknown Operational Datasets
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

When working with large datasets, it is important that the right tools and methods are selected in order to analyze the data effectively. This thesis presents a comparative evaluation of data management tools in the categories of validation, profiling, and feature extraction. The tools Pandera, YData Profiling, Sweetviz, and TSFEL were selected and integrated into a data processing system for the WARA-Ops portal in order to validate, profile, and analyze new operational datasets uploaded to the portal. Finally, the system extracts statistical information from the dataset and uses a machine learning classification algorithm to apply a general label to the data based on the extracted information.
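The final step described in the abstract (extract statistical information, then assign a general label with a classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the features or the classification algorithm used, so the sketch uses four summary statistics and a nearest-centroid classifier as a stand-in, relies only on the Python standard library rather than the tools evaluated in the thesis, and uses hypothetical labels ("constant", "noisy") and toy data.

```python
from math import dist
from statistics import mean, pstdev


def extract_features(values):
    """Summarise a numeric column as (mean, population std dev, min, max)."""
    return (mean(values), pstdev(values), min(values), max(values))


def centroid(feature_rows):
    """Average a list of feature tuples component by component."""
    return tuple(mean(column) for column in zip(*feature_rows))


def nearest_centroid_label(features, centroids):
    """Return the label whose centroid is closest (Euclidean) to `features`."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Hypothetical labelled columns; not data from the thesis.
training = {
    "constant": [[5.0, 5.0, 5.0, 5.0], [2.0, 2.0, 2.0, 2.0]],
    "noisy": [[1.0, 9.0, 2.0, 8.0], [0.0, 10.0, 1.0, 9.0]],
}

# One centroid per label, built from the features of its example columns.
centroids = {
    label: centroid([extract_features(column) for column in columns])
    for label, columns in training.items()
}


def label_column(values):
    """Extract features from a new column and assign the nearest label."""
    return nearest_centroid_label(extract_features(values), centroids)
```

Calling `label_column` on a new numeric column returns one of the hypothetical labels. A real system would likely normalise the features and use a richer feature set, since unscaled statistics let large-valued features dominate the distance.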

Place, publisher, year, edition, pages
2024. p. 43
Series
UMNAD ; 1497
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-227466
OAI: oai:DiVA.org:umu-227466
DiVA, id: diva2:1879306
External cooperation
Ericsson
Educational program
Master of Science Programme in Computing Science and Engineering
Supervisors
Examiners
Available from: 2024-06-28. Created: 2024-06-28. Last updated: 2025-04-01. Bibliographically approved

Open Access in DiVA

A Comparative Analysis of Metadata Tools for use on Unknown Operational Datasets (724 kB), 3 downloads
File information
File name: FULLTEXT01.pdf
File size: 724 kB
Checksum (SHA-512): 0e4107fa01d9b334a9fe783c7a63b6b4232fa59c5b5f607fa9213157a3fbcd3fe8f3a9e151e69f42f703ccd20626fd7eff712875a473d6d7d23199039411de0c
Type: fulltext
Mimetype: application/pdf
