Random databases with approximate record matching
2010 (English)
In: Methodology and Computing in Applied Probability, ISSN 1387-5841, E-ISSN 1573-7713, Vol. 12, no 1, 63-89 p.
Article in journal (Refereed) Published
In many database applications in telecommunication, environmental and health sciences, bioinformatics, physics, and econometrics, real-world data are uncertain and subject to errors. These data are processed, transmitted and stored in large databases. We consider stochastic modelling for databases with uncertain data and for some basic database operations (for example, join and selection) with exact and approximate matching. Approximate join is used for merging or data deduplication in large databases. The distribution and mean of the join sizes are studied for random databases. A random database is treated as a table of independent random records with a common distribution (or as a set of such random tables). These results can be used for integration of information from different databases, multiple join optimization, and various probabilistic algorithms for structured random data.
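As a rough illustration of the random-database model described in the abstract, the sketch below computes the mean size of an exact-match join between two independent tables whose records are i.i.d. draws from a common discrete distribution, and checks it by simulation. It is not taken from the paper: the function names, the example distribution, and the restriction to exact matching on a single attribute are all assumptions made for this example. By linearity of expectation, the mean join size is m·n·Σ pᵢ², which equals m·n·exp(−H₂), where H₂ is the Rényi entropy of order 2.

```python
import math
import random
from collections import Counter


def expected_exact_join_size(m, n, probs):
    """Mean number of matching pairs between two independent tables.

    Assumes (for illustration) that the tables hold m and n i.i.d. records
    drawn from the same discrete distribution `probs`, and that two records
    match only when they are equal.
    """
    collision_prob = sum(p * p for p in probs)  # P(two records coincide)
    return m * n * collision_prob


def renyi_entropy_order2(probs):
    """Renyi entropy of order 2 (natural log): H2 = -log(sum of p_i^2)."""
    return -math.log(sum(p * p for p in probs))


def simulate_exact_join_size(m, n, values, probs, rng):
    """Draw two random tables and count the pairs (i, j) with equal records."""
    left = rng.choices(values, weights=probs, k=m)
    right = rng.choices(values, weights=probs, k=n)
    right_counts = Counter(right)
    return sum(right_counts[v] for v in left)


if __name__ == "__main__":
    # Hypothetical example: three record values with unequal probabilities.
    values = ["a", "b", "c"]
    probs = [0.5, 0.25, 0.25]
    m, n = 10, 20

    rng = random.Random(0)
    trials = 2000
    empirical = sum(
        simulate_exact_join_size(m, n, values, probs, rng) for _ in range(trials)
    ) / trials

    print("theoretical mean:", expected_exact_join_size(m, n, probs))
    print("via Renyi entropy:", m * n * math.exp(-renyi_entropy_order2(probs)))
    print("empirical mean:", empirical)
```

An approximate-match variant could replace the equality test with a distance threshold between records; how the paper treats that case is not stated in the abstract, so it is left out here.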
Place, publisher, year, edition, pages
Boston: Kluwer, 2010. Vol. 12, no 1, 63-89 p.
Keywords
Random database, Join, Tests, Approximate matching, Rényi entropy, Poisson approximation
Probability Theory and Statistics
Research subject Mathematical Statistics
Identifiers
URN: urn:nbn:se:umu:diva-30783
DOI: 10.1007/s11009-008-9092-4
ISI: 000273788900003
OAI: oai:DiVA.org:umu-30783
DiVA: diva2:286786