2020 (English) Doctoral thesis, comprehensive summary (Other academic)
Dolda betydelsefulla mönster: statistiska metoder för analys av DNA och RNA data [Hidden meaningful patterns: statistical methods for analysis of DNA and RNA data]
Abstract [en]
Understanding how genetic variation affects the characteristics and function of organisms can help researchers and medical doctors detect genetic alterations that cause disease and reveal genes that cause antibiotic resistance. The opportunities and progress associated with such data come, however, with challenges related to statistical analysis. It is only by using properly designed and employed tools that we can extract the information about hidden patterns. In this thesis we present three types of such analysis. First, the genetic variant in the gene COL17A1 that causes corneal dystrophy with recurrent erosions is revealed. By studying next-generation sequencing data, we obtained the order of the nucleotides in the DNA sequence, which enabled us to detect interesting variants in the genome. Further, we present results of an experimental design study with the aim of making the best selection from a family that is affected by an inherited disease. In the second part of the work, we analyzed a novel antibiotic-resistant Staphylococcus epidermidis clone that is found only in northern Europe. By investigating its genetic data, we revealed similarities to a world-known antibiotic-resistant clone. As a result, the antibiotic resistance profile is established from the DNA sequences. Finally, we also focus on the challenges related to the abundance of genetic data from different sources. The increasing number of public gene expression datasets gives us the opportunity to increase our understanding by using information from multiple sources simultaneously. Naturally, this requires merging independent datasets. However, when doing so, the technical and biological variation in the joined data increases. We present a pre-processing method to construct gene co-expression networks from a large, diverse gene expression dataset.
Place, publisher, year, edition, pages
Umeå: Umeå universitet, Institutionen för matematik och matematisk statistik, 2020. p. 26
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 71/20
Keywords
Genome, Next-generation sequence, statistics, microarrays, bacteria, antibiotic resistance, inherited diseases, Co-expression networks, centralization within subgroups
National subject category
Probability Theory and Statistics; Biological Sciences; Medical and Health Sciences
Identifiers
URN: urn:nbn:se:umu:diva-175242
ISBN: 978-91-7855-240-5
ISBN: 978-91-7855-241-2
Public defence
2020-10-16, Hörsal B, Lindellhallen, Umeå, 09:00 (English)
Opponent
Supervisors
2020-09-25 2020-09-22 2020-09-23 Bibliographically approved