Hidden patterns that matter: statistical methods for analysis of DNA and RNA data
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2020 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Dolda betydelsefulla mönster : statistiska metoder för analys av DNA och RNA data (Swedish)
Abstract [en]

Understanding how genetic variation affects the characteristics and function of organisms can help researchers and medical doctors detect genetic alterations that cause disease and reveal genes that cause antibiotic resistance. The opportunities and progress associated with such data come, however, with challenges related to statistical analysis. Only with properly designed and applied tools can we extract the information about hidden patterns. In this thesis we present three types of such analysis. First, the genetic variant in the gene COL17A1 that causes corneal dystrophy with recurrent erosions is revealed. By studying next-generation sequencing data, the order of the nucleotides in the DNA sequence was obtained, which enabled us to detect interesting variants in the genome. Further, we present results of an experimental design study with the aim of making the best selection from a family that is affected by an inherited disease. In the second part of the work, we analyzed a novel antibiotic-resistant Staphylococcus epidermidis clone that is found only in northern Europe. By investigating its genetic data, we revealed similarities to a globally known antibiotic-resistant clone. As a result, the antibiotic resistance profile is established from the DNA sequences. Finally, we also focus on the challenges related to the abundance of genetic data from different sources. The increasing number of public gene expression datasets gives us the opportunity to increase our understanding by using information from multiple sources simultaneously. Naturally, this requires merging independent datasets. However, when doing so, the technical and biological variation in the joined data increases. We present a pre-processing method to construct gene co-expression networks from a large, diverse gene expression dataset.

Place, publisher, year, edition, pages
Umeå: Umeå University, Department of Mathematics and Mathematical Statistics, 2020, p. 26
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 71/20
Keywords [en]
Genome, Next-generation sequence, statistics, microarrays, bacteria, antibiotic resistance, inherited diseases, Co-expression networks, centralization within subgroups
National subject category
Probability Theory and Statistics; Biological Sciences; Medical and Health Sciences
Identifiers
URN: urn:nbn:se:umu:diva-175242
ISBN: 978-91-7855-240-5 (print)
ISBN: 978-91-7855-241-2 (digital)
OAI: oai:DiVA.org:umu-175242
DiVA, id: diva2:1469646
Public defence
2020-10-16, Hörsal B, Lindellhallen, Umeå, 09:00 (English)
Opponent
Supervisors
Available from: 2020-09-25 Created: 2020-09-22 Last updated: 2020-09-23 Bibliographically approved
List of papers
1. Mutations in Collagen, Type XVII, Alpha 1 (COL17A1) Cause Epithelial Recurrent Erosion Dystrophy (ERED)
2015 (English) In: Human Mutation, ISSN 1059-7794, E-ISSN 1098-1004, Vol. 36, no. 4, p. 463-473. Article in journal (Refereed) Published
Abstract [en]

Corneal dystrophies are a clinically and genetically heterogeneous group of inherited disorders that bilaterally affect corneal transparency. They are defined according to the corneal layer affected and by their genetic cause. In this study, we identified a dominantly inherited epithelial recurrent erosion dystrophy (ERED)-like disease that is common in northern Sweden. Whole-exome sequencing resulted in the identification of a novel mutation, c.2816C>T, p.T939I, in the COL17A1 gene, which encodes collagen type XVII alpha 1. The variant segregated with disease in a genealogically expanded pedigree dating back 200 years. We also investigated a unique COL17A1 synonymous variant, c.3156C>T, identified in a previously reported unrelated dominant ERED-like family linked to a locus on chromosome 10q23-q24 encompassing COL17A1. We show that this variant introduces a cryptic donor site resulting in aberrant pre-mRNA splicing and is highly likely to be pathogenic. Bi-allelic COL17A1 mutations have previously been associated with a recessive skin disorder, junctional epidermolysis bullosa, with recurrent corneal erosions being reported in some cases. Our findings implicate presumed gain-of-function COL17A1 mutations causing dominantly inherited ERED and improve understanding of the underlying pathology.

Place, publisher, year, edition, pages
John Wiley & Sons, 2015
Keywords
COL17A1, BP180, cornea dystrophy, ERED, ddPCR
National subject category
Medical Biosciences
Identifiers
urn:nbn:se:umu:diva-103155 (URN) 10.1002/humu.22764 (DOI) 000352304200011 () 25676728 (PubMedID) 2-s2.0-84925859470 (Scopus ID)
Note

Contract grant sponsors: Umeå University and Västerbotten County Council, Research and Development Foundation sponsored by Västerbotten County Council, Cronqvists Stiftelse (administered by The Swedish Society of Medicine); Ögonfonden, Stiftelsen KMA; the National Swedish Research Council (521-2013-2612); National Institute for Health Research Biomedical Research Centre at Moorfields Eye Hospital and UCL Institute of Ophthalmology; Moorfields Special Trustees; Moorfields Eye Charity; the Lanvern foundation.

Available from: 2015-05-29 Created: 2015-05-18 Last updated: 2023-03-24 Bibliographically approved
2. Experimental designs for finding disease-causing mutations in rare diseases
(English) Manuscript (preprint) (Other academic)
National subject category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-175239 (URN)
Available from: 2020-09-22 Created: 2020-09-22 Last updated: 2020-09-22
3. The emergence of an antimicrobial resistant Staphylococcus epidermidis clone in Northern Europe
(English) Manuscript (preprint) (Other academic)
National subject category
Probability Theory and Statistics; Microbiology in the medical area
Identifiers
urn:nbn:se:umu:diva-175240 (URN)
Available from: 2020-09-22 Created: 2020-09-22 Last updated: 2020-11-11
4. Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
2020 (English) In: Frontiers in Plant Science, E-ISSN 1664-462X, Vol. 11, article id 524. Article in journal (Refereed) Published
Abstract [en]

Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially in the instance of large and heterogenous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its complexes, than GCNs not using CSE; thus demonstrating that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune prediction of the function of uncharacterized genes; while its use in combination with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative method to conventional batch correction approaches, particularly when dealing with large and heterogenous datasets. The method is easy to implement into a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network. 
Author Summary
Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to the misdiagnoses of false-positives and false-negatives, especially in the instance of large and heterogenous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species, and developmental or clinical conditions; the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a "core" GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects, which is simple to employ and can be easily implemented into pre-existing GCN analysis pipelines. The gain in biological relevance resulting from the adoption of this approach was assessed using a plant mitochondrial case study.
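
The data-centering step described in the abstract can be illustrated with a minimal sketch. It assumes that "centering per gene and per sub-experiment" means subtracting each gene's mean within each sub-experiment, and that co-expression is then measured with Pearson correlation; the record does not confirm either detail. The function names, the dictionary layout, and the toy values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def centralize_within_subexperiments(expr, sub_ids):
    """Centre each gene on its own mean within every sub-experiment.

    expr    : dict mapping gene name -> list of expression values, one per sample
    sub_ids : list of sub-experiment labels, one per sample (hypothetical layout)
    """
    # Collect the sample indices that belong to each sub-experiment.
    groups = defaultdict(list)
    for idx, label in enumerate(sub_ids):
        groups[label].append(idx)

    centered = {}
    for gene, values in expr.items():
        if len(values) != len(sub_ids):
            raise ValueError(f"{gene}: expected {len(sub_ids)} values")
        row = [0.0] * len(values)
        for idxs in groups.values():
            # Assumption: the centre is the arithmetic mean of the group.
            mean = sum(values[i] for i in idxs) / len(idxs)
            for i in idxs:
                row[i] = values[i] - mean
        centered[gene] = row
    return centered


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data: two genes that move in opposite directions inside each
# sub-experiment, but share a large offset between sub-experiments.
expr = {
    "geneA": [1, 2, 3, 11, 12, 13],
    "geneB": [3, 2, 1, 13, 12, 11],
}
sub_ids = ["s1", "s1", "s1", "s2", "s2", "s2"]

raw_r = pearson(expr["geneA"], expr["geneB"])
cse = centralize_within_subexperiments(expr, sub_ids)
cse_r = pearson(cse["geneA"], cse["geneB"])
print(f"raw r = {raw_r:.3f}, centered r = {cse_r:.3f}")
```

In this toy case the raw correlation is strongly positive because the offset between sub-experiments dominates, while the centered data show the opposite within-sub-experiment relationship. This is only meant to show why removing primary treatment or tissue effects can change which edges appear in a network.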

Place, publisher, year, edition, pages
Frontiers Media S.A., 2020
Keywords
correlation, gene co-expression network, metabolism, method, plant mitochondria
National subject category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:umu:diva-173437 (URN) 10.3389/fpls.2020.00524 (DOI) 000542980000001 () 32582224 (PubMedID) 2-s2.0-85086578832 (Scopus ID)
Funder
Vetenskapsrådet, 621-2014-4688; Vetenskapsrådet, 340-2013-5185; Kempestiftelserna; Carl Tryggers stiftelse för vetenskaplig forskning
Available from: 2020-07-10 Created: 2020-07-10 Last updated: 2024-01-17 Bibliographically approved

Open Access in DiVA

fulltext (1561 kB), 218 downloads
File information
File name: FULLTEXT01.pdf; File size: 1561 kB; Checksum: SHA-512
6e4fe4127ad02a3b01b8153e0f675a2d815bdcf89f280fa415849a4ed7169bb6f43d212e2f41b35014dfa9694cbf1fff0d49d83ce9d11fad732089a6013e7db5
Type: fulltext; Mimetype: application/pdf
spikblad (128 kB), 80 downloads
File information
File name: SPIKBLAD01.pdf; File size: 128 kB; Checksum: SHA-512
32a4659c8811b7977931a2ea2d624f78f1f388bcf6d4ed7888797683fe6741864f79117376c646e31a88a4048bc4c12e679048d81ad90fc9529faf2bca64ca2d
Type: spikblad; Mimetype: application/pdf
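
The SHA-512 checksums listed for the two files can be used to verify a downloaded copy. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local file path are hypothetical, and the expected value should be copied from the record above.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's SHA-512 digest with a published hex checksum."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("FULLTEXT01.pdf", expected)` would return `True` only if the local file (a hypothetical path here) is byte-identical to the file the checksum was computed from.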

Person

Kellgren, Therese
