Hidden patterns that matter: statistical methods for analysis of DNA and RNA data
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2020 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Dolda betydelsefulla mönster : statistiska metoder för analys av DNA och RNA data (Swedish)
Abstract [en]

Understanding how genetic variations affect the characteristics and function of organisms can help researchers and medical doctors detect genetic alterations that cause disease and reveal genes that cause antibiotic resistance. The opportunities and progress associated with such data come, however, with challenges related to statistical analysis. Only by using properly designed and properly employed tools can we extract the information about hidden patterns. In this thesis we present three types of such analysis. First, the genetic variant in the gene COL17A1 that causes corneal dystrophy with recurrent erosions is revealed. By studying next-generation sequencing data, the order of the nucleotides in the DNA sequence was obtained, which enabled us to detect interesting variants in the genome. Further, we present results of an experimental design study with the aim of making the best selection from a family that is affected by an inherited disease. In the second part of the work, we analyzed a novel antibiotic-resistant Staphylococcus epidermidis clone that is only found in northern Europe. By investigating its genetic data, we revealed similarities to a world-known antibiotic-resistant clone. As a result, the antibiotic resistance profile is established from the DNA sequences. Finally, we also focus on the challenges related to the abundance of genetic data from different sources. The increasing number of public gene expression datasets gives us the opportunity to increase our understanding by using information from multiple sources simultaneously. Naturally, this requires merging independent datasets. However, when doing so, the technical and biological variation in the joined data increases. We present a pre-processing method to construct gene co-expression networks from a large, diverse gene expression dataset.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, Institutionen för matematik och matematisk statistik, 2020, p. 26
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 71/20
Keywords [en]
Genome, Next-generation sequence, statistics, microarrays, bacteria, antibiotic resistance, inherited diseases, Co-expression networks, centralization within subgroups
National Category
Probability Theory and Statistics; Biological Sciences; Medical and Health Sciences
Identifiers
URN: urn:nbn:se:umu:diva-175242
ISBN: 978-91-7855-240-5 (print)
ISBN: 978-91-7855-241-2 (electronic)
OAI: oai:DiVA.org:umu-175242
DiVA, id: diva2:1469646
Public defence
2020-10-16, Hörsal B, Lindellhallen, Umeå, 09:00 (English)
Opponent
Supervisors
Available from: 2020-09-25 Created: 2020-09-22 Last updated: 2020-09-23 Bibliographically approved
List of papers
1. Mutations in Collagen, Type XVII, Alpha 1 (COL17A1) Cause Epithelial Recurrent Erosion Dystrophy (ERED)
2015 (English) In: Human Mutation, ISSN 1059-7794, E-ISSN 1098-1004, Vol. 36, no 4, p. 463-473. Article in journal (Refereed) Published
Abstract [en]

Corneal dystrophies are a clinically and genetically heterogeneous group of inherited disorders that bilaterally affect corneal transparency. They are defined according to the corneal layer affected and by their genetic cause. In this study, we identified a dominantly inherited epithelial recurrent erosion dystrophy (ERED)-like disease that is common in northern Sweden. Whole-exome sequencing resulted in the identification of a novel mutation, c.2816C>T, p.T939I, in the COL17A1 gene, which encodes collagen type XVII alpha 1. The variant segregated with disease in a genealogically expanded pedigree dating back 200 years. We also investigated a unique COL17A1 synonymous variant, c.3156C>T, identified in a previously reported unrelated dominant ERED-like family linked to a locus on chromosome 10q23-q24 encompassing COL17A1. We show that this variant introduces a cryptic donor site resulting in aberrant pre-mRNA splicing and is highly likely to be pathogenic. Bi-allelic COL17A1 mutations have previously been associated with a recessive skin disorder, junctional epidermolysis bullosa, with recurrent corneal erosions being reported in some cases. Our findings implicate presumed gain-of-function COL17A1 mutations causing dominantly inherited ERED and improve understanding of the underlying pathology.

Place, publisher, year, edition, pages
John Wiley & Sons, 2015
Keywords
COL17A1, BP180, cornea dystrophy, ERED, ddPCR
National Category
Medical Bioscience
Identifiers
urn:nbn:se:umu:diva-103155 (URN); 10.1002/humu.22764 (DOI); 000352304200011 (); 25676728 (PubMedID); 2-s2.0-84925859470 (Scopus ID)
Note

Contract grant sponsors: Umeå University and Västerbotten County Council, Research and Development Foundation sponsored by Västerbotten County Council, Cronqvists Stiftelse (administered by The Swedish Society of Medicine); Ögonfonden, Stiftelsen KMA; the National Swedish Research Council (521-2013-2612); National Institute for Health Research Biomedical Research Centre at Moorfields Eye Hospital and UCL Institute of Ophthalmology; Moorfields Special Trustees; Moorfields Eye Charity; the Lanvern foundation.

Available from: 2015-05-29 Created: 2015-05-18 Last updated: 2023-03-24 Bibliographically approved
2. Experimental designs for finding disease-causing mutations in rare diseases
(English) Manuscript (preprint) (Other academic)
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-175239 (URN)
Available from: 2020-09-22 Created: 2020-09-22 Last updated: 2020-09-22
3. The emergence of an antimicrobial resistant Staphylococcus epidermidis clone in Northern Europe
(English) Manuscript (preprint) (Other academic)
National Category
Probability Theory and Statistics; Microbiology in the medical area
Identifiers
urn:nbn:se:umu:diva-175240 (URN)
Available from: 2020-09-22 Created: 2020-09-22 Last updated: 2024-07-02
4. Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
2020 (English) In: Frontiers in Plant Science, E-ISSN 1664-462X, Vol. 11, article id 524. Article in journal (Refereed) Published
Abstract [en]

Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially in the instance of large and heterogenous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its complexes, than GCNs not using CSE; thus demonstrating that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune prediction of the function of uncharacterized genes; while its use in combination with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative method to conventional batch correction approaches, particularly when dealing with large and heterogenous datasets. The method is easy to implement into a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network. 
Author Summary

Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to the misdiagnoses of false-positives and false-negatives, especially in the instance of large and heterogenous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species and developmental or clinical conditions, the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a "core" GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects, which is simple to employ and can be easily implemented into pre-existing GCN analysis pipelines. The gain in biological relevance resulting from the adoption of this approach was assessed using a plant mitochondrial case study.
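
The data-centering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "centering" means subtracting, for each gene, its mean expression within each sub-experiment, and that co-expression is then measured with Pearson correlation. The function name `centralize_within_subexperiments` and the toy data are hypothetical and serve only as illustration.

```python
import numpy as np


def centralize_within_subexperiments(expr, groups):
    """Center each gene within each sub-experiment (assumed: mean-centering).

    expr   : array-like of shape (n_genes, n_samples)
    groups : length n_samples sequence of sub-experiment labels
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    centered = np.empty_like(expr)
    for label in np.unique(groups):
        cols = groups == label  # samples belonging to this sub-experiment
        # Subtract the per-gene mean computed over this sub-experiment only.
        centered[:, cols] = expr[:, cols] - expr[:, cols].mean(axis=1, keepdims=True)
    return centered


# Toy data (hypothetical): two genes measured in two sub-experiments.
# The second sub-experiment shifts both genes upward by 10 units, which
# masks the opposite trends the genes show inside each sub-experiment.
expr = np.array([
    [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],  # gene A
    [3.0, 2.0, 1.0, 13.0, 12.0, 11.0],  # gene B
])
groups = ["exp1", "exp1", "exp1", "exp2", "exp2", "exp2"]

centered = centralize_within_subexperiments(expr, groups)

# Pearson correlation between the two genes before and after centering.
raw_corr = np.corrcoef(expr)[0, 1]
cse_corr = np.corrcoef(centered)[0, 1]
print("correlation on merged data:  ", raw_corr)
print("correlation after centering: ", cse_corr)
```

In this toy example the strong positive correlation on the merged data comes entirely from the offset between the two sub-experiments. After centering, the correlation is negative and reflects only the within-sub-experiment behaviour of the two genes.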

Place, publisher, year, edition, pages
Frontiers Media S.A., 2020
Keywords
correlation, gene co-expression network, metabolism, method, plant mitochondria
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:umu:diva-173437 (URN); 10.3389/fpls.2020.00524 (DOI); 000542980000001 (); 32582224 (PubMedID); 2-s2.0-85086578832 (Scopus ID)
Funder
Swedish Research Council, 621-2014-4688; Swedish Research Council, 340-2013-5185; The Kempe Foundations; Carl Tryggers foundation
Available from: 2020-07-10 Created: 2020-07-10 Last updated: 2024-01-17 Bibliographically approved

Open Access in DiVA

fulltext (1561 kB), 231 downloads
File information
File name: FULLTEXT01.pdf; File size: 1561 kB; Checksum SHA-512:
6e4fe4127ad02a3b01b8153e0f675a2d815bdcf89f280fa415849a4ed7169bb6f43d212e2f41b35014dfa9694cbf1fff0d49d83ce9d11fad732089a6013e7db5
Type: fulltext; Mimetype: application/pdf
spikblad (128 kB), 84 downloads
File information
File name: SPIKBLAD01.pdf; File size: 128 kB; Checksum SHA-512:
32a4659c8811b7977931a2ea2d624f78f1f388bcf6d4ed7888797683fe6741864f79117376c646e31a88a4048bc4c12e679048d81ad90fc9529faf2bca64ca2d
Type: spikblad; Mimetype: application/pdf

Authority records

Kellgren, Therese
