Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC). Umeå University, Faculty of Science and Technology, Department of Plant Physiology. ORCID iD: 0000-0003-0389-6650
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
Umeå University, Faculty of Science and Technology, Department of Chemistry. Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2020 (English). In: Frontiers in Plant Science, E-ISSN 1664-462X, Vol. 11, article id 524. Article in journal (Refereed). Published
Abstract [en]

Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially with large and heterogeneous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its complexes, than GCNs not using CSE. This demonstrates that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune prediction of the function of uncharacterized genes, while their use in combination with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative to conventional batch correction approaches, particularly when dealing with large and heterogeneous datasets. The method is easy to implement into a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network.
Author Summary: Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to false positives and false negatives, especially with large and heterogeneous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species and developmental or clinical conditions, the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a "core" GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects. It is simple to employ and can be easily implemented into pre-existing GCN analysis pipelines. The gain in biological relevance resulting from the adoption of this approach was assessed using a plant mitochondrial case study.
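The centering step described in the abstract can be illustrated with a minimal sketch. This record does not give the authors' implementation, so everything here is an assumption made for illustration: the function name `center_within_subexperiments`, the toy expression matrix, the sub-experiment labels, and the use of Pearson correlation as the co-expression measure. The sketch only shows the general idea of subtracting, for each gene, its mean within each sub-experiment before computing correlations.

```python
import numpy as np

def center_within_subexperiments(expr, groups):
    """Center each gene (row) separately within each sub-experiment (column group).

    Hypothetical helper for illustration; not the authors' code.
    expr   : genes x samples array of expression values
    groups : one sub-experiment label per sample (column)
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    centered = np.empty_like(expr)
    for g in np.unique(groups):
        cols = groups == g  # boolean mask selecting this sub-experiment's samples
        # Subtract the per-gene mean computed over this sub-experiment only.
        centered[:, cols] = expr[:, cols] - expr[:, cols].mean(axis=1, keepdims=True)
    return centered

# Toy data (assumed): 2 genes, 6 samples from two sub-experiments "A" and "B".
# Both genes are shifted upward in "B", but within each sub-experiment
# they move in opposite directions.
expr = np.array([
    [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
    [3.0, 2.0, 1.0, 13.0, 12.0, 11.0],
])
groups = ["A", "A", "A", "B", "B", "B"]

# Pearson correlation between the two genes, before and after centering.
raw_corr = np.corrcoef(expr)[0, 1]
cse_corr = np.corrcoef(center_within_subexperiments(expr, groups))[0, 1]

print(f"correlation on raw data:      {raw_corr:.3f}")
print(f"correlation on centered data: {cse_corr:.3f}")
```

In this toy case the raw correlation is strongly positive (about 0.95) because the shared shift between sub-experiments dominates, whereas after centering the correlation is approximately -1, reflecting the within-sub-experiment behaviour of the two genes. Real datasets would of course be noisier, and the choice of correlation measure and edge threshold is left open here.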

Place, publisher, year, edition, pages
Frontiers Media S.A., 2020. Vol. 11, article id 524
Keywords [en]
correlation, gene co-expression network, metabolism, method, plant mitochondria
National Category
Biochemistry and Molecular Biology
Identifiers
URN: urn:nbn:se:umu:diva-173437
DOI: 10.3389/fpls.2020.00524
ISI: 000542980000001
PubMedID: 32582224
Scopus ID: 2-s2.0-85086578832
OAI: oai:DiVA.org:umu-173437
DiVA, id: diva2:1453574
Funder
Swedish Research Council, 621-2014-4688
Swedish Research Council, 340-2013-5185
The Kempe Foundations
Carl Tryggers foundation
Available from: 2020-07-10 Created: 2020-07-10 Last updated: 2024-01-17
Bibliographically approved
In thesis
1. Hidden patterns that matter: statistical methods for analysis of DNA and RNA data
2020 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Dolda betydelsefulla mönster : statistiska metoder för analys av DNA och RNA data
Abstract [en]

Understanding how genetic variation affects the characteristics and function of organisms can help researchers and medical doctors to detect genetic alterations that cause disease and to reveal genes that cause antibiotic resistance. The opportunities and progress associated with such data come, however, with challenges related to statistical analysis. Only by using properly designed and applied tools can we extract the information about hidden patterns. In this thesis we present three types of such analysis. First, the genetic variant in the gene COL17A1 that causes corneal dystrophy with recurrent erosions is revealed. By studying next-generation sequencing data, the order of the nucleotides in the DNA sequence was obtained, which enabled us to detect interesting variants in the genome. Further, we present results of an experimental design study with the aim of making the best selection from a family that is affected by an inherited disease. In the second part of the work, we analyzed a novel antibiotic-resistant Staphylococcus epidermidis clone that is only found in northern Europe. By investigating its genetic data, we revealed similarities to a world-known antibiotic-resistant clone. As a result, the antibiotic resistance profile is established from the DNA sequences. Finally, we also focus on the challenges related to the abundance of genetic data from different sources. The increasing number of public gene expression datasets gives us the opportunity to increase our understanding by using information from multiple sources simultaneously. Naturally, this requires merging independent datasets. However, when doing so, the technical and biological variation in the joined data increases. We present a pre-processing method to construct gene co-expression networks from a large, diverse gene expression dataset.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, Institutionen för matematik och matematisk statistik, 2020. p. 26
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 71/20
Keywords
Genome, Next-generation sequence, statistics, microarrays, bacteria, antibiotic resistance, inherited diseases, Co-expression networks, centralization within subgroups
National Category
Probability Theory and Statistics; Biological Sciences; Medical and Health Sciences
Identifiers
URN: urn:nbn:se:umu:diva-175242
ISBN: 978-91-7855-240-5
ISBN: 978-91-7855-241-2
Public defence
2020-10-16, Hörsal B, Lindellhallen, Umeå, 09:00 (English)
Available from: 2020-09-25 Created: 2020-09-22 Last updated: 2020-09-23
Bibliographically approved

Open Access in DiVA

fulltext (14415 kB), 244 downloads
File information
File name: FULLTEXT01.pdf
File size: 14415 kB
Checksum (SHA-512): 593fefefceadc562e43e9366e7375ad33428eba6a636b5f5a2227b0f289a8b1f94489ab842b03755d7866036fbb49bb5830b9d66023f9bafb48dd68fcf70d1b1
Type: fulltext
Mimetype: application/pdf

Authority records

Law, Simon R; Kellgren, Therese; Björk, Rafael; Rydén, Patrik; Keech, Olivier
