Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC). Umeå University, Faculty of Science and Technology, Department of Plant Physiology. ORCID iD: 0000-0003-0389-6650
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
Umeå University, Faculty of Science and Technology, Department of Chemistry. Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2020 (English). In: Frontiers in Plant Science, E-ISSN 1664-462X, Vol. 11, article id 524. Article in journal (Refereed). Published
Abstract [en]

Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially with large and heterogeneous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its complexes, than GCNs not using CSE, thus demonstrating that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune prediction of the function of uncharacterized genes, while their use in combination with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative to conventional batch correction approaches, particularly when dealing with large and heterogeneous datasets. The method is easy to integrate into a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network.
Author Summary

Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to false positives and false negatives, especially with large and heterogeneous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species and developmental or clinical conditions, the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a "core" GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects; it is simple to employ and can be easily integrated into pre-existing GCN analysis pipelines. The gain in biological relevance resulting from the adoption of this approach was assessed using a plant mitochondrial case study.
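
The centering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "centering" means subtracting, for each gene, the mean expression within each sub-experiment, that the data are laid out as a genes × samples matrix, and that edges are later derived from Pearson correlations. The function name `centralize_within_subexperiments` and the toy data are hypothetical.

```python
import numpy as np

def centralize_within_subexperiments(expr, sub_ids):
    """Subtract each gene's mean within every sub-experiment.

    expr    : array of shape (n_genes, n_samples), expression values
    sub_ids : length n_samples, sub-experiment label of each sample
    Mean-centering is an assumption; the abstract only says "data-centering".
    """
    expr = np.asarray(expr, dtype=float)
    sub_ids = np.asarray(sub_ids)
    centered = np.empty_like(expr)
    for label in np.unique(sub_ids):
        cols = sub_ids == label  # samples belonging to this sub-experiment
        centered[:, cols] = expr[:, cols] - expr[:, cols].mean(axis=1, keepdims=True)
    return centered

# Toy data (hypothetical): 2 genes, 2 sub-experiments with 3 samples each.
# Both genes are shifted upward in sub-experiment "B", but within each
# sub-experiment they move in opposite directions.
expr = np.array([
    [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
    [3.0, 2.0, 1.0, 13.0, 12.0, 11.0],
])
sub_ids = ["A", "A", "A", "B", "B", "B"]

centered = centralize_within_subexperiments(expr, sub_ids)
raw_corr = np.corrcoef(expr)[0, 1]      # dominated by the shift between A and B
cse_corr = np.corrcoef(centered)[0, 1]  # reflects within-sub-experiment behaviour
print(f"raw: {raw_corr:.3f}, after centering: {cse_corr:.3f}")
```

In this toy case the raw correlation is strongly positive because both genes respond to the sub-experiment shift, whereas the centered data show that the two genes are anti-correlated within each sub-experiment. This is the kind of primary treatment/tissue effect the abstract says the centering step removes.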

Place, publisher, year, edition, pages
Frontiers Media S.A., 2020. Vol. 11, article id 524
Keywords [en]
correlation, gene co-expression network, metabolism, method, plant mitochondria
National subject category
Biochemistry and Molecular Biology
Identifiers
URN: urn:nbn:se:umu:diva-173437; DOI: 10.3389/fpls.2020.00524; ISI: 000542980000001; PubMedID: 32582224; Scopus ID: 2-s2.0-85086578832; OAI: oai:DiVA.org:umu-173437; DiVA, id: diva2:1453574
Research funder
Swedish Research Council (Vetenskapsrådet), 621-2014-4688; Swedish Research Council (Vetenskapsrådet), 340-2013-5185; Kempestiftelserna; Carl Tryggers stiftelse för vetenskaplig forskning
Available from: 2020-07-10. Created: 2020-07-10. Last updated: 2024-01-17. Bibliographically approved
Part of thesis
1. Hidden patterns that matter: statistical methods for analysis of DNA and RNA data
2020 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Dolda betydelsefulla mönster : statistiska metoder för analys av DNA och RNA data
Abstract [en]

Understanding how genetic variation affects the characteristics and function of organisms can help researchers and medical doctors detect genetic alterations that cause disease and reveal genes that cause antibiotic resistance. However, the opportunities and progress associated with such data come with challenges related to statistical analysis. Only by using properly designed and applied tools can we extract information about hidden patterns. In this thesis we present three types of such analysis. First, the genetic variant in the gene COL17A1 that causes corneal dystrophy with recurrent erosions is revealed. By studying next-generation sequencing data, the order of the nucleotides in the DNA sequence was obtained, which enabled us to detect interesting variants in the genome. Further, we present results of an experimental design study with the aim of making the best selection from a family that is affected by an inherited disease. In the second part of the work, we analyzed a novel antibiotic-resistant Staphylococcus epidermidis clone that is only found in northern Europe. By investigating its genetic data, we revealed similarities to a world-known antibiotic-resistant clone. As a result, the antibiotic resistance profile is established from the DNA sequences. Finally, we also focus on the challenges related to the abundance of genetic data from different sources. The increasing number of public gene expression datasets gives us the opportunity to increase our understanding by using information from multiple sources simultaneously. Naturally, this requires merging independent datasets. However, when doing so, the technical and biological variation in the joined data increases. We present a pre-processing method to construct gene co-expression networks from a large, diverse gene expression dataset.

Place, publisher, year, edition, pages
Umeå: Umeå University, Department of Mathematics and Mathematical Statistics, 2020. p. 26
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 71/20
Keywords
Genome, Next-generation sequence, statistics, microarrays, bacteria, antibiotic resistance, inherited diseases, Co-expression networks, centralization within subgroups
National subject category
Probability Theory and Statistics; Biological Sciences; Medical and Health Sciences
Identifiers
urn:nbn:se:umu:diva-175242 (URN); 978-91-7855-240-5 (ISBN); 978-91-7855-241-2 (ISBN)
Public defence
2020-10-16, Hörsal B, Lindellhallen, Umeå, 09:00 (English)
Opponent
Supervisors
Available from: 2020-09-25. Created: 2020-09-22. Last updated: 2020-09-23. Bibliographically approved

Open Access in DiVA

Full text (14415 kB), 227 downloads
File information
File name: FULLTEXT01.pdf; File size: 14415 kB; Checksum (SHA-512):
593fefefceadc562e43e9366e7375ad33428eba6a636b5f5a2227b0f289a8b1f94489ab842b03755d7866036fbb49bb5830b9d66023f9bafb48dd68fcf70d1b1
Type: fulltext; MIME type: application/pdf

Authors

Law, Simon R; Kellgren, Therese; Björk, Rafael; Rydén, Patrik; Keech, Olivier
