Tackling a genomic abyss: approaches to link long non-coding RNAs to potential biological function in Norway spruce and aspen
Umeå University, Faculty of Science and Technology, Department of Plant Physiology. (Nathaniel Street)
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Att tackla en genomisk avgrund : tillvägagångssätt för att koppla långa icke-kodande RNA till potentiell biologisk funktion i gran och asp (Swedish)
Abstract [en]

Protein-coding genes have been studied extensively in both plant and animal genomes, whereas the non-coding portions of genomes were long considered irrelevant. For many years, non-coding was equated with non-functional, a view that changed with the discovery of let-7, the first conserved miRNA, in Caenorhabditis elegans. Numerous studies of small RNAs (sRNAs) followed, while long non-coding RNAs (lncRNAs) have gained attention over the last two decades, partly because of their use as diagnostic biomarkers in cancer. Assigning function to RNAs has progressed more slowly in plants than in animals, and much remains to be explored even in the protein-coding space, especially in very large genomes such as those of Norway spruce and Scots pine; the non-coding part of the genome therefore still represents an abyss to explore. In my PhD I focused mostly on a subclass of non-coding RNAs in Norway spruce and aspen. Long non-coding RNAs are defined, by an arbitrary threshold, as transcripts longer than 200 nucleotides (nt); they can contain one small open reading frame (sORF, length < 300 nt) coding for a short peptide (not a complete protein). lncRNAs tend to be expressed at lower levels than protein-coding genes, but with precise spatio-temporal patterns. They are mostly expressed in particular tissues, at particular stages of a biological process and/or under particular conditions, often related to biotic or abiotic stress. Their sequences are poorly conserved, even between closely related species. In particular, I studied the lncRNAs located in the intergenic space, the long intergenic non-coding RNAs (lincRNAs).
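The two length thresholds given in the abstract (transcript longer than 200 nt, sORF shorter than 300 nt) can be illustrated with a minimal Python sketch. The function names are hypothetical, and scanning only the forward strand for ATG-initiated ORFs is a simplifying assumption; this is not the thesis pipeline.

```python
# Thresholds taken from the abstract; everything else is illustrative.
MIN_LNCRNA_LEN = 200  # lncRNAs are longer than 200 nt
MAX_SORF_LEN = 300    # a small ORF (sORF) is shorter than 300 nt

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt (ATG through stop codon) of the longest forward-strand ORF."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a new ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def is_candidate_lncrna(seq: str) -> bool:
    """Apply only the two length criteria stated in the abstract."""
    return len(seq) > MIN_LNCRNA_LEN and longest_orf_length(seq) < MAX_SORF_LEN
```

A real classifier would also consider the reverse strand, coding-potential scores and expression evidence; the sketch only makes the two numeric cut-offs concrete.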

In the first part of this thesis, I developed a pipeline to identify lincRNAs. Starting from an RNA-Sequencing dataset, the pipeline identifies bona fide lincRNAs in silico. It is an ensemble method that combines several tools with the known characteristics of lincRNAs.
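One common way to build an ensemble of this kind is a vote across tools. The sketch below assumes each tool has already produced a per-transcript "non-coding" call; the tool names, the voting rule and the `min_votes` parameter are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def select_candidates(tool_calls: Dict[str, Dict[str, bool]], min_votes: int) -> List[str]:
    """Return transcript IDs that at least `min_votes` tools call non-coding.

    tool_calls maps transcript ID -> {tool name: True if called non-coding}.
    """
    return sorted(
        transcript_id
        for transcript_id, calls in tool_calls.items()
        if sum(calls.values()) >= min_votes
    )


# Hypothetical calls from three placeholder tools.
example_calls = {
    "tx1": {"tool_a": True, "tool_b": True, "tool_c": False},
    "tx2": {"tool_a": False, "tool_b": True, "tool_c": False},
}
candidates = select_candidates(example_calls, min_votes=2)
```

Requiring agreement between tools trades sensitivity for specificity; where to set that trade-off depends on the dataset and is left open here.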

In the second part of this thesis, I focused on functionally annotating lincRNAs. To address this challenge, I used the guilt-by-association strategy. This method relies on a co-expression network containing both lincRNAs and protein-coding genes. After a functional enrichment analysis of the protein-coding genes in a module, the enriched annotation can be transferred to any lincRNA co-expressed in the same module. I also investigated whether lincRNAs could play a role in the de novo methylation of DNA via the RNA-directed DNA methylation (RdDM) pathway in Norway spruce.
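The guilt-by-association idea can be sketched as follows, assuming modules have already been derived from the co-expression network. The hypergeometric test, the `alpha` cut-off and all function names are illustrative assumptions; the abstract does not state which enrichment test was used, and no multiple-testing correction is applied here.

```python
from math import comb
from typing import Dict, Iterable, Set


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def transfer_annotations(
    module: Iterable[str],
    lincrna_ids: Set[str],
    annotations: Dict[str, Set[str]],
    alpha: float = 0.05,
) -> Dict[str, Set[str]]:
    """Assign terms enriched among a module's coding genes to its lincRNAs.

    annotations maps every protein-coding gene in the background to its terms.
    """
    module = list(module)
    coding = [g for g in module if g in annotations]
    N, n = len(annotations), len(coding)
    enriched = set()
    for term in set().union(*(annotations[g] for g in coding)):
        K = sum(term in terms for terms in annotations.values())
        k = sum(term in annotations[g] for g in coding)
        if hypergeom_pvalue(k, n, K, N) < alpha:  # uncorrected p-value
            enriched.add(term)
    return {g: set(enriched) for g in module if g in lincrna_ids}
```

The transferred terms are putative: co-expression suggests involvement in the same process but does not establish a mechanism.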

In the last part of this thesis, I identified lincRNAs expressed during leaf development in aspen and produced CRISPR-Cas9 mutants lacking the sequences of two lincRNAs to enable functional validation.

In general, RNA-Sequencing has enabled and advanced the identification of lincRNAs. This thesis presents an implemented strategy to identify lincRNAs and assign putative functional information to them, deepening our knowledge of the non-coding abyss.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2024, p. 58
Keywords [en]
Norway spruce, aspen, non-coding RNAs, long non-coding RNAs, RNA-Seq, transcriptome, functional annotation, co-expression network, guilt-by-association, functional validation, CRISPR-Cas9
National Category
Genetics and Genomics; Bioinformatics and Computational Biology; Plant Biotechnology
Identifiers
URN: urn:nbn:se:umu:diva-229993
ISBN: 978-91-8070-491-5 (print)
ISBN: 978-91-8070-492-2 (electronic)
OAI: oai:DiVA.org:umu-229993
DiVA, id: diva2:1900774
Public defence
2024-10-24, Stora hörsalen, byggnad KBC, Umeå, 14:00 (English)
Opponent
Supervisors
Available from: 2024-10-03 Created: 2024-09-25 Last updated: 2025-02-05 Bibliographically approved
List of papers
1. A resource of identified and annotated lincRNAs expressed during somatic embryogenesis development in Norway spruce
2024 (English) In: Physiologia Plantarum, ISSN 0031-9317, E-ISSN 1399-3054, Vol. 176, no 5, article id e14537. Article in journal (Refereed) Published
Abstract [en]

Long non-coding RNAs (lncRNAs) have emerged as important regulators of many biological processes, although their regulatory roles remain poorly characterized in woody plants, especially in gymnosperms. A major challenge of working with lncRNAs is to assign functional annotations, since they have a low coding potential and low cross-species conservation.

We utilised an existing RNA-Sequencing resource and performed short RNA sequencing of somatic embryogenesis developmental stages in Norway spruce (Picea abies L. Karst). We implemented a pipeline to identify lncRNAs located within the intergenic space (lincRNAs) and generated a co-expression network including protein coding, lincRNA and miRNA genes.
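The "located within the intergenic space" criterion amounts to an interval-overlap test against annotated genes. The sketch below assumes half-open coordinates and ignores strand; both are illustrative choices, not details given in the abstract.

```python
from typing import Iterable, Tuple

# (chromosome or scaffold, start, end) with half-open coordinates: start <= x < end
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same sequence."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def is_intergenic(transcript: Interval, genes: Iterable[Interval]) -> bool:
    """True if the transcript overlaps none of the annotated genes."""
    return not any(overlaps(transcript, gene) for gene in genes)


# Hypothetical annotation with two protein-coding genes on one chromosome.
annotated_genes = [("chr1", 1000, 2000), ("chr1", 5000, 6000)]
```

For genome-scale annotations, an interval tree or a sorted sweep would replace the linear scan used here.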

To assign putative functional annotation, we employed a guilt-by-association approach using the co-expression network and integrated these results with annotation assigned using semantic similarity and co-expression. Moreover, we evaluated the relationship between lincRNAs and miRNAs, and identified which lincRNAs are conserved in other species. We identified lincRNAs with clear evidence of differential expression during somatic embryogenesis and used network connectivity to identify those with the greatest regulatory potential.
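Network connectivity can be measured in several ways; the abstract does not say which one was used. As one simple possibility, the sketch below ranks lincRNAs by degree (number of co-expression partners) in an undirected edge list. The identifiers and function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_by_degree(
    edges: Iterable[Tuple[str, str]], lincrna_ids: Set[str]
) -> List[Tuple[str, int]]:
    """Rank lincRNAs by number of network neighbours, highest first.

    Ties are broken alphabetically so the output is deterministic.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(
        ((gene, degree[gene]) for gene in lincrna_ids),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical co-expression edges between lincRNAs and coding genes.
example_edges = [("linc1", "g1"), ("linc1", "g2"), ("linc2", "g1")]
ranking = rank_by_degree(example_edges, {"linc1", "linc2", "linc3"})
```

Highly connected nodes are candidates for follow-up, but degree alone does not distinguish regulators from co-regulated targets.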

This work provides the most comprehensive view of lincRNAs in Norway spruce and is the first study to perform global identification of lincRNAs during somatic embryogenesis in conifers. The data have been integrated into the expression visualisation tools at the PlantGenIE.org web resource to enable easy access to the community. This will facilitate the use of the data to address novel questions about the role of lincRNAs in the regulation of embryogenesis and facilitate future comparative genomics studies.

Place, publisher, year, edition, pages
John Wiley & Sons, 2024
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:umu:diva-229971 (URN)
10.1111/ppl.14537 (DOI)
001319912800001 ()
39319989 (PubMedID)
2-s2.0-85204942283 (Scopus ID)
Funder
The Kempe Foundations, SMK1340
Knut and Alice Wallenberg Foundation
Swedish Research Council
Available from: 2024-09-23 Created: 2024-09-23 Last updated: 2025-02-07 Bibliographically approved
2. New genome insights from chromosome-scale genome assemblies of Norway spruce (Picea abies) and Scots pine (Pinus sylvestris)
2024 (English) Manuscript (preprint) (Other academic)
National Category
Bioinformatics and Computational Biology; Forest Science
Identifiers
urn:nbn:se:umu:diva-229975 (URN)
Available from: 2024-09-23 Created: 2024-09-23 Last updated: 2025-02-05
3. An improved chromosome-scale genome assembly and population genetics resource for Populus tremula
2024 (English) In: Physiologia Plantarum, ISSN 0031-9317, E-ISSN 1399-3054, Vol. 176, no 5, article id e14511. Article in journal (Refereed) Published
Abstract [en]

Aspen (Populus tremula L.) is a keystone species and a model system for forest tree genomics. We present an updated resource comprising a chromosome-scale assembly, population genetics and genomics data. Using the resource, we explore the genetic basis of natural variation in leaf size and shape, traits with complex genetic architecture.

We generated the genome assembly using long-read sequencing, optical and high-density genetic maps. We conducted whole-genome resequencing of the Umeå Aspen (UmAsp) collection. Using the assembly and re-sequencing data from the UmAsp, Swedish Aspen (SwAsp) and Scottish Aspen (ScotAsp) collections we performed genome-wide association analyses (GWAS) using Single Nucleotide Polymorphisms (SNPs) for 26 leaf physiognomy phenotypes. We conducted Assay of Transposase Accessible Chromatin sequencing (ATAC-Seq), identified genomic regions of accessible chromatin, and subset SNPs to these regions, improving the GWAS detection rate. We identified candidate long non-coding RNAs in leaf samples, quantified their expression in an updated co-expression network, and used this to explore the functions of candidate genes identified from the GWAS.
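Subsetting SNPs to accessible-chromatin regions is an interval-lookup problem. The sketch below assumes half-open, non-overlapping regions (for example, merged ATAC-Seq peaks) and uses binary search per chromosome; the function names and coordinates are hypothetical and not taken from the paper.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(regions: Iterable[Tuple[str, int, int]]) -> Dict[str, List[Tuple[int, int]]]:
    """Group (chrom, start, end) regions by chromosome and sort them by start."""
    index = defaultdict(list)
    for chrom, start, end in regions:
        index[chrom].append((start, end))
    for chrom in index:
        index[chrom].sort()
    return index


def in_accessible_region(index, chrom: str, pos: int) -> bool:
    """True if start <= pos < end for some region on this chromosome."""
    intervals = index.get(chrom, [])
    # Locate the last region whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def subset_snps(snps: Iterable[Tuple[str, int]], regions) -> List[Tuple[str, int]]:
    """Keep only the SNPs (chrom, pos) that fall inside an accessible region."""
    index = build_index(regions)
    return [(c, p) for c, p in snps if in_accessible_region(index, c, p)]
```

Testing fewer SNPs lowers the multiple-testing burden, which is one plausible reason such subsetting can improve GWAS detection.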

A GWAS found SNP associations for seven traits. The associated SNPs were in or near genes annotated with developmental functions, which represent candidates for further study. Of particular interest was a ~177-kbp region harbouring associations with several leaf phenotypes in ScotAsp.

We have incorporated the assembly, population genetics, genomics, and GWAS data into the PlantGenIE.org web resource, including updating existing genomics data to the new genome version, to enable easy exploration and visualisation. We provide all raw and processed data to facilitate reuse in future studies.

Place, publisher, year, edition, pages
John Wiley & Sons, 2024
Keywords
genome assembly, natural selection, co-expression, population genetics, Populus, aspen, GWAS, leaf physiognomy, leaf shape, leaf size, genetic architecture, ATAC-Seq, lncRNA
National Category
Bioinformatics and Computational Biology; Genetics and Genomics
Identifiers
urn:nbn:se:umu:diva-229976 (URN)
10.1111/ppl.14511 (DOI)
001313686100001 ()
39279509 (PubMedID)
2-s2.0-85204093798 (Scopus ID)
Funder
Swedish Research Council, 2019-05476
Swedish Research Council Formas, 2018-01644
Vinnova, S111416L0710
Note

Supplementary figures and appendixes under Supporting information on article web page. 

Available from: 2024-09-23 Created: 2024-09-23 Last updated: 2025-04-24 Bibliographically approved
4. Identifying and validating lincRNAs expressed during terminal leaf development in aspen
2024 (English) Manuscript (preprint) (Other academic)
National Category
Bioinformatics and Computational Biology; Genetics and Breeding in Agricultural Sciences
Identifiers
urn:nbn:se:umu:diva-229978 (URN)
Available from: 2024-09-23 Created: 2024-09-23 Last updated: 2025-02-05

Open Access in DiVA

fulltext (810 kB), 195 downloads
File information
File name: FULLTEXT01.pdf
File size: 810 kB
Checksum (SHA-512): 51a2d3c3336b56bbec1b22542b9c8716750b33a3e7e8cb3f19bd14304df272fce5abe7dcfc15f204ff6353fc07eb9a41f8c9c31cc1a7450e519cb3e785b8fbdf
Type: fulltext
Mimetype: application/pdf
spikblad (202 kB), 54 downloads
File information
File name: SPIKBLAD02.pdf
File size: 202 kB
Checksum (SHA-512): 9fe601ab7fe946f4938d78065e7cb8a251e39b7211d35491ed40e61ee47bc927160108b98dea36ed9cb95dde7391337e1ac9b6aa9596dd8f1113962224336672
Type: spikblad
Mimetype: application/pdf
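A downloaded copy of either file can be checked against the SHA-512 checksum listed above. The sketch below uses Python's standard `hashlib`; the function names and the local file path passed in are assumptions of the example.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the checksum copied from the record."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("FULLTEXT01.pdf", "51a2d3c3...")` with the full 128-character value from the record should return `True` for an intact download.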

Authority records

Canovi, Camilla
