A multivariate approach to computational molecular biology
Umeå University, Faculty of Science and Technology, Department of Chemistry.
2005 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis describes the application of multivariate methods in analyses of genomic DNA sequences, gene expression and protein synthesis, which represent each of the steps in the central dogma of biology. The recent finalisation of large sequencing projects has given us a definable core of genetic data and large-scale methods for the dynamic quantification of gene expression and protein synthesis. However, in order to gain meaningful knowledge from such data, appropriate data analysis methods must be applied.

The multivariate projection methods, principal component analysis (PCA) and partial least squares projection to latent structures (PLS), were used for clustering and multivariate calibration of data. By combining results from these and other statistical methods with interactive visualisation, valuable information was extracted and further interpreted.
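As a rough illustration of what a projection method does, the following minimal sketch extracts the first principal component of a small data table by power iteration on the covariance matrix. It is not the software used in the thesis; the function name `first_principal_component` and the choice of power iteration are assumptions made for this example only.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    `rows` is a list of observations, each a list of variable values.
    Hypothetical helper for illustration; needs at least two rows.
    """
    n, m = len(rows), len(rows[0])
    # Mean-centre each column (variable).
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (m x m).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(m)] for a in range(m)]
    # Start from the covariance column of the most variable column.
    start = max(range(m), key=lambda j: cov[j][j])
    v = cov[start][:]
    # Power iteration: repeated multiplication by cov converges to the
    # dominant eigenvector, provided the start is not orthogonal to it.
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance in the data; v stays all zeros
        v = [c / norm for c in w]
    # Scores: projection of each centred observation onto the loadings.
    scores = [sum(x[i][j] * v[j] for j in range(m)) for i in range(n)]
    return v, scores

loadings, scores = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
```

Observations that lie close together in the score values are candidates for the same cluster, which is the sense in which PCA is used for clustering here.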

We analysed genomic sequences by combining multivariate statistics with cytological observations and full genome annotations. All di- (16), tri- (64), tetra- (256), penta- (1024) and hexamers (4096) of DNA were counted and normalised separately, and their distributions in the chromosomes of three Drosophila genomes were studied using PCA. Using this strategy, sequence signatures responsible for the differentiation of chromosomal elements were identified and related to previously defined biological features. We also developed a publicly available tool to interactively analyse single nucleotide polymorphism data and to visualise annotations and linkage disequilibrium.
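The oligomer counting step can be sketched as follows. This is a minimal example, not the thesis implementation: the function name `kmer_frequencies` is hypothetical, and normalisation to relative frequencies is an assumption, since the abstract does not state which normalisation was used.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(sequence, k):
    """Return relative frequencies of all 4**k DNA k-mers in `sequence`.

    Assumption: normalisation means dividing by the number of valid windows.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all possible k-mers so that absent ones get frequency 0.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (e.g. N) are left out of the total.
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}

dimer_freqs = kmer_frequencies("ACGTACGT", 2)   # 16 entries
hexamer_freqs = kmer_frequencies("ACGTACGT", 6) # 4096 entries
```

One such frequency vector per chromosome segment gives a table with 16 to 4096 columns, which is the kind of input a PCA would then summarise.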

PLS was used to investigate the relationships between weather factors and gene expression in field-grown aspen leaves. By interpreting the PLS models, it was possible to predict whether genes were mainly environmentally or developmentally regulated. Based on a PCA model calculated from seasonal gene expression profiles, different phases of the growing season were identified as different clusters. In addition, a publicly available dataset with gene expression values for 7070 genes was analysed by PLS to classify tumour types. All samples in a training set and an external test set were correctly classified. To interpret these results, a method was applied to obtain a cut-off value for deciding which genes could be of interest for further studies.
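To make the classification idea concrete, here is a minimal sketch of a one-component PLS model for a single response, with the two classes coded as -1 and +1. It is an illustration under stated assumptions, not the models from the thesis: the function names, the class coding, the toy data and the restriction to one component are all choices made for this example.

```python
import math

def fit_pls1_one_component(X, y):
    """Fit a single-component PLS model for one response variable."""
    n, m = len(X), len(X[0])
    # Mean-centre the predictors and the response.
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the X-direction with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(c * c for c in w))
    if norm == 0.0:
        raise ValueError("X has zero covariance with y")
    w = [c / norm for c in w]
    # Scores of each observation and the regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(s * s for s in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}

def predict(model, x):
    """Predict the response for one new observation `x`."""
    score = sum((x[j] - model["x_mean"][j]) * model["w"][j]
                for j in range(len(x)))
    return model["y_mean"] + model["q"] * score

# Toy data: two variables, classes coded -1 and +1 (assumed coding).
model = fit_pls1_one_component(
    [[-1, -1], [-2, -1], [1, 1], [2, 1]], [-1, -1, 1, 1])
label = 1 if predict(model, [3, 2]) > 0 else -1
```

The weights in `w` indicate how strongly each variable contributes to the separation, which is the kind of quantity a cut-off for selecting genes of interest could be applied to.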

Potential biomarkers for the efficacy of radiation treatment of brain tumours were identified by combining quantification of protein profiles by SELDI-MS-TOF with multivariate analysis using PCA and PLS. We were also able to differentiate brain tumours from normal brain tissue based on protein profiles, and observed that radiation treatment slows down the development of tumours at a molecular level.

By applying a multivariate approach to the analysis of biological data, information was extracted that would be impossible or very difficult to acquire with traditional methods. The next step in a systems biology approach will be to perform a combined analysis in order to elucidate how the different levels of information are linked together to form a regulatory network.

Place, publisher, year, edition, pages
Umeå: Kemi, 2005. 149 p.
Keyword [en]
PLS, PCA, biomarker, Drosophila, SNP, linkage, disequilibrium, bioinformatics, computational, molecular, biology, genomics, microarray, SELDI, proteomics
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:umu:diva-609
ISBN: 91-7305-965-X (print)
OAI: oai:DiVA.org:umu-609
DiVA: diva2:143974
Public defence
2005-11-04, 10:00
Available from: 2005-10-12. Created: 2005-10-12. Last updated: 2011-03-11. Bibliographically approved
List of papers
1. Sequence signature analysis of chromosome identity in three Drosophila species
2005 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 6, no 158, 1-17 p. Article in journal (Refereed), Published
Abstract [en]

Background: All eukaryotic organisms need to distinguish each of their chromosomes. A few protein complexes have been described that recognise entire, specific chromosomes, for instance dosage compensation complexes and the recently discovered autosome-specific Painting of Fourth (POF) protein in Drosophila. However, no sequences have been found that are chromosome-specific and distributed over the entire length of the respective chromosome. Here, we present a new, unbiased, exhaustive computational method that was used to probe three Drosophila genomes for chromosome-specific sequences.

Results: By combining genome annotations and cytological data with multivariate statistics related to three Drosophila genomes, we found sequence signatures that distinguish Muller's F-elements (chromosome 4 in D. melanogaster) from all other chromosomes in Drosophila that are not attributable to differences in nucleotide composition, simple sequence repeats or repeated elements. Based on these signatures we identified complex motifs that are strongly overrepresented in the F-elements and found indications that the D. melanogaster motif may be involved in POF-binding to the F-element. In addition, the X-chromosomes of D. melanogaster and D. yakuba can be distinguished from the other chromosomes, albeit to a lesser extent. Surprisingly, the conservation of the F-element sequence signatures extends not only between species separated by approximately 55 Myr, but also linearly along the sequenced part of the F-elements.

Conclusion: Our results suggest that chromosome-distinguishing features are not exclusive to the sex chromosomes, but are also present on at least one autosome (the F-element) in Drosophila.

Keyword
dosage-compensation, transposable elements, beta-heterochromatin, x-chromosome, melanogaster, dna, regression, complexes, evolution, reveals
Identifiers
urn:nbn:se:umu:diva-13313 (URN)
doi:10.1186/1471-2105-6-158 (DOI)
Available from: 2007-10-26. Created: 2007-10-26. Last updated: 2011-03-11. Bibliographically approved
2. GOLDsurfer: Three dimensional display of linkage disequilibrium
2004 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 20, no 17, 3241-3243 p. Article in journal (Refereed), Published
Abstract [en]

GOLDsurfer is a Java-based analysis and graphics program for three-dimensional plotting of linkage disequilibrium (LD). Simultaneous presentation of LD measures, including recombination rate estimates and disease association statistics, helps to clarify LD patterns and facilitates interpretations based on multiple indices of local genetic data.
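For readers unfamiliar with LD measures, the sketch below computes three commonly used pairwise statistics (D, |D'| and r²) from haplotype and allele frequencies using their standard textbook definitions. It is not taken from GOLDsurfer, and the abstract does not say which measures the program displays; the function name `ld_measures` and its parameters are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    # D: deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # Largest magnitude D can take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2: squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

d, d_prime, r2 = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5)  # complete LD
```

Computing such values for every pair of markers gives a matrix, and plotting several of these matrices together is the kind of display the abstract describes.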

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2004
Identifiers
urn:nbn:se:umu:diva-20653 (URN)
10.1093/bioinformatics/bth341 (DOI)
15201180 (PubMedID)
Available from: 2009-03-24. Created: 2009-03-24. Last updated: 2011-03-09. Bibliographically approved
3. Interpretation and validation of PLS-models for microarray data
2005 (English). In: Chemometrics and cheminformatics, Washington, DC: American Chemical Society: Distributed by Oxford University Press, 2005, 31-40 p. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Washington, DC: American Chemical Society: Distributed by Oxford University Press, 2005
Identifiers
urn:nbn:se:umu:diva-4745 (URN)
9780841238589 (ISBN)
Available from: 2005-10-12. Created: 2005-10-12. Last updated: 2011-03-09. Bibliographically approved
4. What affects mRNA levels in leaves of field-grown aspen?: A study of developmental and environmental influences
2003. In: Plant physiology, ISSN 0032-0889, Vol. 133, 1190-1197 p. Article in journal (Refereed), Published
Identifiers
urn:nbn:se:umu:diva-4746 (URN)
Available from: 2005-10-12. Created: 2005-10-12. Bibliographically approved
5. Changes in protein expression in experimental malignant glioma following radiotherapy
(English). Manuscript (Other academic)
Identifiers
urn:nbn:se:umu:diva-4747 (URN)
Available from: 2005-10-12. Created: 2005-10-12. Last updated: 2013-03-19. Bibliographically approved