Deciphering sequence data: A multivariate approach
Umeå University, Faculty of Science and Technology, Department of Chemistry. (Kemometri) ORCID iD: 0000-0001-9188-5518
1997 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis focuses on the quantitative description of nucleic acids, proteins and peptides. The strategy was to use multivariate chemometric methods to improve the understanding of the complex structural codes of these biological molecules. Tools have been developed that enable quantitative modelling of biological molecules, i.e. models based on data that quantitatively describe their properties. The advantage of such models is that they interpret complex features such as similarity, dissimilarity and potency in terms of chemical characteristics.

Through a multivariate physicochemical characterization of the building blocks of nucleic acids and proteins, i.e. nucleosides and amino acids, descriptive scales, so-called principal properties, have been developed. The scales describe the intrinsic properties of these building blocks. The multivariate characterization results in a multi-property matrix. A principal component analysis of this matrix gives a small number of latent variables, which are regarded as the principal properties of the characterized molecules.
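
The step from a multi-property matrix to a principal property can be sketched as follows. This is a minimal illustration, not the procedure used in the thesis: the toy matrix is invented, the columns are autoscaled (an assumption), and the first component is extracted by plain power iteration rather than by any particular chemometrics package.

```python
import math

def autoscale(matrix):
    """Centre each column and scale it to unit variance."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]

def first_component(x, iterations=500):
    """Return (scores, loadings) of the first principal component.

    Uses power iteration on X'X; loadings are normalised to unit length.
    """
    k = len(x[0])
    p = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        t = [sum(a * b for a, b in zip(row, p)) for row in x]            # t = X p
        new_p = [sum(row[j] * ti for row, ti in zip(x, t)) for j in range(k)]  # X' t
        norm = math.sqrt(sum(v * v for v in new_p))
        p = [v / norm for v in new_p]
    t = [sum(a * b for a, b in zip(row, p)) for row in x]
    return t, p

# Invented toy data: 5 molecules (rows) described by 3 measured properties (columns).
toy = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.7],
    [3.0, 6.2, 0.2],
    [4.0, 8.1, 0.9],
    [5.0, 9.8, 0.4],
]
x = autoscale(toy)
scores, loadings = first_component(x)
# Fraction of the total variance captured by this one latent variable.
explained = sum(v * v for v in scores) / sum(v * v for row in x for v in row)
print([round(v, 2) for v in scores], round(explained, 2))
```

Each entry of `scores` is the value of the first latent variable for one molecule; further components would be obtained by subtracting this component from `x` and repeating.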

The principal property scales may be used for a wide range of purposes, such as detecting trends and groupings in large sequence data sets and analysing quantitative relationships between structure and function. In statistical experimental design, the descriptors are well suited as design variables for selecting combinations of amino acids that span a wide range of properties.
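
One simple way to pick building blocks that span the descriptor space is a greedy maximin selection, sketched below. This heuristic is swapped in for illustration and is not claimed to be the design method used in the thesis; the candidate names and their two descriptor values are hypothetical.

```python
import math

def maximin_select(points, n_select):
    """Greedily choose n_select labels that are spread out in descriptor space.

    Starts with the point farthest from the centroid, then repeatedly adds the
    point whose distance to its nearest already-chosen point is largest.
    """
    labels = list(points)
    dim = len(points[labels[0]])
    centroid = [sum(points[l][d] for l in labels) / len(labels) for d in range(dim)]
    chosen = [max(labels, key=lambda l: math.dist(points[l], centroid))]
    while len(chosen) < n_select:
        remaining = [l for l in labels if l not in chosen]
        chosen.append(max(
            remaining,
            key=lambda l: min(math.dist(points[l], points[c]) for c in chosen),
        ))
    return chosen

# Hypothetical candidates with two invented descriptor values each.
candidates = {
    "cand1": (-2.0, -2.0),
    "cand2": (-1.8, -1.9),   # near-duplicate of cand1
    "cand3": (2.0, 2.0),
    "cand4": (2.0, -2.0),
    "cand5": (0.1, 0.0),     # close to the centre
    "cand6": (-2.0, 2.5),
}
design = maximin_select(candidates, 4)
print(design)
```

The selection skips the near-duplicate and the central candidate, which is the behaviour one wants when the aim is to cover a wide range of properties with few compounds.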

The use of these principal property descriptors is demonstrated in the quantitative modelling of structure-activity relationships for various peptide series and DNA promoters, and in the quantitative modelling of transfer ribonucleic acid (tRNA) sequence data.

Place, publisher, year, edition, pages
Umeå: Solfjädern Offset AB, 1997, p. 76
Keywords [en]
Principal properties, amino acids, nucleotides, tRNA, DNA, multivariate data analysis, sequence analysis, QSAR, quantitative sequence activity relationships
National Category
Organic Chemistry
Identifiers
URN: urn:nbn:se:umu:diva-142699; ISBN: 91-7191-337-8 (print); OAI: oai:DiVA.org:umu-142699; DiVA id: diva2:1164028
Public defence
1997-06-06, N320, Naturvetarhuset, 90187, Umeå, 14:00 (Swedish)
Opponent
Supervisors
Available from: 2023-02-03 Created: 2017-12-08 Last updated: 2023-02-03 Bibliographically approved
List of papers
1. DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least-squares projections to latent structures
1993 (English) In: Analytica Chimica Acta, ISSN 0003-2670, E-ISSN 1873-4324, Vol. 277, p. 239-253. Article in journal (Refereed) Published
Abstract [en]

Biopolymer sequences (e.g., DNA, RNA, proteins and polysaccharides) and chemical processes (e.g., a batch or continuous polymer synthesis run in a chemical plant) have close similarities from the modelling point of view. When a set of sequences or processes is characterized by multivariate data, a three-way data matrix is obtained. With sequences the position and with processes the time is one direction in this matrix. The multivariate modelling of this matrix by principal component analysis (PCA) or partial least-squares (PLS) methods for the following purposes is discussed: classification of sequences; quantitative relationships between sequence and biological activity or chemical properties; optimizing a sequence with respect to selected properties; process diagnostics; and quantitative relationships between process variables and product quality variables. To obtain good models, a number of problems have to be adequately dealt with: appropriate characterization of the sequence or process; experimental design (selecting sequences or process settings); transforming the three-way into a two-way matrix; and appropriate modelling and validation (modelling interactions, periodicities, “time series” structures and “neighbour effects”). A multivariate approach to sequence and process modelling using PCA and PLS projections to latent structures is discussed and illustrated with several sets of peptide and DNA promoter data.
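
The transformation of the three-way matrix into a two-way matrix mentioned above can be sketched as a simple unfolding, in which every position-descriptor pair becomes one column. The nested-list layout, the toy numbers and the column naming are assumptions made for this illustration.

```python
def unfold(three_way):
    """Flatten sequences x positions x descriptors into sequences x (positions * descriptors)."""
    return [[value for position in sequence for value in position]
            for sequence in three_way]

def column_labels(n_positions, n_descriptors):
    """Name each unfolded column by its position and descriptor index (1-based)."""
    return [f"pos{p}_d{d}"
            for p in range(1, n_positions + 1)
            for d in range(1, n_descriptors + 1)]

# Invented toy data: 2 sequences, 3 positions, 2 descriptors per position.
three_way = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6]],
]
two_way = unfold(three_way)
labels = column_labels(3, 2)
print(labels)
print(two_way)
```

Each row of `two_way` can then be treated as one object in a PCA or PLS model, and the labels allow a loading to be traced back to a position and a descriptor.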

Keywords
DNA, peptide sequences, multivariate, principal component analysis, partial least-squares projections to latent structures
National Category
Organic Chemistry
Identifiers
urn:nbn:se:umu:diva-142536 (URN); 10.1016/0003-2670(93)80437-P (DOI)
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2018-06-09
2. The evolutionary transition from uracil to thymine balances the genetic code
1996 (English) In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 10, p. 163-170. Article in journal (Refereed) Published
Abstract [en]

A multivariate quantitative physicochemical characterization of the five bases adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), followed by principal component analysis, shows that the relative dissimilarities between the bases of DNA (A, C, G and T) are almost the same (i.e. balanced). In contrast, mRNA (containing U instead of T) has a considerably larger relative physicochemical similarity between C and U than between all other pairs of bases and is therefore inherently more unbalanced. These results provide a physicochemical explanation of the presence of thymine instead of uracil as an element of DNA. The principal component scores enable a quantitative description of nucleic acid sequence data to be made for structure-activity modelling purposes.
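
What "balanced" means here can be made concrete by comparing the spread of the pairwise distances between bases in score space. The sketch below uses the coefficient of variation as an imbalance measure, which is a choice made for this illustration and not taken from the paper. The coordinates are invented placeholders, deliberately arranged to mimic the pattern described (U placed close to C); they are not the published principal component scores.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Placeholder 2-D score coordinates, invented for illustration only.
SCORES = {
    "A": (1.0, 0.0),
    "C": (0.0, 1.0),
    "G": (-1.0, 0.0),
    "T": (0.0, -1.0),
    "U": (0.2, 0.8),
}

def pairwise_distances(bases):
    """Euclidean distance in score space for every pair of the given bases."""
    return {(a, b): math.dist(SCORES[a], SCORES[b])
            for a, b in combinations(bases, 2)}

def imbalance(bases):
    """Coefficient of variation of the pairwise distances (0 = perfectly balanced)."""
    d = list(pairwise_distances(bases).values())
    return pstdev(d) / mean(d)

dna = imbalance("ACGT")
rna = imbalance("ACGU")
closest_rna_pair = min(pairwise_distances("ACGU").items(), key=lambda kv: kv[1])[0]
print(round(dna, 3), round(rna, 3), closest_rna_pair)
```

With real scores in place of the placeholders, a smaller value for the DNA alphabet than for the RNA alphabet would correspond to the claim made in the abstract.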

Place, publisher, year, edition, pages
John Wiley & Sons, 1996
Keywords
multivariate characterization, DNA, nucleotides
National Category
Organic Chemistry
Identifiers
urn:nbn:se:umu:diva-142533 (URN); 10.1002/(SICI)1099-128X(199603)10:2<163::AID-CEM415>3.0.CO;2-S (DOI)
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2018-06-09
3. A multivariate characterization of tRNA nucleosides
1996 (English) In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 10, no 5-6, p. 493-508. Article in journal (Refereed) Published
Abstract [en]

Twenty nucleosides occurring in transfer ribonucleic acid (tRNA) have been characterized using 21 experimentally determined (HPLC, TLC, NMR, etc.) and calculated (log P, van der Waals surface area, ionization potential, etc.) variables. Principal component analysis (PCA) was performed on the data set and four statistically significant components or principal properties (PPs) were extracted. The PPs described 68.4% of the variance in the data. The PP values are discussed in terms of similarity and dissimilarity among the nucleosides. The loading vectors from the PCA are used for an interpretation of the nature of the PP vectors. Application of the PPs in sequence-activity modelling is demonstrated with 25 DNA-promoter sequences originating from E. coli.

Keywords
Nucleosides, Multivariate characterization, QSAR, Principal properties, Principal component analysis
National Category
Organic Chemistry
Identifiers
urn:nbn:se:umu:diva-142534 (URN); 10.1002/(SICI)1099-128X(199609)10:5/6<493::AID-CEM447>3.0.CO;2-C (DOI)
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2018-06-09
4. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids.
1998 (English) In: Journal of Medicinal Chemistry, ISSN 0022-2623, E-ISSN 1520-4804, Vol. 41, no 14, p. 2481-2491, article id 9651153. Article in journal (Refereed) Published
Abstract [en]

In this study 87 amino acids (AA.s) have been characterized by 26 physicochemical descriptor variables. These descriptor variables include experimentally determined retention values in seven thin-layer chromatography (TLC) systems, three nuclear magnetic resonance (NMR) shift variables, and 16 calculated variables, namely six semiempirical molecular orbital indices, total, polar, and nonpolar surface area, van der Waals volume of the side chain, log P, molecular weight, and four indicator variables describing hydrogen bond donor and acceptor properties and side chain charge. In the present study, the data from a previous characterization of 55 AA.s from our laboratory have been extended with data for 32 additional AA.s and 14 new descriptor variables. The 32 new AA.s were selected to represent both intermediate and more extreme physicochemical properties compared to the 20 coded AA.s. The new extended and updated principal property scales, the z-scales, were calculated and aligned to the previously reported z(old)-scales. The appropriateness of the extended z-scales was validated by their use in quantitative sequence-activity modeling (QSAM) of 89 elastase substrate analogues and in a QSAM of 29 neurotensin analogues.

Place, publisher, year, edition, pages
Washington DC: American Chemical Society (ACS), 1998
Keywords
chemical descriptors, amino acids, sequence-activity modeling, characterization
National Category
Medical and Health Sciences
Research subject
Organic Chemistry
Identifiers
urn:nbn:se:umu:diva-142520 (URN); 10.1021/jm9700575 (DOI)
Conference
1998 Jul 2;41(14):2481-91.
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2018-06-09
5. A new approach to quantify and analyse tRNA sequence data
(English) Manuscript (preprint) (Other academic)
Abstract [en]

A novel quantitative multivariate approach for describing and analysing tRNA sequence data is presented. This approach is based on a multivariate chemical description of each nucleoside in the sequence. Thirty theoretically calculated descriptors were used to characterize 63 nucleosides, and principal component analysis was used to extract the main variation from this multivariate description. The resulting four principal properties were interpreted as (PPa) size/bulk of the nucleoside, (PPb) polarity/hydrophobicity of the nucleoside, (PPc) electronic properties of the nucleoside and (PPd) polarity and size of the ribose moiety. These principal properties may be used to translate the tRNA letter sequence data into a quantitative chemical representation. We demonstrate the use of this quantitative description with a multivariate analysis of a set of tRNA sequences. This analysis gives models that are interpretable in terms of which sequence positions and nucleoside properties discriminate between the different isoacceptors. This approach is applicable to all kinds of RNA sequence data and gives information that is complementary to current sequence analysis techniques.
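
The translation of a letter sequence into a quantitative representation can be sketched as a table lookup followed by concatenation. The four values per symbol stand for PPa to PPd, but the numbers below are invented placeholders and not the values from the manuscript; the table name and the function are likewise hypothetical, and only four unmodified nucleosides are included for brevity.

```python
# Placeholder principal-property values (PPa, PPb, PPc, PPd) per nucleoside symbol.
# These numbers are invented for illustration; they are NOT the published values.
PP_TABLE = {
    "A": (0.5, -0.2, 0.1, 0.0),
    "C": (-0.4, 0.3, -0.1, 0.0),
    "G": (0.7, 0.4, 0.2, 0.0),
    "U": (-0.6, -0.1, -0.3, 0.0),
}

def encode(sequence, table=PP_TABLE):
    """Replace each symbol by its descriptor values, concatenated position by position."""
    row = []
    for position, symbol in enumerate(sequence, start=1):
        if symbol not in table:
            raise ValueError(
                f"no descriptors for symbol {symbol!r} at position {position}"
            )
        row.extend(table[symbol])
    return row

# A 4-letter sequence becomes a row of 4 positions x 4 properties = 16 numbers.
x_row = encode("GCAU")
print(len(x_row), x_row[:4])
```

Encoding a set of aligned sequences of equal length in this way gives one row per sequence, and the resulting matrix can be analysed with multivariate methods such as PCA.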

National Category
Other Chemistry Topics; Organic Chemistry
Research subject
Organic Chemistry
Identifiers
urn:nbn:se:umu:diva-142697 (URN)
Available from: 2017-12-08 Created: 2017-12-08 Last updated: 2018-06-09

Open Access in DiVA

No full text in DiVA

Authority records

Sandberg Hiltunen, Maria
