Deciphering sequence data: A multivariate approach
Umeå University, Faculty of Science and Technology, Department of Chemistry (Chemometrics). ORCID iD: 0000-0001-9188-5518
1997 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

In this thesis, attention has been focused on the quantitative description of nucleic acids, proteins and peptides. The strategy was to use multivariate chemometric methods to improve the understanding of the complex structural codes of these kinds of biological molecules. Tools have been developed that enable quantitative modelling of biological molecules, i.e. models based on data that quantitatively describe their properties. The advantage of such models is that they interpret complex features such as similarity, dissimilarity and potency in terms of chemical characteristics.

By a multivariate physicochemical characterization of the building blocks of nucleic acids and proteins, i.e. nucleosides and amino acids, descriptive scales, so-called principal properties, have been developed. The scales describe the intrinsic properties of these building blocks. The multivariate characterization results in a multi-property matrix, and a principal component analysis (PCA) of this matrix gives a small number of latent variables, which are considered the principal properties of the characterized molecules.
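A minimal sketch of the step described above, assuming the multi-property matrix is autoscaled (unit variance per descriptor) and that PCA is computed by power iteration with deflation in pure Python. The thesis does not state which PCA algorithm or software was used; `autoscale`, `principal_properties` and the descriptor values are hypothetical and for illustration only.

```python
import math
import random


def autoscale(matrix):
    """Centre each column to zero mean and scale it to unit variance (n - 1 denominator)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    scaled = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_rows
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n_rows - 1))
        for i in range(n_rows):
            scaled[i][j] = (column[i] - mean) / sd if sd > 0 else 0.0
    return scaled


def principal_properties(matrix, n_components, n_iter=500, seed=0):
    """PCA by power iteration with deflation on the autoscaled multi-property matrix.

    Returns (scores, loadings, explained): scores[i][a] is principal property a of
    building block i, loadings[a] is the unit loading vector of component a, and
    explained[a] is the fraction of the total variance captured by component a.
    """
    x = autoscale(matrix)
    n_rows, n_cols = len(x), len(x[0])
    total_ss = sum(v * v for row in x for v in row)
    rng = random.Random(seed)
    scores = [[0.0] * n_components for _ in range(n_rows)]
    loadings, explained = [], []
    for a in range(n_components):
        p = [rng.random() + 0.1 for _ in range(n_cols)]
        for _ in range(n_iter):
            t = [sum(row[j] * p[j] for j in range(n_cols)) for row in x]  # t = X p
            p = [sum(x[i][j] * t[i] for i in range(n_rows)) for j in range(n_cols)]  # p = X't
            norm = math.sqrt(sum(v * v for v in p))
            if norm < 1e-12:
                raise ValueError("n_components exceeds the rank of the data")
            p = [v / norm for v in p]
        t = [sum(row[j] * p[j] for j in range(n_cols)) for row in x]
        for i in range(n_rows):
            scores[i][a] = t[i]
            for j in range(n_cols):
                x[i][j] -= t[i] * p[j]  # deflate: remove component a from X
        loadings.append(p)
        explained.append(sum(v * v for v in t) / total_ss)
    return scores, loadings, explained


# Hypothetical multi-property matrix: 6 building blocks x 4 descriptors (made-up values).
descriptors = [
    [1.0, 2.1, 0.5, 10.0],
    [2.0, 3.9, 0.7, 12.0],
    [3.0, 6.2, 0.4, 15.0],
    [4.0, 8.1, 0.9, 13.0],
    [5.0, 9.8, 0.6, 18.0],
    [6.0, 12.2, 0.8, 20.0],
]
scores, loadings, explained = principal_properties(descriptors, n_components=2)
for row in scores:
    print([round(v, 3) for v in row])
print("explained variance per component:", [round(v, 3) for v in explained])
```

Each row of `scores` is a candidate principal-property description of one building block; in practice the number of components would be chosen by a significance criterion such as cross-validation rather than fixed in advance.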

The principal property scales may be used for a wide range of purposes, such as detecting trends and groupings in large sequence data sets and analyzing quantitative relationships between structure and function. In statistical experimental design, the descriptors are well suited as design variables for selecting combinations of amino acids that span a wide range of properties.
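To make the experimental-design use concrete, the sketch below assigns one building block to each corner of a two-level full factorial design in the space of three principal properties. The selection rule (largest projection onto the corner direction, no reuse) is one simple choice assumed here, not necessarily the procedure used in the thesis. The candidate names `AA1`…`AA9` and their scores are hypothetical placeholders, not published z-scale values.

```python
from itertools import product


def factorial_selection(candidates, n_factors=3):
    """Assign one building block to each corner of a 2**n_factors full factorial design.

    candidates maps a name to its principal-property scores. For every sign pattern
    (e.g. (-1, 1, 1)) the not-yet-used candidate with the largest projection onto
    that corner direction is chosen. Needs at least 2**n_factors candidates.
    """
    design, used = {}, set()
    for corner in product((-1, 1), repeat=n_factors):
        remaining = [name for name in candidates if name not in used]
        best = max(
            remaining,
            key=lambda name: sum(s * z for s, z in zip(corner, candidates[name])),
        )
        design[corner] = best
        used.add(best)
    return design


# Hypothetical (z1, z2, z3) scores for nine candidate building blocks.
HYPOTHETICAL_Z = {
    "AA1": (-1.0, -1.1, -0.9),
    "AA2": (-0.9, -1.0, 1.1),
    "AA3": (-1.1, 0.9, -1.0),
    "AA4": (-1.0, 1.0, 1.0),
    "AA5": (1.0, -0.9, -1.1),
    "AA6": (1.1, -1.0, 0.9),
    "AA7": (0.9, 1.1, -1.0),
    "AA8": (1.0, 1.0, 1.0),
    "AA9": (0.1, -0.2, 0.1),
}
design = factorial_selection(HYPOTHETICAL_Z)
for corner, name in design.items():
    print(corner, name)
```

The central candidate `AA9` is never chosen, which is the point of the design: the selected set spreads out over the property space instead of clustering near its centre.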

The use of these principal property descriptors is demonstrated in the quantitative modelling of structure-activity relationships of various peptide series and DNA promoters, and in the quantitative modelling of transfer ribonucleic acid (tRNA) sequence data.

Place, publisher, year, edition, pages
Umeå: Solfjädern Offset AB, 1997. 76 p.
Keywords [en]
Principal properties, amino acids, nucleotides, tRNA, DNA, multivariate data analysis, sequence analysis, QSAR, quantitative sequence activity relationships
Identifiers
URN: urn:nbn:se:umu:diva-142699
ISBN: 91-7191-337-8 (print)
OAI: oai:DiVA.org:umu-142699
DiVA, id: diva2:1164028
Public defence
1997-06-06, N320, Naturvetarhuset, 90187, Umeå, 14:00 (Swedish)
Available from: 2023-02-03 Created: 2017-12-08 Last updated: 2023-02-03. Bibliographically approved
List of papers
1. DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least-squares projections to latent structures
1993 (English). In: Analytica Chimica Acta, ISSN 0003-2670, E-ISSN 1873-4324, Vol. 277, p. 239-253. Article in journal (Refereed). Published
Abstract [en]

Biopolymer sequences (e.g., DNA, RNA, proteins and polysaccharides) and chemical processes (e.g., a batch or continuous polymer synthesis run in a chemical plant) are closely similar from the modelling point of view. When a set of sequences or processes is characterized by multivariate data, a three-way data matrix is obtained; one direction of this matrix is the position along the sequence or, for processes, time. The multivariate modelling of this matrix by principal component analysis (PCA) or partial least-squares (PLS) methods is discussed for the following purposes: classification of sequences; quantitative relationships between sequence and biological activity or chemical properties; optimizing a sequence with respect to selected properties; process diagnostics; and quantitative relationships between process variables and product quality variables. To obtain good models, a number of problems have to be dealt with adequately: appropriate characterization of the sequence or process; experimental design (selecting sequences or process settings); transforming the three-way matrix into a two-way matrix; and appropriate modelling and validation (modelling interactions, periodicities, “time series” structures and “neighbour effects”). A multivariate approach to sequence and process modelling using PCA and PLS projections to latent structures is discussed and illustrated with several sets of peptide and DNA promoter data.
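A minimal sketch of two steps named in this abstract: unfolding the three-way array (sequences × positions × properties) into a two-way matrix, then fitting a single-response PLS model with the standard NIPALS algorithm. The unfolding order (all properties of position 1, then position 2, and so on) and the helper names `unfold`, `pls1_fit` and `pls1_predict` are assumptions for illustration; the data are made up, and no scaling or cross-validation is included.

```python
import math


def unfold(three_way):
    """Unfold an (objects x positions x properties) array into (objects x positions*properties).

    Each object's row lists the properties of position 1, then position 2, and so on.
    """
    return [[value for position in obj for value in position] for obj in three_way]


def pls1_fit(x_rows, y, n_components):
    """PLS1 (single response) fitted with the NIPALS algorithm on mean-centred data."""
    n, m = len(x_rows), len(x_rows[0])
    x_mean = [sum(row[j] for row in x_rows) / n for j in range(m)]
    y_mean = sum(y) / n
    x = [[row[j] - x_mean[j] for j in range(m)] for row in x_rows]
    f = [v - y_mean for v in y]  # y residual
    components = []
    for _ in range(n_components):
        w = [sum(x[i][j] * f[i] for i in range(n)) for j in range(m)]  # weights ~ X'f
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain in y
        w = [v / norm for v in w]
        t = [sum(x[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(x[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        for i in range(n):
            for j in range(m):
                x[i][j] -= t[i] * p[j]  # deflate X
            f[i] -= q * t[i]  # deflate y
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls1_predict(model, x_row):
    """Predict y for one unfolded row by replaying the centring and deflation steps."""
    x = [v - mu for v, mu in zip(x_row, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = sum(a * b for a, b in zip(x, w))
        x = [a - t * b for a, b in zip(x, p)]
        y_hat += q * t
    return y_hat


# Hypothetical data: 6 sequences x 2 positions x 2 principal properties (made-up values).
three_way = [
    [[1.0, 0.5], [2.0, -1.0]],
    [[0.0, 1.5], [1.0, 0.0]],
    [[2.0, -0.5], [0.5, 1.0]],
    [[1.5, 1.0], [-1.0, 2.0]],
    [[-1.0, 0.0], [0.0, -0.5]],
    [[0.5, 2.0], [1.5, 1.5]],
]
x_rows = unfold(three_way)
activity = [2.0 * r[0] - r[1] + 0.5 * r[2] + 3.0 * r[3] + 1.0 for r in x_rows]
model = pls1_fit(x_rows, activity, n_components=2)
print([round(pls1_predict(model, r), 2) for r in x_rows])
```

Unfolding keeps every (position, property) pair as its own column, so the PLS weights remain interpretable per sequence position; the price is that neighbour effects and periodicities are not modelled unless extra cross-terms are added as columns.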

Keywords
DNA, peptide sequences, multivariate, principal component analysis, partial least-squares projections to latent structures
Identifiers
urn:nbn:se:umu:diva-142536 (URN)
10.1016/0003-2670(93)80437-P (DOI)
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2018-06-09
2. The evolutionary transition from uracil to thymine balances the genetic code
1996 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 10, p. 163-170. Article in journal (Refereed). Published
Abstract [en]

A multivariate quantitative physicochemical characterization of the five bases adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), followed by principal component analysis, shows that the relative dissimilarities between the bases of DNA (A, C, G and T) are almost the same (i.e. balanced). In contrast, mRNA (containing U instead of T) has a considerably larger relative physicochemical similarity between C and U than between any other pair of bases and is therefore inherently more unbalanced. These results provide a physicochemical explanation of the presence of thymine instead of uracil as an element of DNA. The principal component scores enable nucleic acid sequence data to be described quantitatively for structure-activity modelling purposes.
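One way to quantify the "balance" discussed above is to compute all pairwise Euclidean distances between the bases in principal-component score space and compare the smallest with the largest. This ratio is an illustrative measure assumed here, not necessarily the one used in the paper, and the coordinates below are hypothetical placeholders rather than the published scores.

```python
import math
from itertools import combinations


def pairwise_distances(scores):
    """Euclidean distance between every pair of bases in principal-component score space."""
    return {
        (a, b): math.dist(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


def balance_ratio(scores):
    """Smallest / largest pairwise distance: 1.0 means all bases are equally dissimilar."""
    distances = list(pairwise_distances(scores).values())
    return min(distances) / max(distances)


# Hypothetical score vectors (NOT the published values): a regular tetrahedron, i.e. a
# perfectly balanced four-letter alphabet, and a variant in which C and U lie close together.
balanced = {"A": (1, 1, 1), "C": (1, -1, -1), "G": (-1, 1, -1), "T": (-1, -1, 1)}
unbalanced = {"A": (1, 1, 1), "C": (1, -1, -1), "G": (-1, 1, -1), "U": (0.6, -0.6, -0.2)}
print(balance_ratio(balanced), balance_ratio(unbalanced))
```

The tetrahedron gives a ratio of exactly 1.0, since all six distances are equal; pulling one point toward another lowers the ratio, which mirrors the C/U similarity described in the abstract.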

Place, publisher, year, edition, pages
John Wiley & Sons, 1996
Keywords
multivariate characterization, DNA, nucleotides
Identifiers
urn:nbn:se:umu:diva-142533 (URN)
10.1002/(SICI)1099-128X(199603)10:2<163::AID-CEM415>3.0.CO;2-S (DOI)
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2018-06-09
3. A multivariate characterization of tRNA nucleosides
1996 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 10, no. 5-6, p. 493-508. Article in journal (Refereed). Published
Abstract [en]

Twenty nucleosides occurring in transfer ribonucleic acid (tRNA) have been characterized using 21 experimentally determined (HPLC, TLC, NMR, etc.) and calculated (log P, van der Waals surface area, ionization potential, etc.) variables. Principal component analysis (PCA) was performed on the data set, and four statistically significant components, or principal properties (PPs), were extracted. The PPs described 68.4% of the variance in the data. The PP values are discussed in terms of similarity and dissimilarity among the nucleosides, and the loading vectors from the PCA are used to interpret the nature of the PP vectors. The application of the PPs in sequence-activity modelling is demonstrated with 25 DNA-promoter sequences originating from E. coli.
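Interpreting a principal property from its loading vector usually means looking at which descriptors load most strongly on it. The sketch below ranks variables by absolute loading for each component. The variable names echo descriptor types mentioned in the abstract, but the loading values are hypothetical and `top_loadings` is an illustrative helper, not the authors' procedure.

```python
def top_loadings(loadings, variable_names, k=3):
    """For each component, list the k variables with the largest absolute loading."""
    summary = []
    for component in loadings:
        ranked = sorted(
            zip(variable_names, component), key=lambda item: abs(item[1]), reverse=True
        )
        summary.append(ranked[:k])
    return summary


# Hypothetical loadings (one list per component) over six illustrative variables.
names = ["HPLC retention", "TLC retention", "NMR shift", "log P",
         "vdW surface area", "ionization potential"]
hypothetical_loadings = [
    [0.52, 0.48, 0.05, 0.55, 0.40, -0.10],
    [0.10, -0.05, 0.70, 0.02, 0.15, 0.69],
]
summary = top_loadings(hypothetical_loadings, names, k=2)
for a, top in enumerate(summary, start=1):
    print(f"PP{a}:", ", ".join(f"{name} ({value:+.2f})" for name, value in top))
```

Ranking by absolute value keeps strongly negative loadings in view; the sign still matters when naming a component (for example, hydrophobic versus polar).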

Keywords
Nucleosides, Multivariate characterization, QSAR, Principal properties, Principal component analysis
Identifiers
urn:nbn:se:umu:diva-142534 (URN)
10.1002/(SICI)1099-128X(199609)10:5/6<493::AID-CEM447>3.0.CO;2-C (DOI)
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2018-06-09
4. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids.
1998 (English). In: Journal of Medicinal Chemistry, ISSN 0022-2623, E-ISSN 1520-4804, Vol. 41, no. 14, p. 2481-2491, article id 9651153. Article in journal (Refereed). Published
Abstract [en]

In this study, 87 amino acids (AAs) have been characterized by 26 physicochemical descriptor variables. These descriptor variables include experimentally determined retention values in seven thin-layer chromatography (TLC) systems, three nuclear magnetic resonance (NMR) shift variables, and 16 calculated variables, namely six semiempirical molecular orbital indices, total, polar, and nonpolar surface area, van der Waals volume of the side chain, log P, molecular weight, and four indicator variables describing hydrogen bond donor and acceptor properties and side chain charge. In the present study, the data from a previous characterization of 55 AAs from our laboratory have been extended with data for 32 additional AAs and 14 new descriptor variables. The 32 new AAs were selected to represent both intermediate and more extreme physicochemical properties compared with the 20 coded AAs. The new, extended and updated principal property scales, the z-scales, were calculated and aligned to the previously reported z(old)-scales. The appropriateness of the extended z-scales was validated by their use in quantitative sequence-activity modeling (QSAM) of 89 elastase substrate analogues and of 29 neurotensin analogues.
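The abstract does not say which validation statistic was used for the QSAM models. A common chemometric choice is the cross-validated Q² = 1 − PRESS / SS, where PRESS sums the squared leave-one-out prediction errors and SS is the sum of squares of y about its mean. The sketch below computes Q² for any fit/predict pair; the one-descriptor straight-line model and the data are hypothetical stand-ins for a real QSAM model.

```python
def loo_q2(x, y, fit, predict):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS, with SS taken about the mean of y."""
    n = len(y)
    y_mean = sum(y) / n
    ss = sum((v - y_mean) ** 2 for v in y)
    press = 0.0
    for i in range(n):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # refit without object i
        press += (y[i] - predict(model, x[i])) ** 2
    return 1.0 - press / ss


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def predict_line(model, x_value):
    intercept, slope = model
    return intercept + slope * x_value


# Hypothetical data: one principal-property descriptor per analogue and a measured activity.
z1 = [-1.2, -0.4, 0.1, 0.8, 1.5, 2.0]
activity = [3.1, 4.0, 4.9, 5.8, 7.2, 7.9]
print(round(loo_q2(z1, activity, fit_line, predict_line), 3))
```

Unlike the fitted R², Q² can be negative: a model that merely predicts the mean of the remaining objects scores 1 − n²/(n − 1)², so a useful model must clearly beat that baseline.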

Place, publisher, year, edition, pages
Washington DC: American Chemical Society (ACS), 1998
Keywords
chemical descriptors, amino acids, sequence-activity modeling, characterization
Research subject
Organic Chemistry
Identifiers
urn:nbn:se:umu:diva-142520 (URN)
10.1021/jm9700575 (DOI)
Conference
1998 Jul 2;41(14):2481-91.
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2018-06-09
5. A new approach to quantify and analyse tRNA sequence data
(English). Manuscript (preprint) (Other academic)
Abstract [en]

A novel quantitative multivariate approach for describing and analysing tRNA sequence data is presented. The approach is based on a multivariate chemical description of each nucleoside in the sequence. Thirty theoretically calculated descriptors were used to characterize 63 nucleosides, and principal component analysis was used to extract the main variation from this multivariate description. The resulting four principal properties were interpreted as (PPa) size/bulk of the nucleoside, (PPb) polarity/hydrophobicity of the nucleoside, (PPc) electronic properties of the nucleoside and (PPd) polarity and size of the ribose moiety. These principal properties may be used to translate tRNA letter sequence data into a quantitative chemical representation. We demonstrate the use of this quantitative description with a multivariate analysis of a set of tRNA sequences. This analysis gives models that are interpretable in terms of which sequence positions and nucleoside properties discriminate between the different isoacceptors. The approach is applicable to all kinds of RNA sequence data and gives information that is complementary to current sequence analysis techniques.
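The translation step described above can be sketched as a lookup: each nucleoside symbol in an aligned sequence is replaced by its four principal-property values, and the values are concatenated position by position into one row per sequence, ready for PCA or PLS. The PP values below are hypothetical placeholders, not the published scales; modified nucleosides would need their own symbols in the table, and encoding alignment gaps as zeros is an assumption that only makes sense if the PP scales are mean-centred.

```python
def encode_sequences(sequences, pp_table, gap="-"):
    """Translate aligned letter sequences into rows of principal-property values.

    Each position contributes len(pp) numbers; gap positions are encoded as zeros
    (an assumption, not taken from the source).
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_pp = len(next(iter(pp_table.values())))
    rows = []
    for seq in sequences:
        row = []
        for symbol in seq:
            row.extend([0.0] * n_pp if symbol == gap else pp_table[symbol])
        rows.append(row)
    return rows


# Hypothetical (PPa, PPb, PPc, PPd) values for the four unmodified nucleosides.
HYPOTHETICAL_PP = {
    "A": (0.9, -0.4, 0.3, 0.1),
    "C": (-0.7, 0.5, -0.2, 0.1),
    "G": (1.1, 0.2, 0.6, 0.1),
    "U": (-0.8, -0.3, -0.5, 0.1),
}
# Two short, made-up aligned fragments (not real tRNA sequences).
rows = encode_sequences(["GCGGAU-", "GCCGAUA"], HYPOTHETICAL_PP)
print(len(rows), "sequences x", len(rows[0]), "variables")
```

Because each column corresponds to one (position, property) pair, the weights or loadings of a subsequent model can be read back as "which position, which property", which is the kind of interpretation the abstract describes.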

Research subject
Organic Chemistry
Identifiers
urn:nbn:se:umu:diva-142697 (URN)
Available from: 2017-12-08 Created: 2017-12-08 Last updated: 2018-06-09

Open Access in DiVA

Full text not available in DiVA

Person

Sandberg Hiltunen, Maria
