umu.sePublications
1 - 26 of 26
• 1.
Umeå University, Faculty of Social Sciences, Statistics.
Umeå University, Faculty of Social Sciences, Statistics.
Partial Partial Likelihood (2008). In: Communications in Statistics: Simulation and Computation, ISSN 0361-0918, Vol. 37, no. 4, pp. 679-686. Article in journal (Refereed)

The maximum likelihood and maximum partial likelihood approaches to the proportional hazards model are unified. The purpose is to give a general approach to the analysis of the proportional hazards model, whether the baseline distribution is absolutely continuous, discrete, or a mixture. The advantage is that heavily tied data will be analyzed with a discrete time model, while data with no ties are analyzed with ordinary Cox regression. Data sets in between are treated by a compromise between the discrete time model and Efron's approach to tied data in survival analysis, and the transitions between modes are automatic. A simulation study is conducted comparing the proposed approach to standard methods of handling ties. A recent suggestion, that revives Breslow's approach to tied data, is finally discussed.
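As a toy illustration of the two classical tie corrections that the compromise above interpolates between, the Breslow and Efron contributions of a single risk set with tied failures can be computed directly (a sketch with made-up risk scores; these helper functions are illustrative, not the paper's unified estimator):

```python
import numpy as np

def breslow_term(risk_scores, event_idx):
    """Breslow approximation: all tied failures share the full
    risk-set denominator."""
    num = np.prod(risk_scores[event_idx])
    denom = risk_scores.sum() ** len(event_idx)
    return num / denom

def efron_term(risk_scores, event_idx):
    """Efron approximation: the denominator is reduced stepwise, as if
    tied failures left the risk set one (average) fraction at a time."""
    num = np.prod(risk_scores[event_idx])
    tied_sum = risk_scores[event_idx].sum()
    total, d = risk_scores.sum(), len(event_idx)
    denom = np.prod([total - (k / d) * tied_sum for k in range(d)])
    return num / denom

# five subjects at risk with scores exp(x'beta); subjects 0 and 1 fail tied
scores = np.exp(np.array([0.5, 0.2, -0.1, 0.0, 0.3]))
print(breslow_term(scores, [0, 1]), efron_term(scores, [0, 1]))
```

Because Efron's denominators shrink while Breslow's do not, the Efron term is never smaller than the Breslow term, and the two coincide when there are no ties.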

• 2.
Umeå University, Faculty of Arts, Department of historical, philosophical and religious studies, Environmental Archaeology Lab. Umeå University, Faculty of Arts, Humlab.
The Bugs Coleopteran Ecology Package (BugsCEP) database: 1000 sites and half a million fossils later (2014). In: Quaternary International, ISSN 1040-6182, E-ISSN 1873-4553, Vol. 341, pp. 272-282. Article in journal (Refereed)

The Bugs database project started in the late 1980s as what would now be considered a relatively simple system, albeit advanced for its time, linking fossil beetle species lists to modern habitat and distribution information. Since then, Bugs has grown into a complex database of fossil records, habitat and distribution data, and dating and climate reference data, wrapped into an advanced software analysis package. At the time of writing, the database contains raw data and metadata for 1124 sites, and Russell Coope directly contributed to the analysis of over 154 (14%) of them, some 98790 identifications published in 231 publications. Such quantifications are infeasible without databases, and the analytical power of combining a database of modern and fossil insects with analysis tools is potentially immense for numerous areas of science ranging from conservation to Quaternary geology.

BugsCEP, The Bugs Coleopteran Ecology Package, is the latest incarnation of the Bugs database project. Released in 2007, the database is continually added to and is available for free download from http://www.bugscep.com. The software tools include quantitative habitat reconstruction and visualisation, correlation matrices, MCR climate reconstruction, and searching by habitat and retrieving, among other things, a list of taxa known from the selected habitat types. It also provides a system for entering, storing and managing palaeoentomological data, as well as a number of expert-system-like reporting facilities.

Work is underway to create an online version of BugsCEP, implemented through the Strategic Environmental Archaeology Database (SEAD) project (http://www.sead.se). The aim is to provide more direct access to the latest data, a community-orientated updating system, and integration with other proxy data. Eventually, the tools available in the offline BugsCEP will be duplicated and Bugs will be entirely on the web.

This paper summarises aspects of the current scope, capabilities and applications of the BugsCEP database and software, with special reference to and quantifications of the contributions of Russell Coope to the field of palaeoentomology as represented in the database. The paper also serves to illustrate the potential for the use of BugsCEP in biographical studies, and discusses some of the issues relating to the use of large scale sources of quantitative data.

All datasets used in this article are available through the current version of BugsCEP available at http://www.bugscep.com.

• 3.
Umeå University, Faculty of Arts, Department of historical, philosophical and religious studies, Environmental Archaeology Lab. Umeå University, Faculty of Arts, Humlab.
BugsCEP, an entomological database twenty-five years on (2014). In: Antenna (Journal of the Royal Entomological Society), ISSN 0140-1890, Vol. 38, no. 1, pp. 21-28. Article in journal (Refereed)
• 4.
Umeå University, Faculty of Arts, Department of historical, philosophical and religious studies, Environmental Archaeology Lab.
Umeå University, Faculty of Arts, Department of historical, philosophical and religious studies, Environmental Archaeology Lab. Umeå University, Faculty of Arts, Humlab.
SEAD - The Strategic Environmental Archaeology Database: Progress Report Spring 2014 (2014). Report (Other academic)

This report provides an overview of the progress and results of the VR:KFI infrastructure projects 2007-7494 and (825-)2010-5976. It should be considered a status report from an on-going, long-term research infrastructure development project.

• 5.
Umeå University, Faculty of Arts, Department of historical, philosophical and religious studies, Environmental Archaeology Lab.
Lund University. Swedish National Historical Museums. Stockholm University. Lund University. Umeå University, Faculty of Arts, Humlab. Uppsala University.
The Strategic Environmental Archaeology Database: a resource for international, multiproxy and transdisciplinary studies of environmental and climatic change (2015). Conference paper (Refereed)

Climate and environmental change are global challenges which require global data and infrastructure to investigate. These challenges also require a multi-proxy approach, integrating evidence from Quaternary science and archaeology with information from studies on modern ecology and physical processes, among other disciplines. The Strategic Environmental Archaeology Database (SEAD, http://www.sead.se) is a Swedish-based international research e-infrastructure for storing, managing, analysing and disseminating palaeoenvironmental data from an almost unlimited number of analysis methods. The system currently makes available raw data from over 1500 sites (>5300 datasets) and the analysis of Quaternary fossil insects, plant macrofossils, pollen, geochemistry and sediment physical properties, dendrochronology and wood anatomy, ceramic geochemistry and bones, along with numerous dating methods. This capacity will be expanded in the near future to include isotopes, multi-spectral and archaeo-metallurgical data. SEAD also includes expandable climate and environment calibration datasets, a complete bibliography, and extensive metadata and services for linking these data to other resources. All data are available as Open Access through http://qsead.sead.se and downloadable software.

SEAD is maintained and managed at the Environmental Archaeology Lab and HUMlab at Umeå University, Sweden. Development and data ingestion are progressing in cooperation with the Laboratory for Ceramic Research and the National Laboratory for Wood Anatomy and Dendrochronology at Lund University, Sweden, the Archaeological Research Laboratory, Stockholm University, the Geoarchaeological Laboratory, Swedish National Historical Museums Agency, and several international partners and research projects. Current plans include expanding its capacity to serve as a data source for any system and integration with the Swedish National Heritage Board's information systems.

SEAD is partnered with the Neotoma palaeoecology database (http://www.neotomadb.org) and a new initiative for building cyberinfrastructure for transdisciplinary research and visualization of the long-term human ecodynamics of the North Atlantic funded by the National Science Foundation (NSF).

• 6. Chen, Ye
Umeå University, Faculty of Science and Technology, Department of Computing Science.
A global learning with local preservation method for microarray data imputation (2016). In: Computers in Biology and Medicine, ISSN 0010-4825, E-ISSN 1879-0534, Vol. 77, pp. 76-89. Article in journal (Refereed)

Microarray data suffer from missing values for various reasons, including insufficient resolution, image noise, and experimental errors. Because missing values can hinder downstream analysis steps that require complete data as input, it is crucial to be able to estimate the missing values. In this study, we propose a Global Learning with Local Preservation method (GL2P) for imputation of missing values in microarray data. GL2P consists of two components: a local similarity measurement module and a global weighted imputation module. The former uses a local structure preservation scheme to exploit as much information as possible from the observable data, and the latter is responsible for estimating the missing values of a target gene by considering all of its neighbors rather than a subset of them. Furthermore, GL2P imputes the missing values in ascending order according to the rate of missing data for each target gene to fully utilize previously estimated values. To validate the proposed method, we conducted extensive experiments on six benchmarked microarray datasets. We compared GL2P with eight state-of-the-art imputation methods in terms of four performance metrics. The experimental results indicate that GL2P outperforms its competitors in terms of imputation accuracy and better preserves the structure of differentially expressed genes. In addition, GL2P is less sensitive to the number of neighbors than other local learning-based imputation methods.
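Two of the abstract's ideas, weighting neighbours rather than picking a fixed subset, and imputing genes in ascending order of missing rate so later genes can reuse earlier estimates, can be sketched as follows (a hypothetical simplification for illustration, not the authors' GL2P implementation):

```python
import numpy as np

def impute_by_missing_rate(X, k=3):
    """Toy neighbour-weighted imputation: genes (rows) are processed in
    ascending order of missing rate; each missing entry is filled with an
    inverse-distance weighted average over the k nearest complete genes."""
    X = X.astype(float).copy()
    order = np.argsort(np.isnan(X).mean(axis=1))  # least-missing genes first
    for g in order:
        miss = np.isnan(X[g])
        if not miss.any():
            continue
        obs = ~miss
        # candidate neighbours: genes that are complete at this point
        # (includes genes imputed in earlier iterations)
        cands = [h for h in range(len(X)) if h != g and not np.isnan(X[h]).any()]
        d = np.array([np.linalg.norm(X[g, obs] - X[h, obs]) for h in cands])
        w = 1.0 / (d + 1e-9)                      # inverse-distance weights
        nearest = np.argsort(d)[:k]
        for j in np.where(miss)[0]:
            vals = np.array([X[cands[i], j] for i in nearest])
            X[g, j] = np.average(vals, weights=w[nearest])
    return X
```

A real implementation would also handle genes with no complete neighbours and exploit global structure, which is precisely where GL2P goes beyond this sketch.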

• 7.
School of Medicine, Washington University, United States.
Computer Science Institute, University of Halle-Wittenberg, Germany. Department of Biology, Washington University, United States. Department of Computer Science/Department of Genetics, Washington University, United States.
How frugal is mother nature with haplotypes? (2009). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no. 1, pp. 68-74. Article in journal (Refereed)

Motivation: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first, and most critical step, is the ascertainment of a biologically sound model to be optimized. Many models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution.

Results: This article examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that there are relatively few unique haplotypes, but not always the least possible, for the datasets with known solutions. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and the number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, most of which are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties for haplotype inference models. At a higher level, and with broad applicability, this article illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
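The article's central observation, that many distinct solutions can attain the parsimony optimum, can be reproduced on a tiny instance by exhaustive search (a toy sketch, feasible only for very small inputs; sites are coded 0/1 for homozygous and 2 for heterozygous):

```python
from itertools import product

def resolutions(genotype):
    """All ordered haplotype pairs consistent with one genotype."""
    het = [i for i, g in enumerate(genotype) if g == 2]
    base = [g if g != 2 else 0 for g in genotype]
    out = []
    for bits in product([0, 1], repeat=len(het)):
        h1, h2 = base[:], base[:]
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        out.append((tuple(h1), tuple(h2)))
    return out

def parsimony_solutions(genotypes):
    """Enumerate every resolution of every genotype and keep all
    explanations using the minimum number of distinct haplotypes."""
    best, sols = None, []
    for combo in product(*[resolutions(g) for g in genotypes]):
        k = len({h for pair in combo for h in pair})
        if best is None or k < best:
            best, sols = k, [combo]
        elif k == best:
            sols.append(combo)
    return best, sols
```

Even a single doubly heterozygous genotype already admits two essentially different optimal phasings, the effect that, at scale, produces the exponential growth in optima the article quantifies.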

• 8. Hansen, Terkel
Abt. Chemische Biologie, Max-Planck-Institut für molekulare Physiologie.
Adenylylation, MS, and proteomics - Introducing a "new" modification to bottom-up proteomics (2013). In: Proteomics, ISSN 1615-9853, E-ISSN 1615-9861, Vol. 13, no. 6, pp. 955-963. Article in journal (Refereed)

Although the addition of a 5'-adenosine phosphodiester group to proteins, called adenylylation, has been known for decades, the possibility that adenylylation could be a molecular switch in cellular signaling pathways has emerged recently. The distinct mass shift upon adenylylation of threonine or tyrosine residues renders it a good target for MS detection and identification; however, the fragmentation of adenylylated peptides derived from proteolytic digestion of adenylylated proteins has not yet been systematically investigated. Here, we demonstrate that adenylylated peptides show loss of parts of the adenosine monophosphate (AMP) upon different fragmentation techniques. As expected, electron transfer dissociation causes the least fragmentation of the AMP group and thus yields less complicated spectra. In contrast, CID and higher-energy collisional dissociation (HCD) fragmentation caused AMP to fragment, generating characteristic ions that could be utilized in the specific identification of adenylylated peptides. The characteristic ions and losses from the AMP group upon CID and HCD fragmentation turned out to be highly dependent on which amino acid was adenylylated, with different reporter ions for adenylylated threonine and tyrosine. We also investigated how adenylylation is best incorporated into search engines, exemplified by Mascot, and showed that it is possible to identify adenylylation by search engines.

• 9.
Umeå University, Faculty of Science and Technology, Department of Physics.
Diffusion in fractal globules (2016). Independent thesis, Advanced level (professional degree), 300 HE credits. Student thesis

Recent experiments suggest that the human genome (all of our DNA) is organised as a so-called fractal globule. The fractal globule is a knot-free dense polymer that easily folds and unfolds any genomic locus, for example a group of nearby genes. Proteins often need to locate specific target sites on the DNA, for instance to activate a gene. To understand how proteins move through the DNA polymer, we simulate diffusion of particles through a fractal globule. The fractal globule was generated on a cubic lattice as spheres connected by cylinders. With the structure in place, we simulate particle diffusion and measure how their mean squared displacement ($\langle R^2(t)\rangle$) grows as a function of time $t$ for different particle radii. This quantity allows us to better understand how the three-dimensional structure of DNA affects the protein's motion. From our simulations we found that $\langle R^2(t)\rangle/t$ is a decaying function when the particle is sufficiently large. This means that the particles diffuse more slowly than if they were free. Assuming that $\langle R^2(t) \rangle \propto t^\alpha$ for long times, we calculated the growth exponent $\alpha$ as a function of particle radius $r_p$. When $r_p$ is small compared to the average distance between two polymer segments $d$, we find that $\alpha \approx 1$. This means the polymer network does not affect the particle's motion. However, in the opposite limit $r_p\sim d$ we find that $\alpha<1$, which means that the polymer strongly slows down the particle's motion. This behaviour is indicative of sub-diffusive dynamics and has potentially far-reaching consequences for target finding processes and biochemical reactions in the cell.
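The analysis step described above, estimating $\alpha$ by fitting $\langle R^2(t)\rangle \propto t^\alpha$ on a log-log scale, can be sketched for an obstacle-free lattice walk, where $\alpha \approx 1$ is expected (a simplified sketch of the fitting procedure only; the thesis adds the fractal-globule geometry as an obstacle field):

```python
import numpy as np

rng = np.random.default_rng(1)

def msd_exponent(n_walkers=2000, n_steps=400):
    """Simulate free 3D lattice random walks, compute the mean squared
    displacement <R^2(t)>, and estimate alpha from a log-log fit."""
    axes = rng.integers(0, 3, size=(n_steps, n_walkers))   # which axis moves
    signs = rng.choice([-1, 1], size=(n_steps, n_walkers)) # +1 or -1 step
    pos = np.zeros((n_walkers, 3))
    msd = np.empty(n_steps)
    for t in range(n_steps):
        pos[np.arange(n_walkers), axes[t]] += signs[t]
        msd[t] = (pos ** 2).sum(axis=1).mean()
    times = np.arange(1, n_steps + 1)
    alpha, _ = np.polyfit(np.log(times), np.log(msd), 1)   # slope = alpha
    return alpha
```

For free diffusion the fitted exponent comes out close to 1; introducing obstacles comparable in size to the particle is what drives it below 1 in the thesis's simulations.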

• 10. Houston, Catriona M.
Umeå University, Faculty of Medicine, Department of Integrative Medical Biology (IMB). Department Cognitive Neurology, HertieInstitute for Clinical Brain Research, University of Tübingen, Germany.
Exploring the significance of morphological diversity for cerebellar granule cell excitability (2017). In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 7, Article 46147. Article in journal (Refereed)

The relatively simple and compact morphology of cerebellar granule cells (CGCs) has led to the view that heterogeneity in CGC shape has negligible impact upon the integration of mossy fibre (MF) information. Following electrophysiological recording, 3D models were constructed from high-resolution imaging data to identify morphological features that could influence the coding of MF input patterns by adult CGCs. Quantification of MF and CGC morphology provided evidence that CGCs could be connected to the multiple rosettes that arise from a single MF input. Predictions from our computational models propose that MF inputs could be more densely encoded within the CGC layer than previous models suggest. Moreover, those MF signals arriving onto the dendrite closest to the axon will generate greater CGC excitation. However, the impact of this morphological variability on MF input selectivity will be attenuated by high levels of CGC inhibition, providing further flexibility to the MF-CGC pathway. These features could be particularly important when considering the integration of multimodal MF sensory input by individual CGCs.

• 11.
Umeå University, Faculty of Medicine, Department of Integrative Medical Biology (IMB), Physiology.
Umeå University, Faculty of Medicine, Department of Integrative Medical Biology (IMB), Physiology.
Direct and indirect spino-cerebellar pathways: shared ideas but different functions in motor control (2015). In: Frontiers in Computational Neuroscience, ISSN 1662-5188, E-ISSN 1662-5188, Vol. 9, Article 75. Article in journal (Refereed)

The impressive precision of mammalian limb movements relies on internal feedback pathways that convey information about ongoing motor output to cerebellar circuits. The spino-cerebellar tracts (SCT) in the cervical, thoracic and lumbar spinal cord have long been considered canonical neural substrates for the conveyance of internal feedback signals. Here we consider the distinct features of an indirect spino-cerebellar route, via the brainstem lateral reticular nucleus (LRN), and the implications of this pre-cerebellar "detour" for the execution and evolution of limb motor control. Both direct and indirect spino-cerebellar pathways signal spinal interneuronal activity to the cerebellum during movements, but evidence suggests that direct SCT neurons are mainly modulated by rhythmic activity, whereas the LRN also receives information from systems active during postural adjustment, reaching and grasping. Thus, while direct and indirect spinocerebellar circuits can both be regarded as internal copy pathways, it seems likely that the direct system is principally dedicated to rhythmic motor acts like locomotion, while the indirect system also provides a means of pre-cerebellar integration relevant to the execution and coordination of dexterous limb movements.

• 12.
Computer Science Institute, University of Halle-Wittenberg, Germany.
School of Medicine, Washington University, United States. Department of Computer Science/Department of Genetics, Washington University, United States.
Complete Parsimony Haplotype Inference Problem and Algorithms (2009). In: Proceedings of the 17th Annual European Symposium on Algorithms (ESA 2009), ed. A. Fiat and P. Sanders, Berlin/Heidelberg: Springer, pp. 337-348. Conference paper (Refereed)

Haplotype inference by pure parsimony (HIPP) is a well-known paradigm for haplotype inference. In order to assess the biological significance of this paradigm, we generalize the problem of HIPP to the problem of finding all optimal solutions, which we call complete HIPP. We study intrinsic haplotype features, such as backbone haplotypes and fat genotypes, as well as equal columns and decomposability. We explicitly exploit these features in three computational approaches which are based on integer linear programming, depth-first branch-and-bound, and a hybrid algorithm that draws on the diverse strengths of the first two approaches. Our experimental analysis shows that our optimized algorithms are significantly superior to the baseline algorithms, often with orders of magnitude faster running time. Finally, our experiments provide some useful insights into the intrinsic features of this interesting problem.

• 13.
Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology. Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
Umeå University, Faculty of Social Sciences, Department of Statistics. Umeå University, Faculty of Social Sciences, Department of Statistics. Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
MC-normalization: a novel method for dye-normalization of two-channel microarray data (2009). In: Statistical Applications in Genetics and Molecular Biology, ISSN 1544-6115, E-ISSN 1544-6115, Vol. 8, no. 1, Article 42. Article in journal (Refereed)

Motivation: Pre-processing plays a vital role in two-color microarray data analysis. An analysis is characterized by its ability to identify differentially expressed genes (its sensitivity) and its ability to provide unbiased estimators of the true regulation (its bias). It has been shown that microarray experiments regularly underestimate the true regulation of differentially expressed genes. We introduce the MC-normalization, where C stands for channel-wise normalization, with considerably lower bias than the commonly used standard methods.

Methods: The idea behind the MC-normalization is that the channels’ individual intensities determine the correction, rather than the average intensity which is the case for the widely used MA-normalization. The two methods were evaluated using spike-in data from an in-house produced cDNA-experiment and a public available Agilent-experiment. The methods were applied on background corrected and non-background corrected data. For the cDNA-experiment the methods were either applied separately on data from each of the print-tips or applied on the complete array data. Altogether 24 analyses were evaluated. For each analysis the sensitivity, the bias and two variance measures were estimated.

Results: We prove that the MC-normalization has lower bias than the MA-normalization. The spike-in data confirmed the theoretical result and suggest that the difference is significant. Furthermore, the empirical data suggest that the MC- and MA-normalizations have similar sensitivity. A striking result is that print-tip normalizations had considerably higher sensitivity than analyses using the complete array data.

• 14.
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Clinical Bacteriology.
Umeå University, Faculty of Science and Technology, Department of Molecular Biology (Faculty of Science and Technology). Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. Umeå University, Faculty of Science and Technology, Department of Molecular Biology (Faculty of Science and Technology).
Normalization of high dimensional genomics data where the distribution of the altered variables is skewed (2011). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 6, no. 11, Article e27942. Article in journal (Refereed)

Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
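The motivating problem can be illustrated numerically (a toy simulation, not the paper's HMM-assisted procedure or DSE-test): when a large fraction of variables is truly up-regulated, standard median-centring normalization drags everything downward and biases the estimated effects.

```python
import numpy as np

rng = np.random.default_rng(0)

def median_normalize(m):
    """Standard global normalization: centre log-ratios at zero."""
    return m - np.median(m)

# hypothetical skewed experiment: 40% of 10000 genes up-regulated by 2.0
n, frac_up, effect = 10000, 0.4, 2.0
truth = np.zeros(n)
truth[: int(n * frac_up)] = effect
m = truth + rng.normal(0, 0.3, n)          # observed log-ratios
m_norm = median_normalize(m)

# the mixture median lies above zero, so centring shifts the altered
# genes' estimated effect below the true value of 2.0
bias = m_norm[: int(n * frac_up)].mean() - effect
print(round(bias, 2))
```

The bias comes out clearly negative, which is exactly the failure mode the abstract's step (2) is designed to detect before re-normalizing.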

• 15.
Umeå University, Faculty of Science and Technology, Department of Physics.
Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, USA. Institute for the BioCentury and Department of Physics, Korea Advanced Institute of Science and Technology, Korea.
Global organization of protein complexome in the yeast Saccharomyces cerevisiae (2011). In: BMC Systems Biology, ISSN 1752-0509, E-ISSN 1752-0509, Vol. 5, Article 126, 15 p. Article in journal (Refereed)

Background: Proteins in organisms, rather than acting alone, usually form protein complexes to perform cellular functions. We analyze the topological network structure of protein complexes and their component proteins in the budding yeast in terms of the bipartite network and its projections, where the complexes and proteins are its two distinct components. Compared to conventional protein-protein interaction networks, the networks from the protein complexes show more homogeneous structures than those of the binary protein interactions, implying the formation of complexes that cause a relatively more uniform number of interaction partners. In addition, we suggest a new optimization method to determine the abundance and function of protein complexes, based on the information of their global organization. Estimating abundance and biological functions is of great importance for many areas of research, as it provides a quantitative description of cell behaviors instead of just a "catalogue" of protein interactions.

Results: With our new optimization method, we present genome-wide assignments of abundance and biological functions for complexes, as well as previously unknown abundance and functions of proteins, which can provide significant information for further investigations in proteomics. It is strongly supported by a number of biologically relevant examples, such as the relationship between the cytoskeleton proteins and signal transduction and the metabolic enzyme Eno2's involvement in the cell division process.

Conclusions: We believe that our methods and findings are applicable not only to the specific area of proteomics, but also to much broader areas of systems biology with the concept of optimization principle.

• 16. Liu, Mengling
Umeå University, Faculty of Medicine, Department of Public Health and Clinical Medicine, Nutritional Research. Umeå University, Faculty of Medicine, Department of Biobank Research.
Estimation and selection of complex covariate effects in pooled nested case-control studies with heterogeneity (2013). In: Biostatistics, ISSN 1465-4644, E-ISSN 1468-4357, Vol. 14, no. 4, pp. 682-694. Article in journal (Refereed)

A major challenge in cancer epidemiologic studies, especially those of rare cancers, is observing enough cases. To address this, researchers often join forces by bringing multiple studies together to achieve large sample sizes, allowing for increased power in hypothesis testing and improved efficiency in effect estimation. Combining studies, however, renders the analysis difficult owing to the presence of heterogeneity in the pooled data. In this article, motivated by a collaborative nested case-control (NCC) study of ovarian cancer in three cohorts from the United States, Sweden, and Italy, we investigate the use of penalty regularized partial likelihood estimation in the context of pooled NCC studies to achieve two goals. First, we propose an adaptive group lasso (agLASSO) penalized approach to simultaneously identify important variables and estimate their effects. Second, we propose a composite agLASSO penalized approach to identify variables with heterogeneous effects. Both methods are readily implemented with the group coordinate gradient descent algorithm and shown to enjoy the oracle property. We conduct simulation studies to evaluate the performance of our proposed approaches in finite samples under various heterogeneity settings, and apply them to the pooled ovarian cancer study.
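The group-wise shrinkage at the heart of group-lasso methods can be sketched via its proximal operator, which shrinks a whole coefficient group and sets it exactly to zero when the group's norm falls below the penalty (a generic sketch of the standard operator, not the paper's composite agLASSO implementation):

```python
import numpy as np

def group_soft_threshold(beta_g, lam):
    """Proximal operator of the group-lasso penalty lam * ||beta_g||_2:
    shrink the whole group toward the origin, zeroing it entirely when
    its Euclidean norm is at most lam."""
    norm = np.linalg.norm(beta_g)
    if norm <= lam:
        return np.zeros_like(beta_g)   # whole group selected out
    return (1 - lam / norm) * beta_g   # uniform shrinkage of the group
```

For example, a group [3, 4] with norm 5 and penalty 1 is scaled by 0.8 to [2.4, 3.2], while a penalty of 6 removes the group entirely; the all-or-nothing behaviour at the group level is what performs variable selection in pooled analyses.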

• 17.
Umeå University, Faculty of Science and Technology, Department of Chemistry.
A multivariate approach to computational molecular biology (2005). Doctoral thesis, comprehensive summary (Other academic)

This thesis describes the application of multivariate methods in analyses of genomic DNA sequences, gene expression and protein synthesis, which represent each of the steps in the central dogma of biology. The recent finalisation of large sequencing projects has given us a definable core of genetic data and large-scale methods for the dynamic quantification of gene expression and protein synthesis. However, in order to gain meaningful knowledge from such data, appropriate data analysis methods must be applied.

The multivariate projection methods, principal component analysis (PCA) and partial least squares projection to latent structures (PLS), were used for clustering and multivariate calibration of data. By combining results from these and other statistical methods with interactive visualisation, valuable information was extracted and further interpreted.

We analysed genomic sequences by combining multivariate statistics with cytological observations and full genome annotations. All oligomers of di- (16), tri- (64), tetra- (256), penta- (1024) and hexa-mers (4096) of DNA were separately counted and normalised and their distributions in the chromosomes of three Drosophila genomes were studied by using PCA. Using this strategy sequence signatures responsible for the differentiation of chromosomal elements were identified and related to previously defined biological features. We also developed a tool, which has been made publicly available, to interactively analyse single nucleotide polymorphism data and to visualise annotations and linkage disequilibrium.

PLS was used to investigate the relationships between weather factors and gene expression in field-grown aspen leaves. By interpreting PLS models it was possible to predict if genes were mainly environmentally or developmentally regulated. Based on a PCA model calculated from seasonal gene expression profiles, different phases of the growing season were identified as different clusters. In addition, a publicly available dataset with gene expression values for 7070 genes was analysed by PLS to classify tumour types. All samples in a training set and an external test set were correctly classified. For the interpretation of these results a method was applied to obtain a cut-off value for deciding which genes could be of interest for further studies.

Potential biomarkers for the efficacy of radiation treatment of brain tumours were identified by combining quantification of protein profiles by SELDI-TOF MS with multivariate analysis using PCA and PLS. We were also able to differentiate brain tumours from normal brain tissue based on protein profiles, and observed that radiation treatment slows down the development of tumours at a molecular level.

By applying a multivariate approach to the analysis of biological data, information was extracted that would be impossible or very difficult to acquire with traditional methods. The next step in a systems biology approach will be to perform a combined analysis in order to elucidate how the different levels of information are linked together to form a regulatory network.

• 18.
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics. Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
Evaluating hospital performance based on excess cause-specific incidence. 2015. In: Statistics in Medicine, ISSN 0277-6715, E-ISSN 1097-0258, Vol. 34, no 8, 1334-1350 p. Article in journal (Refereed)

Formal evaluation of hospital performance in specific types of care is becoming an indispensable tool for quality assurance in the health care system. When the prime concern lies in reducing the risk of a cause-specific event, we propose to evaluate performance in terms of an average excess cumulative incidence, referring to the center's observed patient mix. Its intuitive interpretation helps give meaning to the evaluation results and facilitates the determination of important benchmarks for hospital performance. We apply it to the evaluation of cerebrovascular deaths after stroke in Swedish stroke centers, using data from Riksstroke, the Swedish stroke registry.

• 19.
Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC).
Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC). Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Sciences, Umeå, Sweden.
Training in High-Throughput Sequencing: Common Guidelines to Enable Material Sharing, Dissemination, and Reusability. 2016. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 12, no 6, e1004937. Article in journal (Refereed)

The advancement of high-throughput sequencing (HTS) technologies and the rapid development of numerous analysis algorithms and pipelines in this field have resulted in an unprecedentedly high demand for training scientists in HTS data analysis. Embarking on developing new training materials is challenging for many reasons. Trainers often do not have prior experience in preparing or delivering such materials and struggle to keep them up to date. A repository of curated HTS training materials would support trainers in materials preparation, reduce the duplication of effort by increasing the usage of existing materials, and allow for the sharing of teaching experience among the HTS trainers' community. To achieve this, we have developed a strategy for materials curation and dissemination. Standards for describing training materials have been proposed and applied to the curation of existing materials. A Git repository has been set up for sharing annotated materials that can now be reused, modified, or incorporated into new courses. This repository uses Git; hence, it is decentralized and self-managed by the community and can be forked/built upon by all users. The repository is accessible at http://bioinformatics.upsc.se/htmr.

• 20.
Umeå University, Faculty of Science and Technology, Plant Physiology.
Populus transcriptomics: from noise to biology. 2007. Doctoral thesis, comprehensive summary (Other academic)

DNA microarray analysis today is not just the generation of high-throughput data; much more attention is paid to the subsequent efficient handling of the generated information. In this thesis, a pipeline to generate, store and analyse Populus transcriptional data is presented. A public Populus microarray database - UPSC-BASE - was developed to gather and store transcriptomic data. In addition, several tools were provided to facilitate microarray analysis without requiring expert-level knowledge. The aim has been to streamline the workflow from raw data through to biological interpretation.

Differentiating noise from valuable biological information is one of the challenges in DNA microarray analysis. Studying gene regulation in free-growing aspen trees represents a complex analysis scenario as the trees are exposed to, and interacting with, the environment to a much higher extent than under highly controlled conditions in the greenhouse. This work shows that, by using multivariate statistics and experimental planning, it is possible to follow and compare gene expression in leaves from multiple growing seasons, and draw valuable conclusions about gene expression from field-grown samples.

The biological information in UPSC-BASE is intended to be a valuable transcriptomic resource for the wider plant community. The database provides information from almost a hundred different experiments, spanning different developmental stages, tissue types, abiotic and biotic stresses and mutants. The information can potentially be used both for cross-experiment analysis and for comparisons against other plants, such as Arabidopsis or rice. As a demonstration of this, microarray experiments performed on Populus leaves were merged and genes preferentially expressed in leaves were organised into regulons of co-regulated genes. Those regulons were used to define genes of importance in leaf development in Populus. Taken together, the work presented in this thesis provides tools and knowledge for large-scale transcriptional studies, and the stored gene expression information has proven to be a valuable resource for in-depth studies of gene regulation.

• 21.
Umeå University, Faculty of Science and Technology, Department of Chemistry.
Umeå University, Faculty of Medicine, Department of Molecular Biology (Faculty of Medicine). Umeå University, Faculty of Medicine, Department of Molecular Biology (Faculty of Medicine). Umeå University, Faculty of Medicine, Department of Molecular Biology (Faculty of Medicine). Umeå University, Faculty of Medicine, Department of Molecular Biology (Faculty of Medicine). Umeå University, Faculty of Medicine, Umeå Centre for Microbial Research (UCMR). Umeå University, Faculty of Medicine, Molecular Infection Medicine Sweden (MIMS). Umeå University, Faculty of Science and Technology, Department of Chemistry. Umeå University, Faculty of Medicine, Molecular Infection Medicine Sweden (MIMS). Umeå University, Faculty of Medicine, Umeå Centre for Microbial Research (UCMR). Umeå University, Faculty of Medicine, Department of Clinical Microbiology, Infectious Diseases. Infectious Diseases Institute, School of Medicine and Health Sciences, Makerere University, Uganda.
Metabolic signature profiling as a diagnostic and prognostic tool in paediatric Plasmodium falciparum malaria. 2015. In: Open Forum Infectious Diseases, ISSN 2328-8957, Vol. 2, no 2. Article in journal (Refereed)

Background: Accuracy in malaria diagnosis and staging is vital in order to reduce mortality and post-infectious sequelae. Herein we present a metabolomics approach to diagnostic staging of malaria infection, specifically Plasmodium falciparum infection in children. Methods: A group of 421 patients between six months and six years of age with mild malaria (107 individuals) or severe malaria (192 individuals), together with age-matched controls (122 individuals), was included in the study. A multivariate design was used as the basis for representative selection of twenty patients in each category. Patient plasma was subjected to gas chromatography-mass spectrometry analysis and a full metabolite profile was produced from each patient. In addition, a proof-of-concept model was tested in a Plasmodium berghei in-vivo model, where metabolic profiles were discernible over the course of infection. Results: A two-component principal component analysis (PCA) revealed that the patients could be separated into disease categories according to metabolite profiles, independently of any clinical information. Furthermore, two sub-groups could be identified in the mild malaria cohort, which we believe represent patients with divergent prognoses. Conclusion: Metabolite signature profiling could be used both for decision support in disease staging and for prognostication.

• 22. Svensson, Lennart
Umeå University, Faculty of Medicine, Department of Molecular Biology (Faculty of Medicine).
ProViz: a tool for explorative 3-D visualization and template matching in electron tomograms. 2017. In: Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, ISSN 2168-1163, Vol. 5, no 6, 446-454 p. Article in journal (Refereed)

Visual understanding is a key aspect when studying electron tomography data-sets, alongside quantitative assessments such as registration of high-resolution structures. We here present the free software tool ProViz (Protein Visualization) for visualisation and template matching in electron tomograms of biological samples. The ProViz software contains methods and tools which we have developed, adapted and computationally optimised for easy and intuitive visualisation and analysis of electron tomograms with low signal-to-noise ratio. ProViz complements existing software in the application field and serves as an easy and convenient tool for a first assessment and screening of the tomograms. It provides enhancements in three areas: (1) improved visualisation that makes connections as well as intensity differences between and within objects or structures easier to see and interpret, (2) interactive transfer function editing with direct visual result feedback using both piecewise linear functions and Gaussian function elements, (3) computationally optimised template matching and tools to visually assess and interactively explore the correlation results. The visualisation capabilities and features of ProViz are demonstrated on various biological volume data-sets: bacterial filament structures in vitro, a desmosome and the transmembrane cadherin connections therein in situ, and liposomes filled with doxorubicin in solution. The explorative template matching is demonstrated on a synthetic IgG data-set.
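The template-matching component rests on normalised cross-correlation. The 1-D sketch below shows only the scoring idea; real tomogram matching is 3-D and must also search over template rotations, and the function names here are ours:

```python
from math import sqrt

def zero_mean(x):
    """Subtract the mean so correlation ignores local intensity offsets."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def ncc_score(window, template):
    """Normalised cross-correlation of one window against the template (-1..1)."""
    w, t = zero_mean(window), zero_mean(template)
    denom = sqrt(sum(v * v for v in w)) * sqrt(sum(v * v for v in t))
    return sum(a * b for a, b in zip(w, t)) / denom if denom else 0.0

def match_template(signal, template):
    """Slide the template over the signal; return (best_offset, best_score)."""
    k = len(template)
    scores = [ncc_score(signal[i:i + k], template)
              for i in range(len(signal) - k + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Normalisation is what makes the score robust to the low, uneven contrast typical of electron tomograms, since each window is compared in shape rather than absolute intensity.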

• 23.
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Structural Information and Hidden Markov Models for Biological Sequence Analysis. 2008. Doctoral thesis, comprehensive summary (Other academic)

Bioinformatics is a fast-developing field, which makes use of computational methods to analyse and structure biological data. An important branch of bioinformatics is structure and function prediction of proteins, which is often based on finding relationships to already characterized proteins. It is known that two proteins with very similar sequences also share the same 3D structure. However, there are many proteins with similar structures that have no clear sequence similarity, which makes it difficult to find these relationships.

In this thesis, two methods for annotating protein domains are presented, one aiming at assigning the correct domain family or families to a protein sequence, and the other aiming at fold recognition. Both methods use hidden Markov models (HMMs) to find related proteins, and they both exploit the fact that structure is more conserved than sequence, but in two different ways.
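As background to both methods, the core HMM computation - scoring a sequence against a model - can be illustrated with the forward algorithm. This toy two-state model is a sketch only: the profile HMMs in the thesis have match/insert/delete states per alignment column, and all parameter values here are invented:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: total probability of an observation sequence
    under an HMM, summing over all hidden state paths."""
    # Initialise with the first observation.
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    # Recurse: each state's mass is emission * sum over predecessors.
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: emit_p[s][o] * sum(prev[r] * trans_p[r][s] for r in states)
            for s in states
        })
    return sum(alpha[-1].values())

# Toy model: match (M) vs insert (I) states emitting a reduced
# hydrophobic (h) / polar (p) residue alphabet.
states = ["M", "I"]
start_p = {"M": 0.8, "I": 0.2}
trans_p = {"M": {"M": 0.7, "I": 0.3}, "I": {"M": 0.4, "I": 0.6}}
emit_p = {"M": {"h": 0.9, "p": 0.1}, "I": {"h": 0.5, "p": 0.5}}
```

A query sequence scoring highly against a family's model is then a candidate member of that family; production implementations work in log space to avoid underflow on long sequences.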

Most of the research presented in the thesis focuses on the structure-anchored HMMs, saHMMs. For each domain family, an saHMM is constructed from a multiple structure alignment of carefully selected representative domains, the saHMM-members. These saHMM-members are collected in the so-called "midnight ASTRAL set", and are chosen so that all saHMM-members within the same family have mutual sequence identities below a threshold of about 20%.

In order to construct the midnight ASTRAL set and the saHMMs, a pipeline of software tools was developed. The saHMMs are shown to detect the correct family relationships with very high accuracy, and perform better than the standard tool Pfam in assigning the correct domain families to new domain sequences. We also introduce the FI-score, which is used to measure the performance of the saHMMs, in order to select the optimal model for each domain family.

The saHMMs are made available for searching through the FISH server, and can be used for assigning family relationships to protein sequences.

The other approach presented in the thesis is secondary structure HMMs (ssHMMs). These HMMs are designed to use both the sequence and the predicted secondary structure of a query protein when scoring it against the model.

A rigorous benchmark is used, which shows that HMMs made from multiple sequences result in better fold recognition than those based on single sequences. Adding secondary structure information to the HMMs improves the ability of fold recognition further, both when using true and predicted secondary structures for the query sequence.

• 24.
Department of Archaeology, University of Sheffield, UK.
Umeå University, Faculty of Arts, Department of historical, philosophical and religious studies, Environmental Archaeology Lab.
Predicting island beetle faunas by their climate ranges: the tabula rasa/refugia theory in the North Atlantic. 2015. In: Journal of Biogeography, ISSN 0305-0270, E-ISSN 1365-2699, Vol. 42, no 11, 2031-2048 p. Article in journal (Refereed)

Aim: This paper addresses two opposing theories put forward for the origins of the beetle fauna of the North Atlantic islands. The first is that the biota of the isolated oceanic islands of the Faroes, Iceland and Greenland immigrated across a Palaeogene–Neogene land bridge from Europe, and survived Pleistocene glaciations in ameliorated refugia. The second argues for a tabula rasa in which the biota of the islands was exterminated during glaciations and is Holocene in origin. The crux of these theories lies in the ability of the flora and fauna to survive in a range of environmental extremes. This paper sets out to assess the viability of the refugia hypothesis using the climatic tolerances of one aspect of the biota: the beetle fauna. Location: The paper focuses on Iceland, the Faroe Islands and Greenland. Methods: The known temperature requirements of the recorded beetle faunas of the North Atlantic islands were compared with published proxy climate reconstructions for successive climate periods since the severing of a North Atlantic land bridge. We used the MCR (mutual climatic range) method available in the open access BugsCEP database software. Results: We show that most of the MCR faunas of the North Atlantic islands could not have survived in situ since the Palaeogene–Neogene, and are likely to have been exterminated by the Pleistocene glaciations. Main conclusions: The discrepancy between the climatic tolerances of the North Atlantic beetle fauna and the estimated climatic regimes since the severing of a land bridge strongly supports the tabula rasa theory and suggests that the North Atlantic coleopteran fauna is Holocene in origin.
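The core of the MCR idea - intersecting the climatic tolerance intervals of every species in an assemblage - can be sketched as follows. This is a deliberate simplification of BugsCEP's implementation, which works with the mean temperatures of the warmest and coldest months; the function names and values are ours:

```python
def mutual_climatic_range(tolerances):
    """Intersect per-species (t_min, t_max) tolerance intervals.

    Returns the temperature band in which every species in the fauna
    could coexist, or None if the ranges are disjoint.
    """
    lo = max(t[0] for t in tolerances)
    hi = min(t[1] for t in tolerances)
    return (lo, hi) if lo <= hi else None

def assemblage_survives(tolerances, reconstructed_temp):
    """Could the whole assemblage persist at a reconstructed palaeotemperature?"""
    mcr = mutual_climatic_range(tolerances)
    return mcr is not None and mcr[0] <= reconstructed_temp <= mcr[1]
```

Testing each island's recorded fauna against the reconstructed temperatures for each post-land-bridge climate period then amounts to repeated calls of the second function: a single period falling outside the mutual range is enough to rule out in-situ survival.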

• 25. Wallert, John
Umeå University, Faculty of Social Sciences, Department of Psychology.
Predicting two-year survival versus non-survival after first myocardial infarction using machine learning and Swedish national register data. 2017. In: BMC Medical Informatics and Decision Making, ISSN 1472-6947, E-ISSN 1472-6947, Vol. 17, 99. Article in journal (Refereed)

Background: Machine learning algorithms hold potential for improved prediction of all-cause mortality in cardiovascular patients, yet have not previously been developed with high-quality population data. This study compared four popular machine learning algorithms trained on unselected, nation-wide population data from Sweden to solve the binary classification problem of predicting survival versus non-survival 2 years after first myocardial infarction (MI).

Methods: This prospective national registry study for prognostic accuracy validation of predictive models used data from 51,943 complete first MI cases as registered during 6 years (2006-2011) in the national quality register SWEDEHEART/RIKS-HIA (90% coverage of all MIs in Sweden) with follow-up in the Cause of Death register (> 99% coverage). Primary outcome was AUROC (C-statistic) performance of each model on the untouched test set (40% of cases) after model development on the training set (60% of cases) with the full (39) predictor set. Model AUROCs were bootstrapped and compared, correcting the P-values for multiple comparisons with the Bonferroni method. Secondary outcomes were derived when varying sample size (1-100% of total) and predictor sets (39, 10, and 5) for each model. Analyses were repeated on 79,869 completed cases after multivariable imputation of predictors.
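For readers unfamiliar with the primary outcome measure, AUROC can be computed from its rank-sum (Mann-Whitney) interpretation, and model comparison can be bootstrapped roughly as sketched below. This is a minimal illustration, not the study's actual pipeline, and all names are ours:

```python
import random

def auroc(labels, scores):
    """AUROC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive case is scored above a random negative
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_diffs(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Resample the test set with replacement and collect the AUROC
    difference between two models on each resample."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < len(ys):  # need both classes in the resample
            diffs.append(auroc(ys, [scores_a[i] for i in idx])
                         - auroc(ys, [scores_b[i] for i in idx]))
    return diffs
```

The spread of the bootstrapped differences gives a significance assessment of one model outperforming another; with several pairwise comparisons, the resulting P-values are then adjusted, e.g. with the Bonferroni correction as in the study.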

Results: A Support Vector Machine with a radial basis kernel developed on 39 predictors had the highest complete cases performance on the test set (AUROC = 0.845, PPV = 0.280, NPV = 0.966), outperforming Boosted C5.0 (0.845 vs. 0.841, P = 0.028) but not significantly higher than Logistic Regression or Random Forest. Models converged to the point of algorithm indifference with increased sample size and predictors. Using the top five predictors also produced good classifiers. Imputed analyses had slightly higher performance.

Conclusions: Improved mortality prediction at hospital discharge after first MI is important for identifying high-risk individuals eligible for intensified treatment and care. All models performed accurately and similarly, and because of the superior national coverage, the best model can potentially be used to better differentiate new patients, allowing for improved targeting of limited resources. Future research should focus on further model development and investigate possibilities for implementation.

• 26.
Logistical Engineering University.
University of Missouri Kansas City. Logistical Engineering University. Umeå University, Faculty of Science and Technology, Department of Physics.
Pathogenesis of Axial Spondyloarthropathy in a Network Perspective. 2011. In: IEEE Conference on Systems Biology (ISB) / [ed] Luonan Chen, Xiang-Sun Zhang, Ling-Yun Wu, Yong Wang, IEEE Publishing, 2011, 41-46 p. Conference paper (Refereed)

Complex chronic diseases are usually not caused by changes in a single causal gene but by an unbalanced regulating network resulting from the dysfunctions of multiple genes or their products. Therefore, a network-based systems approach can be helpful for the identification of candidate genes related to complex diseases and their relationships. Axial spondyloarthropathy (SpA) is a group of chronic inflammatory joint diseases that mainly affects the spine and the sacroiliac joints, yet the pathogenesis of SpA remains largely unknown. In this paper, we conducted a networked systems study of the pathogenesis of SpA. We integrated data related to SpA from the OMIM database and from proteomics and microarray experiments on SpA to prioritize SpA candidate disease genes in the context of the human protein interactome. Based on the top-ranked SpA-related genes, we constructed a PPI network and identified potential pathways associated with SpA. The PPI network and pathways reflect the well-known knowledge of SpA, i.e., immune-mediated inflammation, as well as imbalanced bone modelling causing new bone formation and bone loss. This study may facilitate our understanding of SpA pathogenesis from the perspective of network systems.
