Publications (10 of 207)
Forsgren, E., Björkblom, B., Trygg, J. & Jonsson, P. (2025). OPLS-based multiclass classification and data-driven interclass relationship discovery. Journal of Chemical Information and Modeling, 65(4), 1762-1770.
OPLS-based multiclass classification and data-driven interclass relationship discovery
2025 (English) In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 65, no 4, p. 1762-1770. Article in journal (Refereed) Published
Abstract [en]

Multiclass data sets and large-scale studies are increasingly common in omics sciences, drug discovery, and clinical research due to advancements in analytical platforms. Efficiently handling these data sets and discerning subtle differences across multiple classes remains a significant challenge. In metabolomics, two-class orthogonal projection to latent structures discriminant analysis (OPLS-DA) models are widely used due to their strong discrimination capabilities and ability to provide interpretable information on class differences. However, these models face challenges in multiclass settings. A common solution is to transform the multiclass comparison into multiple two-class comparisons, which, while more effective than a global multiclass OPLS-DA model, unfortunately results in a manual, time-consuming model-building process with complicated interpretation. Here, we introduce an extension of OPLS-DA for data-driven multiclass classification: orthogonal partial least squares-hierarchical discriminant analysis (OPLS-HDA). OPLS-HDA integrates hierarchical cluster analysis (HCA) with the OPLS-DA framework to create a decision tree, addressing multiclass classification challenges and providing intuitive visualization of interclass relationships. To avoid overfitting and ensure reliable predictions, we use cross-validation during model building. Benchmark results show that OPLS-HDA performs competitively across diverse data sets compared to eight established methods. This method represents a significant advancement, offering a powerful tool to dissect complex multiclass data sets. With its versatility, interpretability, and ease of use, OPLS-HDA is an efficient approach to multiclass data analysis applicable across various fields.
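The abstract describes OPLS-HDA only at a high level: HCA is combined with OPLS-DA to arrange the classes into a decision tree. As a rough illustration of the HCA-to-tree step alone, the Python sketch below merges classes bottom-up by the distance between their mean profiles, so that each internal node of the resulting nested tuple is one two-group split a binary discriminant model could be fitted to. The function name `class_tree`, the Euclidean distance on raw class means, and centroid linkage are assumptions for illustration; the published method's distance measure, linkage, per-node OPLS-DA models, and cross-validation are not reproduced here.

```python
import numpy as np


def class_tree(X, y):
    """Hypothetical sketch: agglomerate classes by the Euclidean distance between
    their mean profiles (centroid linkage). Returns a nested tuple in which every
    internal node is one two-group split a binary classifier could be trained on."""
    labels = [str(c) for c in np.unique(y)]
    centroids = {c: X[y == c].mean(axis=0) for c in labels}
    sizes = {c: int(np.sum(y == c)) for c in labels}
    while len(centroids) > 1:
        keys = list(centroids)
        # All pairwise distances between the current groups' centroids.
        pairs = [(np.linalg.norm(centroids[a] - centroids[b]), a, b)
                 for i, a in enumerate(keys) for b in keys[i + 1:]]
        _, a, b = min(pairs, key=lambda p: p[0])
        merged = (a, b)
        n = sizes[a] + sizes[b]
        # Size-weighted mean of the two centroids equals the mean of all merged samples.
        centroids[merged] = (centroids.pop(a) * sizes[a] + centroids.pop(b) * sizes[b]) / n
        sizes[merged] = n
    return next(iter(centroids))


# Hypothetical toy data: classes A/B and C/D form two well-separated pairs.
centres = {"A": [0.0, 0.0, 0.0], "B": [1.0, 0.0, 0.0],
           "C": [10.0, 10.0, 0.0], "D": [11.0, 10.0, 0.0]}
offsets = np.array([[0.0, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]])
X = np.vstack([np.array(c) + offsets for c in centres.values()])
y = np.repeat(list(centres), len(offsets))
tree = class_tree(X, y)
print(tree)  # the root split separates {A, B} from {C, D}
```

In a full tree-based classifier, a new sample would be routed from the root downwards, with one binary model deciding the branch at each internal node.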

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2025
Keywords
Cluster Analysis, Discriminant Analysis, Humans, Least-Squares Analysis, Metabolomics
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:umu:diva-236203 (URN); 10.1021/acs.jcim.4c01799 (DOI); 001412188800001 (ISI); 39899705 (PubMedID); 2-s2.0-85216849215 (Scopus ID)
Available from: 2025-03-13; Created: 2025-03-13; Last updated: 2025-03-19; Bibliographically approved
Yakovenko, I., Mihai, I. S., Selinger, M., Rosenbaum, W., Dernstedt, A., Gröning, R., . . . Henriksson, J. (2025). Telomemore enables single-cell analysis of cell cycle and chromatin condensation. Nucleic Acids Research, 53(3), Article ID gkaf031.
Telomemore enables single-cell analysis of cell cycle and chromatin condensation
2025 (English) In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 53, no 3, article id gkaf031. Article in journal (Refereed) Published
Abstract [en]

Single-cell RNA-seq methods can be used to delineate cell types and states at unprecedented resolution but do little to explain why certain genes are expressed. Single-cell ATAC-seq and multiome (ATAC + RNA) have emerged to give a complementary view of the cell state. It is, however, unclear what additional information can be extracted from ATAC-seq data besides transcription factor binding sites. Here, we show that ATAC-seq telomere-like reads counterintuitively cannot be used to infer telomere length, as they mostly originate from the subtelomere, but can be used as a biomarker for chromatin condensation. Using long-read sequencing, we further show that modern hyperactive Tn5 does not duplicate 9 bp of its target sequence, contrary to common belief. We provide a new tool, Telomemore, which can quantify nonaligning subtelomeric reads. By analyzing several public datasets and generating new multiome fibroblast and B-cell atlases, we show how this new readout can aid single-cell data interpretation. We show how drivers of condensation processes can be inferred, and how the readout complements common RNA-seq-based cell cycle inference, which fails for monocytes. Telomemore-based analysis of the condensation state is thus a valuable complement to the single-cell analysis toolbox.

Place, publisher, year, edition, pages
Oxford University Press, 2025
National Category
Molecular Biology; Medical Genetics and Genomics; Medical Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:umu:diva-235667 (URN); 10.1093/nar/gkaf031 (DOI); 001408073800005 (ISI); 39878215 (PubMedID); 2-s2.0-85216776275 (Scopus ID)
Funder
Swedish National Infrastructure for Computing (SNIC); Swedish Research Council, 2021-06602; Swedish Research Council, 2024-03952; Swedish Cancer Society, 233102 Pj; The Kempe Foundations, JCK-0055; The Kempe Foundations, SMK-1959; Knut and Alice Wallenberg Foundation, KAW 2020.0239
Available from: 2025-02-24; Created: 2025-02-24; Last updated: 2025-02-24; Bibliographically approved
Eriksson, A., Richelle, A., Trygg, J., Scholze, S., Pijeaud, S., Antti, H., . . . Jonsson, P. (2025). Time-resolved hierarchical modeling highlights metabolites influencing productivity and cell death in Chinese hamster ovary cells. Biotechnology Journal, 20(3), Article ID e202400624.
Time-resolved hierarchical modeling highlights metabolites influencing productivity and cell death in Chinese hamster ovary cells
2025 (English) In: Biotechnology Journal, ISSN 1860-6768, E-ISSN 1860-7314, Vol. 20, no 3, article id e202400624. Article in journal (Refereed) Published
Abstract [en]

Biopharmaceuticals are medical compounds derived from biological sources and are often manufactured by living cells, primarily Chinese hamster ovary (CHO) cells. CHO cells display variation among cell clones, leading to growth and productivity differences that influence the product's quantity and quality. The biological and environmental factors behind these differences are not fully understood. To identify metabolites with a consistent relationship to productivity or cell death over time, we analyzed the extracellular metabolome of 11 CHO clones with different growth and productivity characteristics over 14 days. However, in bioreactor processes, metabolic profiles and process variables are both strongly time-dependent, confounding the metabolite-process variable relationship. To address this, we customized an existing hierarchical approach for handling time dependency to highlight metabolites with a consistent correlation to a process variable over a selected timeframe. We benchmarked this new method against conventional orthogonal partial least squares (OPLS) models. Our hierarchical method highlighted several metabolites consistently related to productivity or cell death that the conventional method missed. These metabolites were biologically relevant; most were known already, but some that had not been reported in CHO literature before, such as 3-methoxytyrosine and succinyladenosine, had ties to cell death in studies with other cell types. The metabolites showed an inverse relationship with the response variables: those positively correlated with productivity were typically negatively correlated with the death rate, or vice versa. For both productivity and cell death, the citrate cycle and adjacent pathways (pyruvate, glyoxylate, pantothenate) were among the most important. In summary, we have proposed a new method to analyze time-dependent omics data in bioprocess production. 
This approach allowed us to identify metabolites tied to cell death and productivity that were not detected with traditional models.

Place, publisher, year, edition, pages
Wiley-VCH Verlagsgesellschaft, 2025
Keywords
bioprocess data, Chinese hamster ovary (CHO) cells, death rate, hierarchical modeling, metabolomics, orthogonal partial least squares (OPLS), productivity
National Category
Medical Biotechnology (Focus on Cell Biology, (incl. Stem Cell Biology), Molecular Biology, Microbiology, Biochemistry or Biopharmacy)
Identifiers
urn:nbn:se:umu:diva-237157 (URN); 10.1002/biot.202400624 (DOI); 001441224200001 (ISI); 40065671 (PubMedID); 2-s2.0-105000082543 (Scopus ID)
Available from: 2025-04-14; Created: 2025-04-14; Last updated: 2025-04-14; Bibliographically approved
Wang, D., Jiang, L., Kjellander, M., Weidemann, E., Trygg, J. & Tysklind, M. (2024). A novel data mining framework to investigate causes of boiler failures in waste-to-energy plants. Processes, 12(7), Article ID 1346.
A novel data mining framework to investigate causes of boiler failures in waste-to-energy plants
2024 (English) In: Processes, ISSN 2227-9717, Vol. 12, no 7, article id 1346. Article in journal (Refereed) Published
Abstract [en]

Examining the causes of boiler failures is crucial for thermal power plant safety and profitability. However, traditional approaches are complex and expensive and lack precise operational insights. Although data-driven approaches hold substantial potential for addressing these challenges, systematic approaches for investigating failure root causes with unlabeled data are lacking. Therefore, we propose a novel framework rooted in data mining methodologies to identify the operational variables responsible for boiler failures. The primary objective was to provide precise guidance for future operations so that similar failures can be prevented proactively. The framework was centered on two data mining approaches, Principal Component Analysis (PCA) + K-means and Deep Embedded Clustering (DEC), with PCA + K-means serving as the baseline against which the performance of DEC was evaluated. To demonstrate the framework’s specifics, a case study was performed using datasets obtained from a waste-to-energy plant in Sweden. The results showed the following: (1) The clustering outcomes of DEC consistently surpass those of PCA + K-means across nearly every dimension. (2) The operational temperature variables T-BSH3rm, T-BSH2l, T-BSH3r, T-BSH1l, T-SbSH3, and T-BSH1r emerged as the most significant contributors to the failures. It is advisable to maintain the operational levels of T-BSH3rm, T-BSH2l, T-BSH3r, T-BSH1l, T-SbSH3, and T-BSH1r around 527 °C, 432 °C, 482 °C, 338 °C, 313 °C, and 343 °C, respectively. Moreover, it is crucial to prevent these values from reaching or exceeding 594 °C, 471 °C, 537 °C, 355 °C, 340 °C, and 359 °C, respectively, for prolonged durations. The findings offer the opportunity to improve future operational conditions, thereby extending the overall service life of the boiler. Consequently, operators can address faulty tubes during scheduled annual maintenance without encountering failures and disrupting production.
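The PCA + K-means baseline named in the abstract is a standard two-step pipeline: compress the operational variables with PCA, then cluster the score vectors with K-means. The NumPy sketch below shows that generic pipeline only. The autoscaling, the number of components, `k`, the helper names, and the synthetic data are assumptions; neither the authors' preprocessing nor their DEC model is reproduced.

```python
import numpy as np


def pca_scores(X, n_components):
    """Autoscale X (mean 0, unit variance per column) and project it onto its first
    principal components. Assumes no column is constant (zero std would divide by zero)."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions (loadings), ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points, until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster happens to become empty.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical stand-in for plant sensor data: 100 time points x 6 operational variables,
# half drawn around one operating regime and half around a shifted one.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 6)),
               rng.normal(5.0, 1.0, size=(50, 6))])
Z = pca_scores(X, n_components=2)
labels, centroids = kmeans(Z, k=2)
```

On real plant data, the number of components and clusters would have to be chosen from the data rather than fixed as here.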

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
data mining, deep embedded clustering, failure analysis, power plants
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-228513 (URN); 10.3390/pr12071346 (DOI); 001277572100001 (ISI); 2-s2.0-85199646373 (Scopus ID)
Available from: 2024-08-19; Created: 2024-08-19; Last updated: 2024-08-19; Bibliographically approved
Forsgren, E., Cloarec, O., Jonsson, P., Lovell, G. & Trygg, J. (2024). A scalable, data analytics workflow for image-based morphological profiles. Chemometrics and Intelligent Laboratory Systems, 254, Article ID 105232.
A scalable, data analytics workflow for image-based morphological profiles
2024 (English) In: Chemometrics and Intelligent Laboratory Systems, ISSN 0169-7439, E-ISSN 1873-3239, Vol. 254, article id 105232. Article in journal (Refereed) Published
Abstract [en]

Cell Painting is an established community-based microscopy-assay platform that provides high-throughput, high-content data for biological readouts. In November 2022, the JUMP-Cell Painting Consortium released the largest publicly available Cell Painting dataset with CellProfiler features, comprising more than 2 billion cell images. This dataset is designed for predicting the activity and toxicity of 115k drug compounds, with the aim of making cell images as computable as genomes and transcriptomes. In this context, our paper introduces a scalable and computationally efficient data analytics workflow created to meet the needs of researchers. This data-driven workflow facilitates the comparison of drug treatment effects through significant and biologically relevant insights. The workflow consists of two parts: first, the Equivalence score (Eq. score), a straightforward yet sophisticated metric that highlights relevant deviations from negative controls based on cell image morphology; second, a scalable application of the Eq. scores to predict and classify subtle morphological changes in cell image profiles. Using this workflow, we show improved classification compared with the raw CellProfiler features on the CPJUMP1-pilot dataset for three types of perturbations. We hope that our workflow's contributions will enhance drug screening efficiency and streamline the drug development process. As this process is resource-intensive, every incremental improvement is valuable. Through our collective efforts in advancing the understanding of high-throughput image-based data, we aim to reduce both the time and cost of developing new, life-saving treatments.

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Cell Painting, Chemometrics, Computational Workflow, Drug discovery, High-throughput Screening, Morphological Profiling, Quantitative Image Analysis
National Category
Bioinformatics (Computational Biology); Pharmacology and Toxicology
Identifiers
urn:nbn:se:umu:diva-230015 (URN); 10.1016/j.chemolab.2024.105232 (DOI); 2-s2.0-85204373412 (Scopus ID)
Funder
eSSENCE - An eScience Collaboration
Available from: 2024-10-02; Created: 2024-10-02; Last updated: 2025-03-19; Bibliographically approved
Khalid, N., Caroprese, M., Lovell, G., Porto, D. A., Trygg, J., Dengel, A. & Ahmed, S. (2024). Bounding box is all you need: learning to segment cells in 2D microscopic images via box annotations. In: Moi Hoon Yap; Connah Kendrick; Ardhendu Behera; Timothy Cootes; Reyer Zwiggelaar (Ed.), Medical image understanding and analysis: 28th annual conference, MIUA 2024, Manchester, UK, July 24–26, 2024, proceedings, part I. Paper presented at 28th Annual Conference on Medical Image Understanding and Analysis, MIUA 2024, Manchester, UK, July 24-26, 2024 (pp. 314-328). Cham: Springer
Bounding box is all you need: learning to segment cells in 2D microscopic images via box annotations
2024 (English) In: Medical image understanding and analysis: 28th annual conference, MIUA 2024, Manchester, UK, July 24–26, 2024, proceedings, part I / [ed] Moi Hoon Yap; Connah Kendrick; Ardhendu Behera; Timothy Cootes; Reyer Zwiggelaar, Cham: Springer, 2024, p. 314-328. Conference paper, Published paper (Refereed)
Abstract [en]

Microscopic imaging plays a pivotal role in various fields of science and medicine, offering invaluable insights into the intricate world of cellular biology. At the heart of this endeavor lies the need for accurate identification and characterization of individual cells within these images. Deep learning-based cell segmentation, which involves delineating cells in complex microscopic images, is central to cell analysis. It serves as the foundation for extracting meaningful information about cell morphology, spatial organization, and interactions. However, traditional deep learning models for cell segmentation require extensive and expensive mask annotations for each cell in the image, posing a significant challenge. To address this issue, this study introduces CellBoxify, a novel pipeline that streamlines cell instance segmentation. Unlike traditional methods, CellBoxify relies solely on bounding box annotations, which are approximately seven times faster to produce than manual per-cell segmentation masks. The effectiveness of the proposed approach is demonstrated on the LIVECell dataset, a well-known resource for cell segmentation research, where it achieves 83.40% of the fully supervised performance.

Place, publisher, year, edition, pages
Cham: Springer, 2024
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 14859
Keywords
bounding box annotations, cell segmentation, deep learning, medical imaging, weakly supervised
National Category
Computer graphics and computer vision; Medical Imaging
Identifiers
urn:nbn:se:umu:diva-228484 (URN); 10.1007/978-3-031-66955-2_22 (DOI); 2-s2.0-85200686935 (Scopus ID); 9783031669545 (ISBN); 9783031669552 (ISBN)
Conference
28th Annual Conference on Medical Image Understanding and Analysis, MIUA 2024, Manchester, UK, July 24-26, 2024
Available from: 2024-08-15; Created: 2024-08-15; Last updated: 2025-02-09; Bibliographically approved
Khalid, N., Koochali, M., Leon, D. N., Caroprese, M., Lovell, G., Porto, D. A., . . . Ahmed, S. (2024). CellGenie: an end-to-end pipeline for synthetic cellular data generation and segmentation: a use case for cell segmentation in microscopic images. In: Moi Hoon Yap; Connah Kendrick; Ardhendu Behera; Timothy Cootes; Reyer Zwiggelaar (Ed.), Medical image understanding and analysis: 28th Annual Conference, MIUA 2024, Manchester, UK, July 24–26, 2024, Proceedings, part 1. Paper presented at 28th UK Conference on Medical Image Understanding and Analysis-MIUA, Manchester, UK, July 24–26, 2024 (pp. 387-401). Springer Nature
CellGenie: an end-to-end pipeline for synthetic cellular data generation and segmentation: a use case for cell segmentation in microscopic images
2024 (English) In: Medical image understanding and analysis: 28th Annual Conference, MIUA 2024, Manchester, UK, July 24–26, 2024, Proceedings, part 1 / [ed] Moi Hoon Yap; Connah Kendrick; Ardhendu Behera; Timothy Cootes; Reyer Zwiggelaar, Springer Nature, 2024, p. 387-401. Conference paper, Published paper (Refereed)
Abstract [en]

Cellular imaging plays a pivotal role in understanding various biological processes and diseases, making accurate cell segmentation indispensable for many biomedical applications. However, traditional methods for cell segmentation often rely on manual annotation, which is labor-intensive and time-consuming. Deep learning-based approaches for cell segmentation have shown promising results, but they require vast amounts of annotated training data. In this context, this study presents CellGenie, an end-to-end pipeline designed to address data scarcity in deep learning-based cell segmentation. This research proposes an innovative approach for automatic synthetic data generation tailored to microscopic image analysis. Leveraging the rich information provided by the LIVECell dataset, CellGenie generates synthetic microscopic images along with corresponding segmentation masks for individual cells. By integrating this synthetic data into the training process, this study improves the performance of cell segmentation models beyond what the existing annotated dataset alone allows. Furthermore, extensive experiments are conducted to evaluate the efficacy of the generated data across various experimental scenarios. The results demonstrate the substantial impact of synthetic data generation on the robustness and generalization of cell segmentation models.

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14859
Keywords
cell segmentation, deep learning, microscopic imaging, synthetic data
National Category
Computer graphics and computer vision; Medical Imaging
Identifiers
urn:nbn:se:umu:diva-228391 (URN); 10.1007/978-3-031-66955-2_27 (DOI); 2-s2.0-85200668724 (Scopus ID); 9783031669545 (ISBN); 9783031669552 (ISBN)
Conference
28th UK Conference on Medical Image Understanding and Analysis-MIUA, Manchester, UK, July 24–26, 2024
Note
Included in the following conference series: MIUA: Annual Conference on Medical Image Understanding and Analysis
Available from: 2024-08-15; Created: 2024-08-15; Last updated: 2025-02-09; Bibliographically approved
Khalid, N., Caroprese, M., Lovell, G., Trygg, J., Dengel, A. & Ahmed, S. (2024). CellSpot: deep learning-based efficient cell center detection in microscopic images. In: Artificial Neural Networks and Machine Learning – ICANN 2024: 33rd International Conference on Artificial Neural Networks, Lugano, Switzerland, September 17–20, 2024, Proceedings, Part VIII. Paper presented at 33rd International Conference on Artificial Neural Networks, ICANN 2024, Lugano, Switzerland, September 17-20, 2024 (pp. 215-229). Springer Nature
CellSpot: deep learning-based efficient cell center detection in microscopic images
2024 (English) In: Artificial Neural Networks and Machine Learning – ICANN 2024: 33rd International Conference on Artificial Neural Networks, Lugano, Switzerland, September 17–20, 2024, Proceedings, Part VIII, Springer Nature, 2024, p. 215-229. Conference paper, Published paper (Refereed)
Abstract [en]

Cells play a fundamental role in sustaining life by performing numerous functions crucial for the survival of living organisms. Cell detection is of paramount importance for validating and analyzing biological hypotheses, as it offers valuable insights into cell behavior and function and into the diagnosis and treatment of diseases. By accurately detecting and studying cells, researchers can unravel the complexities of cellular processes, leading to advances in understanding diseases and developing effective therapeutic interventions. In microscopic image analysis, substantial efforts have been devoted to quantifying cells through segmentation masks and bounding boxes. However, these annotation methods are time-consuming and resource-intensive. To tackle this challenge, we introduce a novel approach that detects cells using only their center points. The proposed pipeline drastically reduces annotation effort while still delivering commendable performance. With the proposed method, we aim to improve the efficiency of cell detection, paving the way for faster and more resource-efficient analysis in biological research and medical diagnostics.

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
cell centroid, cell detection, deep learning, point annotation
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:umu:diva-230592 (URN); 10.1007/978-3-031-72353-7_16 (DOI); 2-s2.0-85205302583 (Scopus ID); 978-3-031-72352-0 (ISBN); 978-3-031-72353-7 (ISBN)
Conference
33rd International Conference on Artificial Neural Networks, ICANN 2024, Lugano, Switzerland, September 17-20, 2024
Note
Included in the following conference series: International Conference on Artificial Neural Networks
Available from: 2024-10-08; Created: 2024-10-08; Last updated: 2025-02-07; Bibliographically approved
Machleid, R., Hoehse, M., Scholze, S., Mazarakis, K., Nilsson, D., Johansson, E., . . . Surowiec, I. (2024). Feasibility and performance of cross-clone Raman calibration models in CHO cultivation. Biotechnology Journal, 19(1), Article ID 2300289.
Feasibility and performance of cross-clone Raman calibration models in CHO cultivation
2024 (English) In: Biotechnology Journal, ISSN 1860-6768, E-ISSN 1860-7314, Vol. 19, no 1, article id 2300289. Article in journal (Refereed) Published
Abstract [en]

Raman spectroscopy is widely used for monitoring and controlling cell cultivations in biopharmaceutical drug manufacturing. However, its use for culture monitoring at the cell line development stage has received little attention. Therefore, the impact of clonal differences, such as productivity and growth, on the prediction accuracy and transferability of Raman calibration models is not yet well described. Raman OPLS models were developed for predicting titer, glucose, and lactate using eleven CHO clones from a single cell line. These clones exhibited diverse productivity and growth rates. The calibration models were evaluated for clone-related biases using clone-wise linear regression analysis on cross-validated predictions. The results revealed that clonal differences did not affect the prediction of glucose and lactate, but titer models showed a significant clone-related bias, which remained even after applying variable selection methods. The bias was associated with clonal productivity and led to increased prediction errors when titer models were transferred to cultivations with productivity levels outside the range of their training data. The findings demonstrate the feasibility of highly accurate Raman-based monitoring of glucose and lactate in cell line development. However, accurate titer prediction requires careful consideration of clonal characteristics during model development.
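One plausible reading of the clone-wise linear regression check is to regress the cross-validated predictions on the reference values separately for each clone: a slope near 1 with an intercept and mean residual near 0 suggests no clone-related bias, while a systematic offset for one clone flags one. The sketch below follows that reading. The function `clonewise_bias`, the mean-residual bias measure, and the simulated titer data are assumptions, and the paper's exact statistics may differ.

```python
import numpy as np


def clonewise_bias(y_ref, y_cv, clone_ids):
    """Hypothetical helper: for each clone, fit y_cv = slope * y_ref + intercept and
    report the mean residual (y_cv - y_ref) as a simple per-clone bias measure."""
    results = {}
    for clone in np.unique(clone_ids):
        mask = clone_ids == clone
        # polyfit with deg=1 returns [slope, intercept].
        slope, intercept = np.polyfit(y_ref[mask], y_cv[mask], deg=1)
        bias = float(np.mean(y_cv[mask] - y_ref[mask]))
        results[str(clone)] = {"slope": float(slope), "intercept": float(intercept), "bias": bias}
    return results


# Simulated example: two clones sampled over the same titer range; the "high" clone's
# cross-validated predictions carry a constant offset, mimicking a clone-related bias.
rng = np.random.default_rng(0)
titer = np.tile(np.linspace(0.5, 5.0, 15), 2)          # hypothetical reference titers
clones = np.array(["low"] * 15 + ["high"] * 15)
pred = titer + rng.normal(0.0, 0.05, size=titer.size)   # hypothetical CV predictions
pred[clones == "high"] += 0.8
report = clonewise_bias(titer, pred, clones)
for clone, stats in report.items():
    print(clone, {name: round(value, 2) for name, value in stats.items()})
```

A per-clone intercept or bias that tracks clone productivity would correspond to the kind of clone-related titer bias the abstract reports.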

Place, publisher, year, edition, pages
John Wiley & Sons, 2024
Keywords
bioprocess development, bioprocess engineering, bioprocess monitoring, CHO cells
National Category
Analytical Chemistry
Identifiers
urn:nbn:se:umu:diva-218135 (URN); 10.1002/biot.202300289 (DOI); 38015079 (PubMedID); 2-s2.0-85178957570 (Scopus ID)
Available from: 2023-12-18; Created: 2023-12-18; Last updated: 2024-04-30; Bibliographically approved
Abbasi, A. F., Asim, M. N., Trygg, J., Dengel, A. & Ahmed, S. (2023). Deep learning architectures for the prediction of YY1-mediated chromatin loops. In: Xuan Guo; Serghei Mangul; Murray Patterson; Alexander Zelikovsky (Ed.), Bioinformatics research and applications: 19th international symposium, ISBRA 2023, Wrocław, Poland, October 9–12, 2023, proceedings. Paper presented at 19th International Symposium on Bioinformatics Research and Applications, ISBRA 2023 (pp. 72-84). Springer
Deep learning architectures for the prediction of YY1-mediated chromatin loops
2023 (English) In: Bioinformatics research and applications: 19th international symposium, ISBRA 2023, Wrocław, Poland, October 9–12, 2023, proceedings / [ed] Xuan Guo; Serghei Mangul; Murray Patterson; Alexander Zelikovsky, Springer, 2023, p. 72-84. Conference paper, Published paper (Refereed)
Abstract [en]

YY1-mediated chromatin loops play substantial roles in basic biological processes such as gene regulation, cell differentiation, and DNA replication. Predicting YY1-mediated chromatin loops is important for understanding diverse biological processes and may lead to the development of new therapeutics for neurological disorders and cancers. Existing deep learning predictors can predict YY1-mediated chromatin loops in two different cell lines; however, they show limited performance within the same cell lines and suffer significant performance deterioration in cross-cell-line settings. To provide computational predictors capable of large-scale YY1-mediated loop prediction across multiple cell lines, this paper presents two novel deep learning predictors. The two proposed predictors use Word2vec and one-hot encoding for sequence representation, together with long short-term memory (LSTM) and a convolutional neural network with a gradient flow strategy similar to DenseNet architectures. Both predictors are evaluated on two benchmark datasets from the HCT116 and K562 cell lines. Overall, the proposed predictors outperform the existing DEEPYY1 predictor by average maximum margins of 4.65% and 7.45% in terms of AUROC and accuracy, respectively, across both datasets on the independent test sets, and by 5.1% and 3.2% under 5-fold cross-validation. In cross-cell-line evaluation, the proposed predictors achieve AUROC improvements of up to 9.5% and 27.1% on the HCT116 and K562 datasets, respectively.
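The two sequence representations named in the abstract can be illustrated on raw DNA strings. The sketch below one-hot encodes a sequence into a position-by-nucleotide matrix of the kind typically fed to a convolutional network, and splits a sequence into overlapping k-mers, a common way of turning DNA into "words" for Word2vec. The helper names, the choice `k=3`, and the handling of ambiguous bases (all-zero rows) are assumptions; the predictors' actual preprocessing is not described in the abstract, and training a Word2vec model itself is not shown.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Hypothetical helper: encode a DNA string as a (len(seq), 4) matrix with one 1.0
    per row in the A/C/G/T column; unknown bases such as N become all-zero rows."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = np.zeros((len(seq), len(NUCLEOTIDES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def kmer_tokens(seq, k=3):
    """Hypothetical helper: split a sequence into overlapping k-mers (stride 1),
    the 'words' on which a Word2vec model could be trained."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


x = one_hot("ACGTN")
tokens = kmer_tokens("ACGTA", k=3)
print(x)
print(tokens)
```

One-hot encoding keeps every base explicit at fixed positions, whereas k-mer tokens let an embedding model learn similarities between recurring short motifs.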

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science (LNBI), ISSN 0302-9743, E-ISSN 1611-3349 ; 14248
Keywords
Chromatin loops, Convolutional Networks, Gene regulation, LSTM, One hot encoding, Word2vec, YY1
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:umu:diva-215831 (URN); 10.1007/978-981-99-7074-2_6 (DOI); 2-s2.0-85174274462 (Scopus ID); 9789819970735 (ISBN); 9789819970742 (ISBN)
Conference
19th International Symposium on Bioinformatics Research and Applications, ISBRA 2023
Available from: 2023-11-06; Created: 2023-11-06; Last updated: 2023-11-06; Bibliographically approved
Projects
Dynamic modeling in Poplar using a Systems Biology approach [2008-03588_VR]; Umeå University
Global data integration in metabolomics and systems biology [2011-06044_VR]; Umeå University
SSC13 - 13th Scandinavian Symposium on Chemometrics 17-20 June, Stockholm, Sweden [2013-00219_VR]; Umeå University
Systems analysis of wine, from the Vineyard and beyond [2016-04376_VR]; Umeå University
Identifiers
ORCID iD: orcid.org/0000-0003-3799-6094