Background: Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignancy with a 5-year survival rate of 10 %. Surgery is the only curative treatment, but because the disease is usually detected late, few patients are eligible for it. Methods to detect PDAC at an earlier stage are therefore needed, and reliable screening biomarkers could serve this purpose. Previous prospective studies have analyzed circulating analytes to identify early PDAC signals. One such class of analytes is microRNAs (miRNAs): non-coding RNAs of around 22 nucleotides that act as post-transcriptional regulators by interacting with messenger RNAs (mRNAs). The function of a miRNA can be inferred by predicting its potential targets and then performing enrichment analysis on the predicted targets. Challenges with this approach include a large number of false-positive target predictions and the fact that miRNAs can act in a tissue- or disease-specific manner. Other classes of analytes that have previously been studied in prospective PDAC cohorts are metabolites and proteins.
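As an illustration of the enrichment step described above, the sketch below computes a one-sided hypergeometric (over-representation) p-value for a single pathway. It assumes enrichment is tested this way; the abstract does not name the test, and the function name `hypergeom_enrichment_p` and all numbers are hypothetical. The question it answers is how surprising it is that `overlap` of a miRNA's `n_targets` predicted targets fall in a pathway of `pathway_size` genes, given a background of `universe` genes.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, n_targets: int, pathway_size: int, universe: int) -> float:
    """One-sided over-representation p-value, P(X >= overlap).

    X ~ Hypergeometric: n_targets genes drawn without replacement from a
    background of `universe` genes, of which `pathway_size` are in the pathway.
    """
    upper = min(n_targets, pathway_size)
    # Sum the upper tail using exact integer arithmetic.
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, n_targets - k)
        for k in range(overlap, upper + 1)
    )
    # Divide once at the end to avoid accumulating rounding error.
    return tail / comb(universe, n_targets)


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
    # 200 predicted targets of which 10 fall in the pathway.
    print(hypergeom_enrichment_p(10, 200, 100, 20000))
```

With these hypothetical numbers, about one target would be expected in the pathway by chance, so an overlap of 10 gives a very small p-value. Because many pathways or GO terms are tested per miRNA, such p-values would normally be adjusted for multiple testing, for example with the Benjamini-Hochberg procedure.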
Aims: This thesis has three aims: first, to build a miRNA functional analysis pipeline supported by correlations between each miRNA and its predicted target genes; second, to identify potential circulating biomarkers for early detection of PDAC using multi-omics; and third, to identify potential prognostic metabolites in a prospective PDAC cohort.
Methods: We used publicly available data from The Cancer Genome Atlas Pancreatic Adenocarcinoma (TCGA-PAAD) cohort and pre-diagnostic plasma samples from the Northern Sweden Health and Disease Study. We built a pipeline in R that uses miRNA, mRNA, and protein expression data from TCGA-PAAD for in silico miRNA functional analysis. Pre-diagnostic plasma samples from future PDAC patients and matched healthy controls were analyzed using multi-omics. Tissue polypeptide-specific antigen (TPS) was measured by enzyme-linked immunosorbent assay (ELISA) in 267 future PDAC samples and 320 healthy controls. Metabolites and clinical biomarkers (carbohydrate antigen (CA) 19-9, carcinoembryonic antigen (CEA), and CA 15-3) were profiled in 100 future PDAC samples and 100 healthy controls using liquid chromatography-mass spectrometry (MS), gas chromatography-MS, and multiplex technology. Of these, a subset of 39 future PDAC patients and 39 healthy controls was profiled for 2083 miRNAs using targeted sequencing and for 644 proteins using proximity extension assays. Circulating levels of the multi-omics analytes were analyzed using conditional or unconditional logistic regression. Least absolute shrinkage and selection operator (LASSO) regression combined with 500 bootstrap iterations was used to identify the most informative variables. The prognostic value of metabolites was assessed using Cox regression. Multi-omics factor analysis (MOFA) and data integration analysis for biomarker discovery using latent components (DIABLO) were used for multi-omics integration.
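The LASSO-with-bootstrap selection step can be sketched as follows. This is a minimal, dependency-free illustration, not the thesis code (which was written in R): it fits an L1-penalized logistic regression by proximal gradient descent (ISTA, a swapped-in solver) and records how often each variable keeps a non-zero coefficient across bootstrap resamples. It assumes standardized features, a fixed, hypothetical penalty `lam`, and an unconditional model; a matched case-control design would instead call for conditional logistic regression, and in practice `lam` is usually chosen by cross-validation. The helper names and toy data are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    # Proximal operator of the L1 penalty: shrink z towards zero by t.
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_logistic(X, y, lam, lr=0.1, n_iter=300):
    """L1-penalized logistic regression via ISTA (hypothetical helper).

    X: list of rows of standardized features; y: list of 0/1 labels.
    The intercept is not penalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                grad[j] += r * xi[j]
        # Gradient step on the mean log-loss, then L1 shrinkage of the slopes.
        b0 -= lr * g0 / n
        beta = [soft_threshold(b - lr * g / n, lr * lam) for b, g in zip(beta, grad)]
    return b0, beta


def bootstrap_selection_frequency(X, y, lam, n_boot=500, seed=1):
    """Fraction of bootstrap resamples in which each coefficient is non-zero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # rows sampled with replacement
        _, beta = lasso_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 drives the outcome, features 1-2 are noise.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
    y = [1 if row[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]
    print(bootstrap_selection_frequency(X, y, lam=0.1, n_boot=20))
```

Variables selected in a large fraction of the bootstrap resamples would be regarded as the most informative; the frequency cut-off is an analysis choice rather than something fixed by the method.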
Results: We built an automated pipeline consisting of 1) miRNA target prediction, 2) correlation analyses between each miRNA and its predicted targets at the mRNA and protein expression levels, and 3) functional enrichment of the correlated targets to identify enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO) terms for a specific miRNA. The pipeline was run for all miRNAs (~700) detected in the TCGA-PAAD cohort, and the results can be downloaded from a Shiny app (https://emmbor.shinyapps.io/mirfa/). TPS levels were not altered in future PDAC patients up to 24 years before diagnosis but were increased at diagnosis (OR = 1.03, 95 % CI: 1.01-1.05). In combination with CA 19-9, the five metabolites, two proteins, and two miRNAs selected by LASSO with bootstrap iterations achieved internal areas under the receiver operating characteristic curve (AUCs) of 0.74, 0.80, and 0.88, respectively. Neither MOFA nor DIABLO clearly separated future PDAC cases from healthy controls.
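Step 2 of the pipeline (correlation support for predicted targets) can be illustrated with a small sketch that keeps only those predicted targets whose expression is negatively correlated with the miRNA across samples. It assumes Spearman correlation and a hypothetical cut-off of rho <= -0.3, reflecting the expectation that a repressing miRNA and its target are anticorrelated; the abstract does not state which correlation measure, direction, or threshold the pipeline uses, and the gene names and values below are made up. The retained genes would then be passed on to the enrichment step.

```python
import math


def average_ranks(values):
    # 1-based ranks; tied values receive the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for constant input
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    # Spearman's rho is Pearson's r computed on ranks.
    return pearson(average_ranks(x), average_ranks(y))


def anticorrelated_targets(mirna_expr, target_expr, threshold=-0.3):
    """Keep predicted targets with rho <= threshold (hypothetical cut-off)."""
    kept = {}
    for gene, expr in target_expr.items():
        rho = spearman(mirna_expr, expr)
        if rho <= threshold:  # NaN comparisons are False, so NaN genes are dropped
            kept[gene] = rho
    return kept


if __name__ == "__main__":
    mir = [1, 2, 3, 4, 5]  # hypothetical miRNA expression in five samples
    targets = {"GENE_A": [5, 4, 3, 2, 1], "GENE_B": [1, 2, 3, 4, 5], "GENE_C": [2, 1, 4, 3, 5]}
    print(anticorrelated_targets(mir, targets))  # {'GENE_A': -1.0}
```

In a real analysis, correlation p-values with multiple-testing correction would typically be considered alongside the effect size, and the same filter could be applied to protein as well as mRNA expression.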
Conclusions: Our bioinformatics pipeline for in silico functional analysis of miRNAs successfully identifies enriched KEGG pathways and GO terms for miRNA isoforms. The investigated plasma samples are heterogeneous, but among the analyzed variables, we identified five metabolites, two proteins, and two miRNAs with the highest potential for early PDAC detection. CA 19-9 levels increased closer to diagnosis. We also identified five fatty acids that could be studied as prognostic biomarkers in a diagnostic PDAC cohort.