Comparative Analysis of Tree-Based Methods for Predictive Modeling
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
2024 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits, Student thesis.
Alternative title
Trädbaserade metoder för regression och klassificering [Tree-based methods for regression and classification] (Swedish)
Abstract [en]

This paper delves into the application and predictive ability of tree-based methods in machine learning, focusing on decision trees, Random Forests, C-Trees, and boosting methods for regression and classification tasks. With the increasing importance of data-driven decision-making, these methodologies have emerged as powerful tools for extracting insights from complex datasets in fields such as finance, healthcare, and marketing. Supervised learning, encompassing both regression and classification, forms the backbone of these methods. The aim of this paper is to investigate which method has the best predictive performance.

Tree-based methods trace back to 1963, evolving through significant contributions such as the Classification and Regression Trees (CART) framework, bagging, and Random Forests. These advancements have enhanced the robustness and accuracy of predictive models. Modern boosting techniques, such as AdaBoost and Gradient Boosting, iteratively improve model performance by focusing on previously misclassified instances.
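
A minimal sketch of the reweighting idea behind AdaBoost (illustrative only, not code from the thesis; the decision-stump weak learner, round count, and -1/+1 label coding are assumptions made for the example):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_sketch(X, y, n_rounds=50):
    """y is assumed to be coded as -1/+1; learner choice is illustrative."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)        # fit weak learner on weighted data
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this learner in the ensemble
        w *= np.exp(-alpha * y * pred)          # upweight misclassified observations
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    scores = sum(a * m.predict(X) for a, m in zip(alphas, learners))
    return np.sign(scores)
```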

Linear and logistic regression are explored and used as baseline models for their simplicity and interpretability. The paper also investigates more advanced techniques such as Support Vector Machines and Artificial Neural Networks. Additionally, the Conditional Inference Trees (C-Tree) approach offers a hypothesis-testing framework for selecting splits, enhancing model accuracy.
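
A minimal sketch of the hypothesis-testing split selection behind Conditional Inference Trees (a simplified illustration, not the thesis implementation; the correlation statistic, permutation test, and Bonferroni adjustment stand in for the full C-Tree framework):

```python
import numpy as np

def select_split_variable(X, y, alpha=0.05, n_perm=1000, seed=None):
    """Pick the split variable by testing each feature's association with y;
    return None if no association is significant, i.e. stop splitting."""
    rng = np.random.default_rng(seed)
    _, p = X.shape
    p_values = []
    for j in range(p):
        stat = abs(np.corrcoef(X[:, j], y)[0, 1])            # observed association
        perm = np.array([abs(np.corrcoef(X[:, j], rng.permutation(y))[0, 1])
                         for _ in range(n_perm)])
        p_values.append(np.mean(perm >= stat))                # permutation p-value
    p_adj = np.minimum(np.array(p_values) * p, 1.0)           # Bonferroni adjustment
    best = int(np.argmin(p_adj))
    return best if p_adj[best] <= alpha else None
```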

The analysis includes fitting various models to training data, cross-validation for optimal parameter selection, and performance evaluation on validation datasets. Results from the analysis indicate that Random Forest and XGBoost were strong performers in both the classification and regression problems. 
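
A minimal sketch of that workflow, with synthetic data, an illustrative parameter grid, and scikit-learn calls chosen for the example (the thesis's own data, models, and settings are not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic classification data standing in for the thesis datasets.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Select hyperparameters by k-fold cross-validation on the training data only.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_features": ["sqrt", None]},
    cv=5,
)
grid.fit(X_train, y_train)

# Evaluate the selected model on the held-out validation data.
val_accuracy = accuracy_score(y_val, grid.best_estimator_.predict(X_val))
print(grid.best_params_, val_accuracy)
```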

Place, publisher, year, edition, pages
2024.
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-226745
OAI: oai:DiVA.org:umu-226745
DiVA, id: diva2:1874423
Subject / course
Statistics C: Bachelor's Thesis
Educational program
Programme in Statistics and Data Science
Available from: 2024-06-20 Created: 2024-06-20 Last updated: 2024-06-20 Bibliographically approved

Open Access in DiVA

fulltext (1654 kB), 584 downloads
File information
File name: FULLTEXT01.pdf
File size: 1654 kB
Checksum (SHA-512): 548c1229c169e8366a8902f1f6357b4d75d10fd79db87ea6dcc1078140a5f11030454a0c15b0c9f6952f75dc86ea479fef24695fd9cb7a58c06ec92648e2e743
Type: fulltext
Mimetype: application/pdf
