Variable selection for the Cox proportional hazards model: A simulation study comparing the stepwise, lasso and bootstrap approach
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

In a regression setting with a number of measured covariates, not all may be relevant to the response. By reducing the number of covariates included in the final model we can improve its prediction accuracy and make it easier to interpret. In survival analysis, the study of time-to-event data, the most common form of regression is the semi-parametric Cox proportional hazards (PH) model. In this thesis we compare three ways to perform variable selection in the Cox PH model: stepwise regression, lasso and bootstrap. By simulating survival data we could control which covariates were significant for the response. Fitting the Cox PH model to these data using the three variable selection methods, we could evaluate how well each method performs in finding the correct model. We found that while bootstrap could in some cases improve on the stepwise approach, its performance is strongly affected by the choice of inclusion frequency. Lasso performed equivalently to or slightly better than the stepwise method for data with weak effects. However, when the data instead consist of strong effects, stepwise performs considerably better than lasso.
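
The simulation idea in the abstract, generating survival data in which only some covariates truly affect the hazard, can be illustrated with a minimal Python sketch. This is not the thesis's actual design: the constant baseline hazard, the standard normal covariates, the exponential censoring mechanism, the coefficient values and the function name `simulate_cox_data` are all assumptions made for illustration.

```python
import math
import random


def simulate_cox_data(n, beta, baseline_rate=1.0, censor_rate=0.3, seed=0):
    """Simulate right-censored survival data under a Cox PH model.

    Assumptions of this sketch (not taken from the thesis): constant
    baseline hazard, independent standard normal covariates, and
    independent exponential censoring.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        linear_predictor = sum(b * xi for b, xi in zip(beta, x))
        # Under PH with constant baseline, h(t | x) = baseline_rate * exp(x'beta),
        # so the event time is exponential with that rate.
        rate = baseline_rate * math.exp(linear_predictor)
        event_time = rng.expovariate(rate)
        censor_time = rng.expovariate(censor_rate)
        observed_time = min(event_time, censor_time)
        event_observed = event_time <= censor_time
        rows.append((observed_time, event_observed, x))
    return rows


# Hypothetical coefficients: the first two covariates are relevant,
# the remaining three are pure noise (true coefficient zero).
beta = [0.8, -0.8, 0.0, 0.0, 0.0]
data = simulate_cox_data(200, beta)
```

Because the true coefficients are known, a variable selection method applied to `data` can be scored by whether it keeps the first two covariates and drops the other three.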

Abstract [sv]

In regression, one seeks the relationship between a dependent variable and one or more explanatory variables. Even if we have access to many explanatory variables, it is not certain that all of them affect the dependent variable. By reducing the number of variables included in the final model, one can improve its predictive ability while also making it easier to interpret. In survival analysis, one of the most common regression methods is the semi-parametric Cox proportional hazards (PH) model. In this thesis we have compared three methods for variable selection in the Cox PH model: stepwise regression, lasso and bootstrap. By simulating survival data we can control which variables affect the dependent variable. This makes it possible to evaluate how well the different methods succeed in including these variables in the final Cox PH model. We found that bootstrap in some situations gave better results than stepwise regression, although the result varies greatly depending on the choice of inclusion frequency. The results of lasso and stepwise regression are equivalent, or in favour of lasso, as long as the data contain weaker effects. When the data instead consist of stronger effects, however, stepwise regression gives much better results than lasso.
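
The role of the inclusion frequency in the bootstrap approach can be sketched as follows: a selection procedure is rerun on many bootstrap resamples, and a covariate is kept only if it is selected in at least a chosen fraction of them. The sketch below is an illustration under stated assumptions, not the thesis's implementation. The function names, the threshold of 0.5 and the toy data are hypothetical, and the selector `correlation_selector` is a simple stand-in for stepwise selection in a Cox PH model (it uses an uncensored response and a correlation cut-off purely to keep the example self-contained).

```python
import math
import random
from collections import Counter


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return abs(sxy) / math.sqrt(sxx * syy)


def correlation_selector(rows, cutoff=0.3):
    """Stand-in selector (NOT stepwise Cox): keep covariates whose absolute
    correlation with the response exceeds a cut-off."""
    ys = [y for y, _ in rows]
    n_covariates = len(rows[0][1])
    return [j for j in range(n_covariates)
            if abs_correlation([x[j] for _, x in rows], ys) > cutoff]


def bootstrap_inclusion(rows, select, n_boot=100, threshold=0.5, seed=0):
    """Run `select` on bootstrap resamples and keep covariates whose
    inclusion frequency is at least `threshold`."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # Resample n rows with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select(sample)))
    freq = {j: c / n_boot for j, c in counts.items()}
    chosen = sorted(j for j, f in freq.items() if f >= threshold)
    return chosen, freq


# Toy data: the response depends only on covariate 0; covariate 1 is an
# alternating sign that is unrelated to the response.
rows = [(2.0 * i, [float(i), (-1.0) ** i]) for i in range(50)]
chosen, freq = bootstrap_inclusion(rows, correlation_selector)
```

Raising `threshold` makes the procedure more conservative and lowering it admits more covariates, which is the sensitivity to the choice of inclusion frequency that the abstract reports.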

Place, publisher, year, edition, pages
2017. 50 p.
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-130521
OAI: oai:DiVA.org:umu-130521
DiVA: diva2:1067479
Presentation
2017-01-19, 09:00 (Swedish)
Available from: 2017-01-30. Created: 2017-01-30. Bibliographically approved.

Open Access in DiVA

fulltext (859 kB), 99 downloads
File information
File name: FULLTEXT01.pdf. File size: 859 kB. Checksum (SHA-512):
a1400cfeb25b2fac5c1cd3e2e118f26dc69dfbb51a0eda0c7f58b85bf2a638236168ad09e6e37a903170d09217e9b028da03b1f87694b614b11cff530111aae8
Type: fulltext. Mimetype: application/pdf
