Utvärdering av rule of ten för logistisk regression
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
2023 (Swedish). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Alternative title
Evaluating Rule of Ten for Logistic Regression (English)
Abstract [sv]

Tidigare studier har visat att koefficientskattningar för logistisk regression inte är pålitliga när EPV (events per variable, händelser per variabel) är lågt. Baserat på dessa studier har en tumregel på minst 10 EPV föreslagits. Tumregeln kallas ’rule of ten’ och är vad som har undersökts i den här studien. För att utvärdera tumregeln gjordes en simuleringsstudie, i programmeringsspråket R, där nya datamaterial genererades baserat på ett verkligt datamaterial. 500 datamaterial genererades för varje stickprovsstorlek och EPV. För varje datamaterial skattades nya modeller och modellernas koefficienter användes för utvärderingen. Totalt analyserades 10 olika EPV och 18 stickprovsstorlekar. Resultaten bekräftar tidigare studier som visat att flera problem kan uppstå vid låga EPV och att stickprovsstorleken har mindre påverkan på resultaten. Problemen är också starkt relaterade till sambandet mellan förklarings- och responsvariabeln. 

Abstract [en]

Previous studies have shown that coefficient estimates for logistic regression are not reliable when EPV (events per variable) is low. Based on these studies, a rule of thumb of at least 10 EPV has been proposed. This rule of thumb, called the 'rule of ten', is the subject of this study. To evaluate it, a simulation study was performed in the programming language R, in which new datasets were generated based on a real dataset. 500 datasets were generated for each sample size and EPV. For each dataset, new models were fitted, and their coefficients were evaluated by comparison with the original model. A total of 10 different EPVs and 18 sample sizes were analyzed. The results confirm previous studies showing that several problems can occur at low EPV and that the sample size has a smaller effect on the results. The problems are also strongly related to the relationship between the independent and the dependent variable.
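As an illustration of the quantity the abstract refers to, the following is a minimal Python sketch of how EPV and the 'rule of ten' threshold can be computed. It is not taken from the thesis, whose simulations were written in R. The function names, the default threshold argument, and the example numbers are all hypothetical, and the sketch assumes the common definition of EPV as the number of outcome events divided by the number of predictors in the model.

```python
import math


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Return EPV: the number of outcome events divided by the number of predictors."""
    if n_events < 0 or n_predictors <= 0:
        raise ValueError("need n_events >= 0 and n_predictors > 0")
    return n_events / n_predictors


def satisfies_rule_of_ten(n_events: int, n_predictors: int, threshold: float = 10.0) -> bool:
    """Check whether the EPV reaches the rule-of-thumb threshold (10 by default)."""
    return events_per_variable(n_events, n_predictors) >= threshold


def min_sample_size(n_predictors: int, event_rate: float, target_epv: float = 10.0) -> int:
    """Smallest n whose expected number of events, n * event_rate, reaches the target EPV.

    Assumption: event_rate is the proportion of observations with the event.
    """
    if not 0.0 < event_rate <= 1.0:
        raise ValueError("event_rate must be in (0, 1]")
    return math.ceil(target_epv * n_predictors / event_rate)


# Hypothetical example values, not results from the thesis.
print(events_per_variable(40, 4))    # 10.0
print(satisfies_rule_of_ten(30, 5))  # False (EPV is 6.0)
print(min_sample_size(4, 0.25))      # 160
```

Under these assumptions, a model with 4 predictors and an event rate of 25% would need about 160 observations to expect 10 events per variable.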

Place, publisher, year, edition, pages
2023.
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-210054
OAI: oai:DiVA.org:umu-210054
DiVA, id: diva2:1769804
Available from: 2023-06-19. Created: 2023-06-18. Last updated: 2023-06-19. Bibliographically approved.

Open Access in DiVA

fulltext (3104 kB), 437 downloads
File information
File name: FULLTEXT01.pdf
File size: 3104 kB
Checksum (SHA-512): 79d96af89588ae62c2e64941de7ad0f46bc619091d7eacbcfcdd45f786fc2360cee51037b0dc4c6ccc61c5844b1dec343e3dbc22b0a29410013213261bafebc1
Type: fulltext
Mimetype: application/pdf
