An empirical study on the joint impact of feature selection and data resampling on imbalance classification
Henan Key Lab of Big Data Analysis and Processing, Henan University, Henan, China.
Umeå University, Faculty of Medicine, Department of Radiation Sciences, Radiation Physics. Department of Engineering, University Campus Bio-Medico of Rome, Rome, Italy. ORCID iD: 0000-0003-2621-072X
Henan Key Lab of Big Data Analysis and Processing, Henan University, Henan, China.
Henan Key Lab of Big Data Analysis and Processing, Henan University, Henan, China.
2023 (English). In: Applied intelligence (Boston), ISSN 0924-669X, E-ISSN 1573-7497, Vol. 53, p. 5449-5461. Article in journal (Refereed). Published
Abstract [en]

Many real-world datasets exhibit imbalanced distributions, in which the majority classes have sufficient samples, whereas the minority classes often have a very small number of samples. Data resampling has proven to be effective in alleviating such imbalanced settings, while feature selection is a commonly used technique for improving classification performance. However, the joint impact of feature selection and data resampling on two-class imbalance classification has rarely been addressed before. This work investigates the performance of two opposite imbalanced classification frameworks in which feature selection is applied before or after data resampling. We conduct a large-scale empirical study with a total of 9225 experiments on 52 publicly available datasets. The results show that both frameworks should be considered for finding the best performing imbalanced classification model. We also study the impact of classifiers, the ratio between the number of majority and minority samples (IR), and the ratio between the number of samples and features (SFR) on the performance of imbalance classification. Overall, this work provides a new reference value for researchers and practitioners in imbalance learning.
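The two orderings compared in the abstract, and the IR and SFR quantities it defines, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not say which resampling or feature selection methods the study used, so random oversampling and a class-mean-difference filter stand in for them, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def imbalance_ratio(y):
    """IR: number of majority samples divided by number of minority samples."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())

def sample_feature_ratio(X):
    """SFR: number of samples divided by number of features."""
    return len(X) / len(X[0])

def random_oversample(X, y, seed=0):
    """Stand-in resampler: duplicate random minority rows until classes balance."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [row for row, label in zip(X, y) if label == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit

def select_top_k(X, y, k):
    """Stand-in filter: keep the k features whose class means differ the most."""
    a, b = sorted(set(y))  # two-class setting, as in the abstract

    def class_mean(col, label):
        vals = [row[col] for row, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)

    scores = [abs(class_mean(j, a) - class_mean(j, b)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]

def fs_then_resample(X, y, k, seed=0):
    """Framework A: feature selection first, then resampling."""
    cols = select_top_k(X, y, k)
    X_bal, y_bal = random_oversample(project(X, cols), y, seed)
    return X_bal, y_bal, cols

def resample_then_fs(X, y, k, seed=0):
    """Framework B: resampling first, then feature selection."""
    X_bal, y_bal = random_oversample(X, y, seed)
    cols = select_top_k(X_bal, y_bal, k)
    return project(X_bal, cols), y_bal, cols

# Hypothetical toy data: 4 majority samples (label 0), 2 minority (label 1).
X = [[0.0, 1.0, 5.0], [0.1, 0.9, 4.0], [0.2, 1.1, 6.0],
     [0.1, 1.0, 5.5], [1.0, 1.0, 5.0], [0.9, 1.2, 4.5]]
y = [0, 0, 0, 0, 1, 1]

print("IR:", imbalance_ratio(y), "SFR:", sample_feature_ratio(X))
print("FS then resample keeps features:", fs_then_resample(X, y, k=1)[2])
print("Resample then FS keeps features:", resample_then_fs(X, y, k=1)[2])
```

On this toy data both orderings keep the same feature, but in general the selected subset can differ because the filter in the second ordering scores features on the rebalanced data. In a real evaluation both steps would be fitted on training folds only, so that duplicated minority rows do not leak into the test split.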

Place, publisher, year, edition, pages
Springer Nature, 2023. Vol. 53, p. 5449-5461
Keywords [en]
Data selection, Feature selection, Imbalanced classification, Resampling
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:umu:diva-203069
DOI: 10.1007/s10489-022-03772-1
ISI: 000814984100002
Scopus ID: 2-s2.0-85132571104
OAI: oai:DiVA.org:umu-203069
DiVA, id: diva2:1727735
Note

Correction: Zhang, C., Soda, P., Bi, J. et al. Correction to: An empirical study on the joint impact of feature selection and data resampling on imbalance classification. Appl Intell 53, 8506 (2023). DOI: 10.1007/s10489-022-03953-y

Available from: 2023-01-17. Created: 2023-01-17. Last updated: 2023-05-04. Bibliographically approved

Authority records

Soda, Paolo
