Named entity recognition in Italian lung cancer clinical reports using transformers
Università Campus Bio-Medico di Roma, Unit of Computer Systems & Bioinformatics, Department of Engineering, Italy.
University of Cassino and Southern Latium, Department of Electrical and Information Engineering, Cassino, Italy.
Università Campus Bio-Medico di Roma, Research Unit of Radiation Oncology, Department of Medicine and Surgery, Italy; Fondazione Policlinico Universitario Campus Bio-Medico, Operative Research Unit of Radiation Oncology, Italy.
Fondazione Policlinico Universitario Campus Bio-Medico, Operative Research Unit of Medical Oncology, Italy.
2023 (English) In: Proceedings - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 4101-4107. Conference paper, Published paper (Refereed)
Abstract [en]

The widespread adoption of electronic health records (EHRs) offers a valuable opportunity to support clinical research, since they contain crucial patient information, including diagnoses, symptoms, medications, lab tests, and more. Despite the success of deep learning for biomedical Named Entity Recognition (NER), the literature still lacks applications focused on lung cancer in the Italian language. This paper therefore presents a transformer-based approach to extracting named entities from Italian clinical notes related to Non-Small Cell Lung Cancer (NSCLC). We introduce a novel set of 25 clinical entities related to NSCLC and build a corpus annotated for NER. We apply a state-of-the-art model pre-trained on Italian biomedical texts to the manually annotated clinical reports of a cohort of 257 patients with NSCLC, successfully dealing with class-imbalance problems and obtaining promising performance (average F1-score of 84.3%). We also compare our method with two other pre-trained state-of-the-art models, showing that the domain-specific knowledge offered by the proposed approach is necessary to achieve higher performance. These findings also showcase the feasibility of using transformers to extract biomedical information in the Italian language.
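The abstract reports an average F1-score and mentions class imbalance, but it does not say how the average is computed. As a rough illustration only, the sketch below computes entity-level F1 per label from BIO-tagged sequences and then takes an unweighted (macro) average, which makes the effect of a rare entity class visible. The label names `HISTOLOGY` and `STAGE`, the BIO scheme, and the choice of macro averaging are assumptions made for the example and are not taken from the paper.

```python
from collections import defaultdict


def extract_spans(tags):
    """Return the set of (label, start, end) entity spans in a BIO tag sequence.

    An I- tag that does not continue a span with the same label is treated
    as outside any entity (a simplifying assumption for this sketch).
    """
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def per_label_f1(gold_tags, pred_tags):
    """Exact-match, entity-level F1 for every label seen in gold or prediction."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for span in pred:
        counts[span[0]]["tp" if span in gold else "fp"] += 1
    for span in gold - pred:
        counts[span[0]]["fn"] += 1
    scores = {}
    for label, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[label] = 2 * c["tp"] / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical toy sequences: the rare STAGE entity is missed entirely.
gold = ["B-HISTOLOGY", "I-HISTOLOGY", "O", "B-STAGE", "O", "B-HISTOLOGY"]
pred = ["B-HISTOLOGY", "I-HISTOLOGY", "O", "O", "O", "B-HISTOLOGY"]

scores = per_label_f1(gold, pred)
print(scores)            # {'HISTOLOGY': 1.0, 'STAGE': 0.0}
print(macro_f1(scores))  # 0.5
```

In this toy case two of the three gold entities are found, yet the macro average falls to 0.5 because the single missed entity belongs to a rare class. This is one reason per-class reporting is useful on imbalanced corpora.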

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023. p. 4101-4107
Keywords [en]
deep learning, EHRs, NER, NSCLC, transformer
National Category
Information Systems
Identifiers
URN: urn:nbn:se:umu:diva-221398
DOI: 10.1109/BIBM58861.2023.10385778
Scopus ID: 2-s2.0-85184904088
ISBN: 9798350337488 (electronic)
OAI: oai:DiVA.org:umu-221398
DiVA, id: diva2:1840884
Conference
2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023, Istanbul, 5-8 December 2023.
Available from: 2024-02-27 Created: 2024-02-27 Last updated: 2024-02-27 Bibliographically approved

Authority records

Soda, Paolo
