CareCorpus+: expanding and augmenting caregiver strategy data to support pediatric rehabilitation
2024 (English). In: EMNLP 2024. The 2024 conference on empirical methods in natural language processing: proceedings of the conference, Association for Computational Linguistics, 2024, p. 6912-6927. Conference paper, Published paper (Refereed)
Abstract [en]
Caregiver strategy classification in pediatric rehabilitation contexts is strongly motivated by real-world clinical constraints but highly underresourced and seldom studied in natural language processing settings. We introduce a large dataset of 3,062 caregiver strategies in this setting, a five-fold increase over the nearest contemporary dataset. These strategies are manually categorized into clinically established constructs with high agreement (κ=0.68-0.89). We also propose two techniques to further address identified data constraints. First, we manually supplement target task data with relevant public data from online child health forums. Next, we propose a novel data augmentation technique to generate synthetic caregiver strategies with high downstream task utility. Extensive experiments showcase the quality of our dataset. They also establish evidence that both the publicly available data and the synthetic strategies result in large performance gains, with relative F1 increases of 22.6% and 50.9%, respectively.
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2024. p. 6912-6927
National Category
Computer and Information Sciences; Occupational Therapy
Identifiers
URN: urn:nbn:se:umu:diva-232836
DOI: 10.18653/v1/2024.emnlp-main.392
Scopus ID: 2-s2.0-85217816157
ISBN: 979-8-89176-164-3 (electronic)
OAI: oai:DiVA.org:umu-232836
DiVA, id: diva2:1920245
Conference
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, Florida, USA, November 12-16, 2024.
Funder
NIH (National Institutes of Health), 1K12 HD055931
2024-12-11; 2024-12-11; 2025-02-24. Bibliographically approved