Fusion in context: a multimodal approach to affective state recognition
KTH: The Royal Institute of Technology, Stockholm, Sweden.
Pal Robotics, Barcelona, Spain.
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0003-2282-9939
KTH: The Royal Institute of Technology, Stockholm, Sweden.
2025 (English). In: 2025 34th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), IEEE, 2025, pp. 1049-1055. Conference paper, published paper (refereed).
Abstract [en]

Accurate recognition of human emotions is a crucial challenge in affective computing and human-robot interaction (HRI). Emotional states play a vital role in shaping behaviors, decisions, and social interactions. However, emotional expressions can be influenced by contextual factors, leading to misinterpretations if context is not considered. Multimodal fusion, combining modalities like facial expressions, speech, and physiological signals, has shown promise in improving affect recognition. This paper proposes a transformer-based multimodal fusion approach that leverages facial thermal data, facial action units, and textual context information for context-aware emotion recognition. We explore modality-specific encoders to learn tailored representations, which are then fused and processed by a shared transformer encoder to capture temporal dependencies and interactions. The proposed method is evaluated on a dataset collected from participants engaged in a tangible tabletop Pacman game designed to induce various affective states. Our results demonstrate improvements from incorporating contextual information and multimodal fusion, achieving 89% F1 score with our full model compared to 65% for action units alone and 30% for thermal data alone.
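As a rough illustration of the architecture sketched in the abstract, the following Python (PyTorch) snippet shows one way modality-specific encoders can project facial action units, thermal features, and textual context embeddings into a shared space, after which a common transformer encoder models temporal dependencies before classification. All feature dimensions, the summation-based fusion, the class count, and the pooling choice are assumptions made for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, au_dim=17, thermal_dim=64, text_dim=768,
                 d_model=128, n_heads=4, n_layers=2, n_classes=4):
        super().__init__()
        # Modality-specific encoders (assumed feature sizes, not from the paper)
        self.au_enc = nn.Linear(au_dim, d_model)            # facial action units
        self.thermal_enc = nn.Linear(thermal_dim, d_model)  # thermal features
        self.text_enc = nn.Linear(text_dim, d_model)        # textual context embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, n_layers)  # shared encoder
        self.head = nn.Linear(d_model, n_classes)           # affective-state classifier

    def forward(self, au, thermal, text):
        # Inputs are (batch, time, features); fuse by summing the per-modality
        # projections at each time step (one simple fusion choice among several).
        z = self.au_enc(au) + self.thermal_enc(thermal) + self.text_enc(text)
        z = self.fusion(z)                # temporal dependencies and interactions
        return self.head(z.mean(dim=1))   # mean-pool over time, then classify

model = MultimodalFusion()
logits = model(torch.randn(2, 50, 17),    # action units
               torch.randn(2, 50, 64),    # thermal
               torch.randn(2, 50, 768))   # text context
print(logits.shape)  # torch.Size([2, 4])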

Place, publisher, year, edition, pages
IEEE, 2025, pp. 1049-1055
Series
IEEE RO-MAN, ISSN 1944-9445, E-ISSN 1944-9437
Keywords [en]
computer vision, human detection, social human-robot interaction
National subject category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:umu:diva-247946
DOI: 10.1109/RO-MAN63969.2025.11217904
Scopus ID: 2-s2.0-105024539281
ISBN: 9798331587710 (digital)
OAI: oai:DiVA.org:umu-247946
DiVA, id: diva2:2025516
Conference
34th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2025, Eindhoven, Netherlands, August 25-29, 2025
Available from: 2026-01-07. Created: 2026-01-07. Last updated: 2026-01-07. Bibliographically reviewed.

Open Access in DiVA

Full text is not available in DiVA

Other links

Publisher's full text
Scopus

Person

Güneysu Özgür, Arzu
