Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case
Dahlgren Lindström, Adam. Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-1112-2981
Björklund, Johanna. Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0001-8503-0118
Bensch, Suna. Umeå University, Faculty of Science and Technology, Department of Computing Science.
Drewes, Frank. Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0001-7349-7693
2020 (English). In: Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020, p. 730-744. Conference paper, Published paper (Refereed)
Abstract [en]

Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shortage of analysis tools. To address this problem, we generalize the notion of probing tasks to the visual-semantic case. To this end, we (i) discuss the formalization of probing tasks for embeddings of image-caption pairs, (ii) define three concrete probing tasks within our general framework, (iii) train classifiers to probe for those properties, and (iv) compare various state-of-the-art embeddings under the lens of the proposed probing tasks. Our experiments reveal an increase in accuracy of up to 12% on visual-semantic embeddings compared to the corresponding unimodal embeddings, which suggests that the text and image dimensions represented in the former do complement each other.
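
As an illustration of the general idea of a probing task (training a simple classifier on frozen embeddings to test whether a property is recoverable from them), the following is a minimal sketch in Python. It is not the authors' setup: the embeddings are synthetic, the probed property is an arbitrary binary label, and all names (`train_probe`, `make_data`, `probe_accuracy`) are hypothetical. An "informative" embedding encodes the label in its first dimension, while the control embedding is pure noise.

```python
import math
import random


def train_probe(X, y, epochs=200, lr=0.1):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    correct = 0
    for x, label in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (label == 1))
    return correct / len(X)


def make_data(rng, n, dim, informative):
    """Synthetic stand-in for embeddings (an assumption, not real data)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if informative:
            # Encode the probed property in dimension 0, with some noise.
            x[0] = (2 * label - 1) + rng.gauss(0.0, 0.3)
        X.append(x)
        y.append(label)
    return X, y


def probe_accuracy(informative, seed=0, dim=8):
    """Train a probe on one split and report accuracy on a held-out split."""
    rng = random.Random(seed)
    X_train, y_train = make_data(rng, 400, dim, informative)
    X_test, y_test = make_data(rng, 200, dim, informative)
    w, b = train_probe(X_train, y_train)
    return accuracy(w, b, X_test, y_test)
```

Calling `probe_accuracy(True)` and `probe_accuracy(False)` gives the held-out accuracy for the informative and the control embeddings respectively. A gap between the two is the kind of evidence a probing task looks for; comparing embeddings of different modalities would follow the same pattern with real embeddings in place of `make_data`.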

Place, publisher, year, edition, pages
2020. p. 730-744
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:umu:diva-179880
DOI: 10.18653/v1/2020.coling-main.64
OAI: oai:DiVA.org:umu-179880
DiVA, id: diva2:1527790
Conference
COLING 2020: 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), December 8-13, 2020
Available from: 2021-02-11. Created: 2021-02-11. Last updated: 2021-04-20. Bibliographically approved

Open Access in DiVA

fulltext (1824 kB), 120 downloads
File information
File name: FULLTEXT01.pdf
File size: 1824 kB
Checksum (SHA-512): b9aec800e35d07dc48e4d20931dbecdb1764ecb2d2830d4efe67ac9861915ce3a4a0177df67004792c6e63f83158945fb9777f9ae16cbab97e217c2db22844b9
Type: fulltext
Mimetype: application/pdf


Authority records

Dahlgren Lindström, Adam; Björklund, Johanna; Bensch, Suna; Drewes, Frank
