Self-supervised language grounding by active sensing combined with Internet acquired images and text
2017 (English). In: Proceedings of the Fourth International Workshop on Recognition and Action for Scene Understanding (REACTS2017), 2017, pp. 71-83. Conference paper, published paper (refereed).
Abstract [en]

For natural and efficient verbal communication between a robot and humans, the robot should be able to learn the names and appearances of new objects it encounters. In this paper we present a solution combining active sensing of images with text-based and image-based search on the Internet. The approach allows the robot to learn both the name of an object and how to recognise similar objects in the future, all self-supervised without human assistance. One part of the solution is a novel iterative method to determine the object name using image classification, acquisition of images from additional viewpoints, and Internet search. In this paper, the algorithmic part of the proposed solution is presented together with evaluations using manually acquired camera images, while Internet data was acquired through direct and reverse image search with Google, Bing, and Yandex. Classification with a multi-class SVM and five different feature settings was evaluated. With five object classes, the best-performing classifier used a combination of Pyramid of Histogram of Visual Words (PHOW) and Pyramid of Histogram of Oriented Gradients (PHOG) features, and reached a precision of 80% and a recall of 78%.
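
The abstract reports precision and recall for a five-class classifier but does not say how the per-class values were aggregated. The following is a minimal sketch, assuming macro-averaging over classes, of how such figures can be derived from a confusion matrix. The function names and the toy three-class matrix are hypothetical and chosen for illustration only; they are not data or code from the paper.

```python
from typing import List, Tuple


def per_class_precision_recall(confusion: List[List[int]]) -> List[Tuple[float, float]]:
    """Return (precision, recall) for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        true_positives = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])                           # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results


def macro_average(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Unweighted mean of per-class precision and recall (assumed aggregation)."""
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)


# Made-up 3-class confusion matrix, for illustration only (not from the paper).
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
scores = per_class_precision_recall(toy)
macro_p, macro_r = macro_average(scores)
print(f"macro precision = {macro_p:.3f}, macro recall = {macro_r:.3f}")
```

Macro-averaging weights every class equally regardless of how many test images it has; a micro-average or a sample-weighted average would give different numbers when the classes are imbalanced.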

Place, publisher, year, edition, pages
2017. 71-83 p.
National Category
Computer Science; Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:umu:diva-138290
ISBN: 978-84-608-8176-6 (print)
OAI: oai:DiVA.org:umu-138290
DiVA: diva2:1133829
Conference
Fourth International Workshop on Recognition and Action for Scene Understanding (REACTS2017)
Available from: 2017-08-17 Created: 2017-08-17 Last updated: 2017-08-17

By author/editor
Bensch, Suna; Hellström, Thomas