Natural Language Guided Object Retrieval in Images
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Robotics) ORCID iD: 0000-0003-0830-5303
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Robotics) ORCID iD: 0000-0003-3248-3839
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Robotics) ORCID iD: 0000-0001-7242-2200
2021 (English). In: Acta Informatica, ISSN 0001-5903, E-ISSN 1432-0525, Vol. 58, p. 243-261. Article in journal (Refereed). Published
Abstract [en]

Understanding the surrounding environment and communicating with interacting humans are important functionalities for many automated systems in which visual input (e.g., images, video) and natural language input (speech or text) have to be related to each other. Possible applications are automatic image caption generation, interactive surveillance systems, and human-robot interaction. In this paper, we propose algorithms for automatic responses to natural language queries about an image. Our approach uses a predefined neural net to detect bounding boxes and objects in images; spatial relations between bounding boxes are modeled with a neural net; the queries are analyzed with a syntactic parser; and algorithms that map natural language to properties in the images are introduced. The algorithms make use of semantic similarity and antonyms. We evaluate the performance of our approach with test users who assess the quality of our system’s generated answers.
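
As an illustration only, the query-answering idea in the abstract can be sketched as follows. This is not the authors' implementation: the paper models spatial relations with a neural net and matches words by semantic similarity, whereas this sketch substitutes a simple rule on bounding-box centers and exact label matching. The names `Detection`, `spatial_relation`, `retrieve`, and `ANTONYMS` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max); y grows downward as in image coordinates


def center(box):
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def spatial_relation(a, b):
    """Coarse relation of a relative to b, decided by the dominant center offset.

    Assumption: a rule-based stand-in for the learned spatial-relation model.
    """
    (ax, ay), (bx, by) = center(a.box), center(b.box)
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


# Antonym pairs let "A left of B" also answer a query phrased as "B right of A".
ANTONYMS = {"left of": "right of", "right of": "left of",
            "above": "below", "below": "above"}


def retrieve(target, relation, reference, detections):
    """Return detections labelled `target` that stand in `relation` to a `reference` object."""
    refs = [d for d in detections if d.label == reference]
    return [d for d in detections
            if d.label == target
            and any(d is not r and spatial_relation(d, r) == relation for r in refs)]


detections = [Detection("cup", (10, 50, 30, 70)),
              Detection("laptop", (100, 40, 200, 100))]
cups = retrieve("cup", "left of", "laptop", detections)
```

In a fuller system, the `target`, `relation`, and `reference` arguments would come from a syntactic parse of the query, and the label comparison would be replaced by a semantic-similarity score.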

Place, publisher, year, edition, pages
Springer, 2021. Vol. 58, p. 243-261
Keywords [en]
convolutional neural network, natural language grounding, object retrieval, spatial relations, semantic similarity
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:umu:diva-165065
DOI: 10.1007/s00236-021-00400-2
ISI: 000674657100002
Scopus ID: 2-s2.0-85110811104
OAI: oai:DiVA.org:umu-165065
DiVA, id: diva2:1368751
Note

Previously included in thesis in manuscript form.

Available from: 2019-11-08 Created: 2019-11-08 Last updated: 2023-09-05 Bibliographically approved
In thesis
1. Object Detection and Recognition in Unstructured Outdoor Environments
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Computer vision and machine learning based systems are often developed to replace humans in harsh, dangerous, or tedious situations, as well as to reduce the time required to accomplish a task. Another goal is to increase performance by introducing automation to tasks such as inspections in manufacturing applications, sorting timber during harvesting, surveillance, fruit grading, yield prediction, and harvesting operations. Depending on the task, a variety of object detection and recognition algorithms can be applied, including both conventional and deep learning based approaches. Moreover, when developing image analysis algorithms, it is essential to consider environmental challenges, e.g., illumination changes, occlusion, shadows, and variation in the colour, shape, texture, and size of objects.

The goal of this thesis is to address these challenges to support development of autonomous agricultural and forestry systems with enhanced performance and reduced need for human involvement. This thesis provides algorithms and techniques based on adaptive image segmentation for tree detection in forest environments and for yellow pepper recognition in greenhouses. For segmentation, seed point generation and a region growing method were used to detect trees. An algorithm based on reinforcement learning was developed to detect yellow peppers. RGB and depth data were integrated and used in classifiers to detect trees, bushes, stones, and humans in forest environments. Another part of the thesis describes deep learning based approaches to detect stumps and classify the level of rot based on images.
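
For readers unfamiliar with region growing, a minimal generic sketch is given below. It is not the adaptive method of the thesis: it assumes a grayscale image stored as a list of lists, a single hand-picked seed, 4-connectivity, and a fixed intensity tolerance, all of which are illustrative choices. The function name `region_grow` is hypothetical.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a region from `seed`, adding 4-connected pixels whose intensity
    is within `tolerance` of the seed pixel's intensity."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if (inside and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy image: a dark 2x2 patch in the top-left corner, brighter pixels elsewhere.
image = [[10, 11, 50],
         [12, 10, 52],
         [49, 51, 50]]
patch = region_grow(image, (0, 0), tolerance=5)
```

Seed point generation, as mentioned in the abstract, would supply the `seed` argument automatically instead of it being chosen by hand.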

Another major contribution of this thesis is a method using infrared images to detect humans in forest environments. To detect humans, one shape-dependent and one shape-independent method were proposed.

Algorithms to recognize the intention of humans based on hand gestures were also developed. 3D hand gestures were recognized by first detecting and tracking hands in a sequence of depth images, and then utilizing optical flow constraint equations.
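
For reference, the standard optical flow constraint equation (brightness constancy, first-order form) is shown below. The abstract does not state which exact formulation the thesis applies to depth images, so this is given as general background, with $I$ denoting image intensity and $(u, v)$ the flow components.

```latex
\[
  \frac{\partial I}{\partial x}\,u
  + \frac{\partial I}{\partial y}\,v
  + \frac{\partial I}{\partial t} = 0
\]
```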

The thesis also presents methods to answer human queries about objects and their spatial relations in images. The solution was developed by merging a deep learning based method for object detection and recognition with natural language processing techniques.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2019. p. 88
Series
Report / UMINF, ISSN 0348-0542 ; 19.08
Keywords
Computer vision, Deep Learning, Harvesting Robots, Automatic Detection and Recognition
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-165069 (URN)
978-91-7855-147-7 (ISBN)
Public defence
2019-12-05, MA121, MIT Building, Umeå, 13:00 (English)
Available from: 2019-11-14 Created: 2019-11-08 Last updated: 2019-11-12 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Ostovar, Ahmad; Bensch, Suna; Hellström, Thomas
