Natural Language Guided Object Retrieval in Images
Umeå University, Faculty of Science and Technology, Department of Computing Science (Robotics). ORCID iD: 0000-0003-0830-5303
Umeå University, Faculty of Science and Technology, Department of Computing Science (Robotics). ORCID iD: 0000-0003-3248-3839
Umeå University, Faculty of Science and Technology, Department of Computing Science (Robotics). ORCID iD: 0000-0001-7242-2200
2019 (English). In: Sensors, ISSN 1424-8220, E-ISSN 1424-8220. Article in journal (Refereed), Submitted
Abstract [en]

In this paper we propose a method for generating responses to natural language queries regarding objects and their spatial relations in given images. The responses comprise identification of objects in the image, and generation of appropriate text answering the query. The proposed method uses a pre-trained neural network (YOLO) for object detection, combined with natural language processing of the given queries. Probabilistic measures are constructed for object classes, spatial relations, and word similarity, such that the most likely grounding of the query can be found. By computing semantic similarity, our method overcomes the problem of the limited number of object classes in pre-trained network models. At the same time, flexibility regarding the varying ways users express spatial relations is achieved. The method was implemented and evaluated by 30 test users, who considered 81.9% of the generated answers correct. The work may be applied wherever visual input (images or video) and natural language input (speech or text) have to be related to each other. For example, processing of videos may benefit from functionality that relates audio to visual content. Urban Search and Rescue (USAR) robots are used to find people in catastrophic situations such as flooding or earthquakes. It would be very beneficial if such a robot were able to respond to verbal questions from the operator about what the robot sees with its remote cameras.
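The word-similarity grounding step described in the abstract can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the class labels are a hypothetical YOLO-style subset, and plain string similarity (`difflib`) replaces the semantic word-similarity measure used in the paper.

```python
from difflib import SequenceMatcher

# Hypothetical subset of pre-trained detector class labels (YOLO-style).
CLASS_LABELS = ["person", "bicycle", "car", "dog", "sofa"]

def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure. The paper computes semantic word
    similarity; plain string similarity is used here for illustration."""
    return SequenceMatcher(None, a, b).ratio()

def ground_noun(query_noun: str, labels=CLASS_LABELS):
    """Map a noun from the user's query to the most similar class label,
    so nouns outside the detector's fixed label set can still be grounded."""
    scores = {label: similarity(query_noun, label) for label in labels}
    best = max(scores, key=scores.get)
    return best, scores[best]

# "bike" is not a detector class, but it is grounded to "bicycle".
label, score = ground_noun("bike")
print(label, round(score, 2))
```

In the paper the same idea additionally combines probabilistic measures over object classes and spatial relations; here only the class-label grounding is shown.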

Place, publisher, year, edition, pages
MDPI, 2019.
Keywords [en]
convolutional neural network, natural language grounding, object retrieval, spatial relations, semantic similarity
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:umu:diva-165065
OAI: oai:DiVA.org:umu-165065
DiVA, id: diva2:1368751
Available from: 2019-11-08 Created: 2019-11-08 Last updated: 2019-11-19
Part of thesis
1. Object Detection and Recognition in Unstructured Outdoor Environments
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Computer vision and machine learning based systems are often developed to replace humans in harsh, dangerous, or tedious situations, as well as to reduce the time required to accomplish a task. Another goal is to increase performance by introducing automation to tasks such as inspections in manufacturing applications, sorting timber during harvesting, surveillance, fruit grading, yield prediction, and harvesting operations. Depending on the task, a variety of object detection and recognition algorithms can be applied, including both conventional and deep learning based approaches. Moreover, within the process of developing image analysis algorithms, it is essential to consider environmental challenges, e.g. illumination changes, occlusion, shadows, and divergence in colour, shape, texture, and size of objects.

The goal of this thesis is to address these challenges to support the development of autonomous agricultural and forestry systems with enhanced performance and reduced need for human involvement. The thesis provides algorithms and techniques based on adaptive image segmentation for tree detection in forest environments and for yellow pepper recognition in greenhouses. For segmentation, seed point generation and a region growing method were used to detect trees. An algorithm based on reinforcement learning was developed to detect yellow peppers. RGB and depth data were integrated and used in classifiers to detect trees, bushes, stones, and humans in forest environments. Another part of the thesis describes deep learning based approaches to detect stumps and classify their level of rot based on images.
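The seeded region-growing step mentioned above can be illustrated with a minimal sketch. This is a generic 4-connected flood fill with a simple intensity-homogeneity criterion, not the thesis's adaptive algorithm; the toy image and threshold are made up for illustration.

```python
from collections import deque

def region_grow(image, seed, threshold):
    """Grow a region from `seed` (row, col): repeatedly add 4-connected
    neighbours whose intensity is within `threshold` of the seed value."""
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_val) <= threshold:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# Toy intensity image: a "tree" blob (values near 10) next to background
# (values near 50). Growing from (0, 0) segments only the blob.
img = [
    [10, 11, 50, 52],
    [12, 10, 51, 50],
    [10, 12, 11, 53],
]
print(sorted(region_grow(img, (0, 0), 5)))
```

An adaptive variant, as in the thesis, would adjust the homogeneity criterion (e.g. the threshold or the reference statistic) as the region grows rather than fixing it at the seed value.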

Another major contribution of this thesis is a method using infrared images to detect humans in forest environments. To detect humans, one shape-dependent and one shape-independent method were proposed.

Algorithms to recognize the intention of humans based on hand gestures were also developed. 3D hand gestures were recognized by first detecting and tracking hands in a sequence of depth images, and then utilizing optical flow constraint equations.
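The optical flow constraint referred to above is the standard brightness-constancy relation, applied in the thesis to hand regions tracked across depth-image sequences. For image brightness $I(x, y, t)$ with spatial gradients $I_x, I_y$, temporal derivative $I_t$, and flow components $(u, v)$:

```latex
I_x \, u + I_y \, v + I_t = 0
```

This single equation is underdetermined per pixel (one constraint, two unknowns), which is why flow estimation aggregates it over neighbourhoods or tracked regions.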

The thesis also presents methods to answer human queries about objects and their spatial relation in images. The solution was developed by merging a deep learning based method for object detection and recognition with natural language processing techniques.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2019. p. 88
Series
Report / UMINF, ISSN 0348-0542 ; 19.08
Keywords
Computer vision, Deep Learning, Harvesting Robots, Automatic Detection and Recognition
National subject category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-165069 (URN)
978-91-7855-147-7 (ISBN)
Public defence
2019-12-05, MA121, MIT Building, Umeå, 13:00 (English)
Opponent
Supervisors
Available from: 2019-11-14 Created: 2019-11-08 Last updated: 2019-11-12 Bibliographically approved

Open Access in DiVA

No full text available in DiVA

Person records
Ostovar, Ahmad; Bensch, Suna; Hellström, Thomas
