Publications (10 of 74)
Ostovar, A., Bensch, S. & Hellström, T. (2019). Natural Language Guided Object Retrieval in Images. Sensors
Natural Language Guided Object Retrieval in Images
2019 (English) In: Sensors, ISSN 1424-8220, E-ISSN 1424-8220. Article in journal (Refereed) Submitted
Abstract [en]

In this paper we propose a method for generating responses to natural language queries regarding objects and their spatial relations in given images. The responses comprise identification of objects in the image and generation of appropriate text answering the query. The proposed method uses a pre-trained neural network (YOLO) for object detection, combined with natural language processing of the given queries. Probabilistic measures are constructed for object classes, spatial relations, and word similarity such that the most likely grounding of the query can be found. By computing semantic similarity, our method overcomes the problem of the limited number of object classes in pre-trained network models, while retaining flexibility regarding the varying ways users express spatial relations. The method was implemented and evaluated by 30 test users, who considered 81.9% of the generated answers correct. The method may be applied wherever visual input (images or video) and natural language input (speech or text) have to be related to each other. For example, processing of videos may benefit from functionality that relates audio to visual content. Urban Search and Rescue (USAR) robots are used to find people in catastrophic situations such as flooding or earthquakes. It would be very beneficial if such a robot were able to respond to verbal questions from the operator about what the robot sees with its remote cameras.
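
The grounding step can be illustrated with a small, hypothetical sketch: each detection (class label, confidence, bounding box) is scored against the query noun by combining detector confidence with a word-similarity measure, and the best-scoring detection is returned. The `Detection` structure, the `semantic_similarity` placeholder, and the product-based score are illustrative assumptions, not the paper's exact probabilistic formulation.

```python
# Hypothetical sketch of the grounding idea: score each detection by combining
# detector confidence with a word-similarity measure against the query noun.
# `semantic_similarity` is a stand-in; an embedding-based similarity would be
# plugged in here. None of the names below come from the paper.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Detection:
    label: str                       # class name from the pre-trained detector (e.g. YOLO)
    confidence: float                # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]   # (x, y, w, h) in pixels

def semantic_similarity(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1]; replace with a semantic measure."""
    return 1.0 if a.lower() == b.lower() else 0.0

def ground_query_noun(noun: str, detections: List[Detection]) -> Optional[Detection]:
    """Return the detection most plausibly referred to by the query noun."""
    best, best_score = None, 0.0
    for det in detections:
        score = det.confidence * semantic_similarity(noun, det.label)
        if score > best_score:
            best, best_score = det, score
    return best

if __name__ == "__main__":
    dets = [Detection("dog", 0.92, (10, 20, 80, 60)),
            Detection("bicycle", 0.75, (120, 40, 90, 70))]
    print(ground_query_noun("dog", dets))
```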

Place, publisher, year, edition, pages
MDPI, 2019
Keywords
convolutional neural network, natural language grounding, object retrieval, spatial relations, semantic similarity
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:umu:diva-165065 (URN)
Available from: 2019-11-08 Created: 2019-11-08 Last updated: 2019-11-11
Bensch, S. & Hellström, T. (Eds.). (2019). Proceedings of Umeå's 23rd Student Conference in Computing Science: USCCS 2019. Paper presented at Umeå's 23rd Student Conference in Computing Science (USCCS 2019). Umeå: Umeå universitet
Proceedings of Umeå's 23rd Student Conference in Computing Science: USCCS 2019
2019 (English) Conference proceedings (editor) (Other academic)
Abstract [en]

The Umeå Student Conference in Computing Science (USCCS) is organized annually as part of a course given by the Computing Science department at Umeå University. The objective of the course is to give the students a practical introduction to independent research, scientific writing, and oral presentation.

A student who participates in the course first selects a topic and a research question that he or she is interested in. If the topic is accepted, the student outlines a paper and composes an annotated bibliography to give a survey of the research topic. The main work consists of conducting the actual research that answers the question asked, and of reporting the results clearly and convincingly in a scientific paper. Another major part of the course is a series of internal peer-review meetings, in which groups of students read each other's papers and give feedback to the authors. This process gives valuable training in both giving and receiving criticism in a constructive manner. Altogether, the students learn to formulate and develop their own ideas in a scientific manner, through internal peer review of each other's work under the supervision of the teachers, and through incremental development and refinement of a scientific paper.

Each scientific paper is submitted to USCCS through an online submission system and receives reviews written by members of the Computing Science department. Based on the reviews, the editors of the conference proceedings (the teachers of the course) issue a decision on preliminary acceptance to each author. If, after final revision, a paper is accepted, the student is given the opportunity to present the work at the conference. The review process and the conference format aim to mimic realistic settings for publishing and participating in scientific conferences.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2019. p. 73
Series
Report / UMINF, ISSN 0348-0542 ; 19.02
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-155470 (URN)
Conference
Umeå's 23rd Student Conference in Computing Science (USCCS 2019)
Note

The print version of the publication differs slightly from the online version.

Available from: 2019-01-17 Created: 2019-01-17 Last updated: 2019-05-28. Bibliographically approved
Persiani, M. & Hellström, T. (2019). Unsupervised Inference of Object Affordance from Text Corpora. In: Mareike Hartmann, Barbara Plank (Ed.), Proceedings of the 22nd Nordic Conference on Computational Linguistics. Paper presented at 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19), September 30 – October 2, 2019, Turku, Finland. Association for Computational Linguistics, Article ID W19-6112.
Unsupervised Inference of Object Affordance from Text Corpora
2019 (English) In: Proceedings of the 22nd Nordic Conference on Computational Linguistics / [ed] Mareike Hartmann, Barbara Plank, Association for Computational Linguistics, 2019, article id W19-6112. Conference paper, Published paper (Refereed)
Abstract [en]

Affordances denote actions that can be performed in the presence of different objects, or the possibility of action in an environment. In robotic systems, affordances and actions may suffer from poor semantic generalization capabilities due to the large amount of hand-crafted specification required. To alleviate this issue, we propose a method to mine object-action pairs from free text corpora, and we then train and evaluate different affordance prediction models based on word embeddings.
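
As one hedged illustration of the mining step, the sketch below extracts verb-direct-object pairs from text with spaCy's dependency parser and counts them as a crude affordance table. The choice of spaCy, the `mine_affordances` function, and the toy corpus are our assumptions; the paper's own pipeline and its embedding-based prediction models are not reproduced here.

```python
# Minimal sketch (one possible pipeline, not necessarily the authors' exact
# method): mine verb-object pairs from free text with spaCy's dependency
# parser and count them as a crude affordance table.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def mine_affordances(texts):
    pairs = Counter()
    for doc in nlp.pipe(texts):
        for tok in doc:
            # a direct object of a verb gives an (action, object) candidate
            if tok.dep_ == "dobj" and tok.head.pos_ == "VERB":
                pairs[(tok.head.lemma_, tok.lemma_)] += 1
    return pairs

corpus = ["She opened the door and poured the coffee.",
          "He opened the window before drinking his coffee."]
print(mine_affordances(corpus).most_common(3))
```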

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2019
Keywords
Affordance, Natural Language Processing, Robotics, Intention Recognition, Conditional Variational Autoencoder
National Category
Robotics
Identifiers
urn:nbn:se:umu:diva-163356 (URN)
Conference
22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19), September 30 – October 2, 2019, Turku, Finland
Available from: 2019-09-16 Created: 2019-09-16 Last updated: 2019-10-23. Bibliographically approved
Ostovar, A., Ringdahl, O. & Hellström, T. (2018). Adaptive Image Thresholding of Yellow Peppers for a Harvesting Robot. Robotics, 7(1), Article ID 11.
Adaptive Image Thresholding of Yellow Peppers for a Harvesting Robot
2018 (English) In: Robotics, E-ISSN 2218-6581, Vol. 7, no 1, article id 11. Article in journal (Refereed) Published
Abstract [en]

The presented work is part of the H2020 project SWEEPER, with the overall goal to develop a sweet pepper harvesting robot for use in greenhouses. As part of the solution, visual servoing is used to direct the manipulator towards the fruit. This requires accurate and stable fruit detection based on video images. To segment an image into background and foreground, thresholding techniques are commonly used. The varying illumination conditions in the unstructured greenhouse environment often cause shadows and overexposure. Furthermore, the color of the fruits to be harvested varies over the season. All this makes it sub-optimal to use fixed, pre-selected thresholds. In this paper we suggest an adaptive, image-dependent thresholding method. A variant of reinforcement learning (RL) is used with a reward function that computes the similarity between the segmented image and the labeled image to give feedback for action selection. The RL-based approach requires less computational resources than exhaustive search, which is used as a benchmark, and achieves higher performance than a Lipschitzian-based optimization approach. The proposed method also requires fewer labeled images than other methods. Several exploration-exploitation strategies are compared, and the results indicate that the Decaying Epsilon-Greedy algorithm gives the highest performance for this task. The best result with the Epsilon-Greedy algorithm (ϵ = 0.7) reached 87% of the performance achieved by exhaustive search, with 50% fewer iterations than the benchmark. The performance increased to 91.5% with the Decaying Epsilon-Greedy algorithm, with 73% fewer iterations than the benchmark.
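
The exploration-exploitation idea can be sketched in a deliberately simplified, bandit-style form: candidate thresholds are the actions, the reward is the similarity (here IoU) between the segmented image and a labeled mask, and actions are chosen with decaying epsilon-greedy. The toy `segment` function, the IoU reward, the decay factor, and the incremental-mean value update are illustrative assumptions rather than the paper's exact RL formulation.

```python
# Illustrative sketch of decaying epsilon-greedy selection over candidate
# thresholds; a simplified stand-in for the paper's RL setup, with assumed
# toy segmentation and an IoU reward against a labeled mask.
import numpy as np

def iou(mask_a, mask_b):
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def segment(image, threshold):
    return image > threshold  # toy thresholding on a single channel

rng = np.random.default_rng(0)
image = rng.random((64, 64))
label = image > 0.6                      # pretend ground-truth mask
actions = np.linspace(0.1, 0.9, 17)      # candidate thresholds
q = np.zeros(len(actions))               # estimated value per threshold
counts = np.zeros(len(actions))
epsilon, decay = 0.7, 0.95               # start at 0.7 as in the paper, decay per step

for step in range(200):
    if rng.random() < epsilon:
        a = int(rng.integers(len(actions)))   # explore
    else:
        a = int(np.argmax(q))                 # exploit
    reward = iou(segment(image, actions[a]), label)
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]       # incremental mean update
    epsilon *= decay

print("best threshold:", actions[int(np.argmax(q))])
```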

Place, publisher, year, edition, pages
MDPI, 2018
Keywords
reinforcement learning, Q-Learning, image thresholding, ϵ-greedy strategies
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Analysis
Identifiers
urn:nbn:se:umu:diva-144513 (URN), 10.3390/robotics7010011 (DOI), 000432680200008 ()
Funder
EU, Horizon 2020, 644313
Available from: 2018-02-05 Created: 2018-02-05 Last updated: 2019-11-11. Bibliographically approved
Hellström, T. & Bensch, S. (2018). Modeling Interaction for Understanding in HRI. In: Proceedings of Explainable Robotic Systems Workshop at HRI 2018, Chicago, USA, March 2018. Paper presented at HRI 2018, Chicago, USA, March 2018.
Modeling Interaction for Understanding in HRI
2018 (English) In: Proceedings of Explainable Robotic Systems Workshop at HRI 2018, Chicago, USA, March 2018, 2018. Conference paper, Published paper (Refereed)
Abstract [en]

As robots become more and more capable and autonomous, there is an increased need for humans to understand what the robots do and think. In this paper we investigate what such understanding means and includes, and how robots are and can be designed to support understanding. We present a model of interaction for understanding. The aim is to provide a uniform formal understanding of the large body of existing work, and also to support continued work in the area.

National Category
Robotics
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-149102 (URN), 10.475/123_4 (DOI)
Conference
HRI 2018, Chicago, USA, March 2018
Available from: 2018-06-15 Created: 2018-06-15 Last updated: 2018-06-18. Bibliographically approved
Bensch, S. & Hellström, T. (Eds.). (2018). Proceedings of Umeå's 22nd Student Conference in Computing Science (USCCS 2018). Paper presented at Umeå's 22nd Student Conference in Computing Science – USCCS 2018. Umeå: Department of Computing Science, Umeå University
Proceedings of Umeå's 22nd Student Conference in Computing Science (USCCS 2018)
2018 (English) Conference proceedings (editor) (Other academic)
Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2018. p. 87
Series
Report / UMINF, ISSN 0348-0542 ; 18.1
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:umu:diva-144305 (URN)
Conference
Umeå's 22nd Student Conference in Computing Science – USCCS 2018
Available from: 2018-01-30 Created: 2018-01-30 Last updated: 2018-06-11. Bibliographically approved
Hellström, T. & Bensch, S. (2018). Understandable Robots: What, Why, and How. Paladyn - Journal of Behavioral Robotics, 9(1), 110-123
Understandable Robots: What, Why, and How
2018 (English) In: Paladyn - Journal of Behavioral Robotics, ISSN 2080-9778, E-ISSN 2081-4836, Vol. 9, no 1, p. 110-123. Article in journal (Refereed) Published
Abstract [en]

As robots become more and more capable and autonomous, there is an increasing need for humans to understand what the robots do and think. In this paper, we investigate what such understanding means and includes, and how robots can be designed to support understanding. After an in-depth survey of related earlier work, we discuss examples showing that understanding includes not only the intentions of the robot, but also desires, knowledge, beliefs, emotions, perceptions, capabilities, and limitations of the robot. The term understanding is formally defined, and the term communicative actions is defined to denote the various ways in which a robot may support a human's understanding of the robot. A novel model of interaction for understanding is presented. The model describes how both human and robot may utilize a first or higher-order theory of mind to understand each other and perform communicative actions in order to support the other's understanding. It also describes simpler cases in which the robot performs static communicative actions in order to support the human's understanding of the robot. In general, communicative actions performed by the robot aim at reducing the mismatch between the mind of the robot, and the robot's inferred model of the human's model of the mind of the robot. Based on the proposed model, a set of questions is formulated, to serve as support when developing and implementing the model in real interacting robots.
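
The mismatch idea can be made concrete with a toy illustration (our own simplification, not the paper's formal definitions): the robot's own mental state is compared with its inferred model of the human's model of that state, and each disagreement becomes a candidate topic for a communicative action.

```python
# Toy illustration (an assumption-laden simplification, not the paper's
# formalism): the robot's mental state, the robot's inferred model of what the
# human believes that state to be, and the mismatch communicative actions
# are meant to reduce.
robot_mind = {"intention": "fetch cup", "battery_low": True, "sees_human": True}
inferred_human_model_of_robot = {"intention": "fetch cup", "battery_low": False, "sees_human": True}

def mismatch(actual, inferred):
    """Keys where the inferred human model disagrees with the robot's actual state."""
    return {k: (actual[k], inferred.get(k)) for k in actual if inferred.get(k) != actual[k]}

# Each mismatched item is a candidate topic for a communicative action,
# e.g. announcing "my battery is low".
print(mismatch(robot_mind, inferred_human_model_of_robot))
```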

Place, publisher, year, edition, pages
Warsaw, Poland: De Gruyter Open, 2018
Keywords
human-robot interaction, communication, predictable, explainable
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-150156 (URN), 10.1515/pjbr-2018-0009 (DOI), 2-s2.0-85050650946 (Scopus ID)
Available from: 2018-07-12 Created: 2018-07-12 Last updated: 2018-10-26. Bibliographically approved
Bensch, S., Jevtic, A. & Hellström, T. (2017). On Interaction Quality in Human-Robot Interaction. In: H. Jaap van den Herik, Ana Paula Rocha, Joaquim Filipe (Ed.), Proceedings of the 9th International Conference on Agents and Artificial Intelligence. Paper presented at 9th International Conference on Agents and Artificial Intelligence (ICAART), 24-26 February, 2017, Porto, Portugal (pp. 182-189). Setúbal: SciTePress, 1
On Interaction Quality in Human-Robot Interaction
2017 (English) In: Proceedings of the 9th International Conference on Agents and Artificial Intelligence / [ed] H. Jaap van den Herik, Ana Paula Rocha, Joaquim Filipe, Setúbal: SciTePress, 2017, Vol. 1, p. 182-189. Conference paper, Published paper (Refereed)
Abstract [en]

In many complex robotics systems, interaction takes place in all directions between human, robot, and environment. Performance of such a system depends on this interaction, and a proper evaluation of the system must build on a proper modeling of the interaction, a relevant set of performance metrics, and a methodology to combine the metrics into a single performance value. In this paper, existing models of human-robot interaction are adapted to fit complex scenarios with one or several humans and robots. The interaction and the evaluation process are formalized, and a general method to fuse performance values over time and across several performance metrics is presented. The resulting value, denoted interaction quality, adds a dimension to ordinary performance metrics by being explicit about the interplay between them, and thereby provides a formal framework to understand, model, and address complex aspects of evaluating human-robot interaction.
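
As a hedged illustration of what fusing performance values over time and across metrics could look like, the sketch below aggregates each per-metric time series with exponentially decayed time weights and then combines the metrics with fixed weights into a single score. The weighting scheme, the function name `interaction_quality`, and the example metrics are assumptions; the paper defines its own fusion method.

```python
# Hedged sketch of one possible fusion of metric values over time and across
# metrics into a single score; the weights and averaging scheme are assumed,
# not taken from the paper.
import numpy as np

def interaction_quality(series_by_metric, metric_weights, time_decay=0.9):
    """series_by_metric: {name: [v_t0, v_t1, ...]} with values in [0, 1]."""
    fused = 0.0
    for name, series in series_by_metric.items():
        values = np.asarray(series, dtype=float)
        # exponentially decayed weights: recent samples count more
        w = time_decay ** np.arange(len(values))[::-1]
        per_metric = float(np.average(values, weights=w))
        fused += metric_weights[name] * per_metric
    return fused / sum(metric_weights.values())

series = {"task_success": [1.0, 1.0, 0.0, 1.0], "response_time": [0.8, 0.7, 0.9, 0.6]}
weights = {"task_success": 0.7, "response_time": 0.3}
print(round(interaction_quality(series, weights), 3))
```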

Place, publisher, year, edition, pages
Setúbal: SciTePress, 2017
Keywords
Human-Robot Interaction, Evaluation, Performance
National Category
Robotics
Identifiers
urn:nbn:se:umu:diva-137250 (URN), 10.5220/0006191601820189 (DOI), 000413243500019 (), 978-989-758-219-6 (ISBN)
Conference
9th International Conference on Agents and Artificial Intelligence (ICAART), 24-26 February, 2017, Porto, Portugal
Available from: 2017-06-28 Created: 2017-06-28 Last updated: 2018-06-09. Bibliographically approved
Bensch, S. & Hellström, T. (Eds.). (2017). Proceedings of Umeå's 21st student conference in computing science: USCCS 2017. Paper presented at Umeå's 21st student conference in computing science, USCCS, Umeå, January 13, 2017. Umeå: Umeå University
Proceedings of Umeå's 21st student conference in computing science: USCCS 2017
2017 (English) Conference proceedings (editor) (Other academic)
Place, publisher, year, edition, pages
Umeå: Umeå University, 2017. p. 183
Series
Report / UMINF, ISSN 0348-0542 ; 17.1
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-130499 (URN)
Conference
Umeå's 21st student conference in computing science, USCCS, Umeå, January 13, 2017
Available from: 2017-01-20 Created: 2017-01-20 Last updated: 2018-06-09. Bibliographically approved
Abedin, M. R., Bensch, S. & Hellström, T. (2017). Self-supervised language grounding by active sensing combined with Internet acquired images and text. In: Jorge Dias, George Azzopardi, Rebeca Marf (Ed.), Proceedings of the Fourth International Workshop on Recognition and Action for Scene Understanding (REACTS2017). Paper presented at Fourth International Workshop on Recognition and Action for Scene Understanding (REACTS2017), August 25, 2017, Ystad, Sweden (pp. 71-83). Málaga: REACTS
Self-supervised language grounding by active sensing combined with Internet acquired images and text
2017 (English) In: Proceedings of the Fourth International Workshop on Recognition and Action for Scene Understanding (REACTS2017) / [ed] Jorge Dias, George Azzopardi, Rebeca Marf, Málaga: REACTS, 2017, p. 71-83. Conference paper, Published paper (Refereed)
Abstract [en]

For natural and efficient verbal communication between a robot and humans, the robot should be able to learn the names and appearances of new objects it encounters. In this paper we present a solution combining active sensing of images with text-based and image-based search on the Internet. The approach allows the robot to learn both the object name and how to recognise similar objects in the future, all self-supervised and without human assistance. One part of the solution is a novel iterative method to determine the object name using image classification, acquisition of images from additional viewpoints, and Internet search. In this paper, the algorithmic part of the proposed solution is presented together with evaluations using manually acquired camera images, while Internet data was acquired through direct and reverse image search with Google, Bing, and Yandex. Classification with a multi-class SVM and five different feature settings was evaluated. With five object classes, the best performing classifier used a combination of Pyramid of Histogram of Visual Words (PHOW) and Pyramid of Histogram of Oriented Gradient (PHOG) features, and reached a precision of 80% and a recall of 78%.
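
The classification step can be sketched with scikit-learn as below; this is a stand-in, not the authors' code. The PHOW and PHOG descriptors would be computed with a separate library (e.g. VLFeat) and concatenated; here random vectors and labels merely take their place so that the multi-class SVM training and the precision/recall evaluation are runnable.

```python
# Sketch of the classification step only (scikit-learn stand-in; the paper uses
# a multi-class SVM on PHOW + PHOG features, which are extracted elsewhere and
# concatenated into the feature vectors used here).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
X = rng.random((250, 512))               # placeholder for concatenated PHOW+PHOG vectors
y = rng.integers(0, 5, size=250)         # five object classes, as in the paper

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("precision:", precision_score(y_te, pred, average="macro", zero_division=0))
print("recall:", recall_score(y_te, pred, average="macro", zero_division=0))
```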

Place, publisher, year, edition, pages
Málaga: REACTS, 2017
National Category
Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:umu:diva-138290 (URN), 978-84-608-8176-6 (ISBN)
Conference
Fourth International Workshop on Recognition and Action for Scene Understanding (REACTS2017), August 25, 2017, Ystad, Sweden
Available from: 2017-08-17 Created: 2017-08-17 Last updated: 2018-06-09. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-7242-2200
